Skip to content

[Idea] Angular Sum / Autocorrelation to discover scale in addition to phase/offset for optimal weight quantization #585

@leegao

Description

@leegao

I was taking a peek at compute_angular_endpoints_1plane, it seems like the current algorithm:

  1. For a valid weight range/grid size $Q$ (e.g. a step size 1/8 would yield a grid size/# steps of 9)
  2. Calculate the optimal phase / shift for this particular grid using the compute_angular_offsets algorithm (essentially the phase of the DFT of the weights at the frequency Q hz)
  3. Perform a shifted transform of the weights $\hat w = w + \mathsf{offset}$
  4. Solve for $P_0', P_1'$ with the new shifted weights

I believe you can do a little bit better by incorporating more degrees of freedom in your search space. In particular, by taking into account the observation that the weights are invariant not just under shifting, but also scaling.

E.g. for any affine function $f(w) = aw + b$:

$$ \mathsf{lerp}(P_0, P_1, w_i) = \mathsf{lerp}(P_0', P_1', aw_i + b) $$

s.t. $P_0' = \mathsf{lerp}(P_0, P_1, -\frac{b}{a})$ and $P_1' = \mathsf{lerp}(P_0, P_1, \frac{1-b}{a})$, constrained by $0 \le P_0', P_1', f(w) \le 1$.

The intuition here is that it's the fundamental ratio of the weights (e.g. the ratio between $w_{i+1} : w_i$) that must not be changed, but scaling and shifting can be easily reversed within the color endpoints to preserve the same output color.

For example, to quantize the weights {0.3, 0.4, 0.6, 0.7} on the the $Q=9$ grid of 0, 1/8, ..., 8/8:

  1. It can undergo the scale+shift of $f(w) = \frac{w - 0.1}{0.8}$ to yield {0.3 -> 0.25, 0.4 -> 0.375, 0.6 -> 0.625, 0.7 -> 0.75} which can be fitted perfectly on the grid.
  2. On the other hand, if we're limited to just shifts, then the best offset is $-\frac{1}{16}$, which yields {0.3 -> 0.3625, 0.4 -> 0.4625, 0.6 -> 0.6625, 0.7 -> 0.7625} which has a 50% lower error than just quantizing {0.3, 0.4, 0.6, 0.7}, but not a perfect fit.

The question is how to calculate the scale factor.


The shift/offset $b$ is already calculated today by the compute_angular_offsets algorithm (the $\mathsf{atan2}(S_{Im}, S_{Re})$ term from the fourier transform of the weights at frequency $2 \pi Q \times w_i$).

The scaling factor itself is a function of the fundamental frequency of the weights, which you can calculate from the autocorrelation of the DFT of the weights. For example, the fundamental frequency of {0.3, 0.4, 0.6, 0.7} is any multiple of 10, since the step size of $0.1$ will perfectly fit the data.

Since our DFT is defined as

$$ F(k) = \sum_i e^{i2\pi k w_i}, $$

the fundamental frequency $k_{peak}$ of this function will also be the size of the quantization grid with the best coherence with $w_i$.

Now, if we transform $w_i \rightarrow \frac{k_{peak}}{Q-1} w_i$, then by the fourier scaling theorem, the peak frequency will scale by $k_{peak} \rightarrow \frac{Q-1}{k_{peak}} k_{peak} = Q - 1$. In other words, scaling the $w_i$ by a factor of $\frac{k_{peak}}{Q-1}$ will move the fundamental frequency/periodicity of the transformed weights to perfectly match that of our quantization grid's step-size.


This then gives us a way to calculate the best scale+shift transform to minimize the quantization error:

  1. Fix a quantization grid $Q$
  2. Perform autocorrelation on the weights to find its fundamental frequencies/periods $k_{peak} / \lambda_{peak}$, and sort them in descending order
  3. For each $k_{peak}$:
  4. Calculate the scale factor $a = \frac{k_{peak}}{Q-1}$
  5. Calculate the shift/offset of the scaled weights $b = \mathsf{atan2}(\cdots)$ using the compute_angular_offsets on the scaled weights
  6. Calculate the new $\hat w_i = a w_i + b$, $P_0' = \mathsf{lerp}(P_0, P_1, -\frac{b}{a})$, and $P_1' = \mathsf{lerp}(P_0, P_1, \frac{1-b}{a})$
  7. Do a bounds check on $0 \le \hat w_i, P_0', P_1' \le 1$, if out of bound, go to the next best fundamental frequency $f_{peak}'$, if exhausted all of them, use the scale factor $a = 1$
  8. [Optional] Solve for $P_0', P_1'$ linear algebraically minimizing against the ground truth (instead of the stationary condition $\mathsf{lerp}(P_0, P_1, w_i) = \mathsf{lerp}(P_0', P_1', a w_i + b)$)

This works best when the true fundamental frequency of the weights are close to / smaller than the current quantization frequency being checked against, otherwise the bounds check will likely cause a rejection.

The autocorrelation could be done directly, or cheaply using a sweep (on a small integer frequency grid) + a few newton's iterations to get the top K peak frequencies.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions