
Commit 97d4c87

change image size
1 parent d9378a4 commit 97d4c87

1 file changed: +3 -3 lines changed

docs/documentation.md

Lines changed: 3 additions & 3 deletions
@@ -648,19 +648,19 @@ I introduced a new hyperparameter that was previously hardcoded: ```quantscale``
The plot below shows how the parameter influences the distribution of the first layer weights for the ```4bitsym``` encoding.

<div align="center">
-<img src="4bit_histograms.png" width="80%">
+<img src="4bit_histograms.png" width="60%">
</div>

We can see that the weights roughly follow a normal distribution with some extreme outliers. Changing ```quantscale``` to a higher value will make the distribution wider and increase the fraction of outliers clipped at the maximum levels. QAT ensures that the errors introduced by clipping these outliers are redistributed to the other weights.
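A minimal sketch of how such a scale-tied symmetric 4-bit fake quantization with a straight-through estimator could look (the function name, the use of the mean absolute weight, and the exact scaling formula are illustrative assumptions, not the actual code of this repository):

```python
import torch

def fake_quant_4bitsym(w: torch.Tensor, quantscale: float = 0.25) -> torch.Tensor:
    # Symmetric 4-bit grid: integer codes -7 ... +7.
    levels = 7
    # Tie the quantization scale to the mean absolute weight; a larger
    # quantscale spreads the codes wider, so more weights saturate (are
    # clipped) at the extreme levels.
    scale = levels * quantscale / (w.abs().mean() + 1e-8)
    w_q = torch.clamp(torch.round(w * scale), -levels, levels) / scale
    # Straight-through estimator: the forward pass uses the quantized
    # weights, the backward pass treats quantization as identity, which
    # lets QAT redistribute the clipping error onto the remaining weights.
    return w + (w_q - w).detach()
```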

<div align="center">
-<img src="quantscale_scan.png" width="80%">
+<img src="quantscale_scan.png" width="70%">
</div>

I performed a scan of the parameter for the ```4bitsym``` and ```4bit``` encodings. We see that values that are too high (0.5) or too low (0.125) degrade the weight distribution, leading to an increase in loss and worse test and train accuracy. Within the range of 0.2 to 0.4, the performance seems to be relatively stable. However, there is still a strong random variation in accuracy, caused by different initializations of the weights. This is also due to the marginal capacity of the model, which was deliberately kept as small as possible.

<div align="center">
-<img src="quantscale_entropy.png" width="80%">
+<img src="quantscale_entropy.png" width="70%">
</div>

There is a rather interesting relationship when looking at the standard deviation and the [information entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)) across the layers. As expected, ```quantscale``` biases the standard deviation in a roughly proportional way. However, we can also see that the entropy increases for higher values. For low settings, the entropy is low because most weights are around zero and get truncated to the same code. Increasing the scale parameter increases the entropy, but the accuracy of the model does not benefit, which means that only noise is added and no useful information.
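For reference, both per-layer statistics can be computed directly from the integer weight codes; a small sketch (the helper name and the assumption that the codes are available as a NumPy array per layer are mine):

```python
import numpy as np

def layer_stats(codes: np.ndarray, levels: int = 7):
    # codes: integer 4bitsym weight codes (-7 ... +7) of one layer.
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    # Shannon entropy in bits of the code distribution; unused codes
    # simply do not appear in the counts (0 * log 0 = 0).
    entropy = -np.sum(p * np.log2(p))
    return codes.std(), entropy
```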
