imatrix : use GGUF by default #14842
Merged
The GGUF format for imatrix (added in #9400) is a saner default. The old `imatrix.dat` format doesn't store the per-expert evaluation counts for MoE models, which would make future improvements like #9400 (comment) less accurate.

Previously, GGUF was used only when the output filename ended with `.gguf`. That is too strict in some cases (e.g. when using an additional `~` suffix to mark temporary files), and it can also lead people to use the legacy format accidentally.

Since the GGUF-based imatrix format is very close to the internal state of `llama-imatrix`, converting an `imatrix.gguf` file to the `imatrix.dat` format gives the same result as generating the `imatrix.dat` file directly. The reverse is not necessarily true (e.g. for MoE models), because `imatrix.dat` does not store the evaluation counts in the same (per-expert) shape.

`llama-quantize` already doesn't use the imatrix filename to guess its type: it attempts to load the file as GGUF and falls back to the legacy format when that fails, so the name of the imatrix file doesn't technically matter.

The new default imatrix output format is GGUF, regardless of the output filename. The legacy `imatrix.dat` format can be produced with `--output-format dat`.