convert : handle pre-quantized models #14810
Open · +174 −65
Should fix #14762, and also address #3353.
This roughly implements the idea in #14762 (comment) to allow converting from pre-quantized models, by splitting `ModelBase.get_tensors` into an intermediate `ModelBase.index_tensors` which returns a `dict[str, Callable[[], Tensor]]`, which can be modified before `get_tensors` is called. `get_tensors` still keeps the same signature (it still returns an `Iterator[tuple[str, Tensor]]`).

For now, support for these pre-quantizations has been implemented:

- bitnet
- fp8
- gptq (only 2, 4 and 8 bits, not 3)

The 3-bit variant of GPTQ is more complicated, and so was omitted for now.
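The `index_tensors` / `get_tensors` split described above can be sketched roughly as follows. This is a minimal, self-contained illustration of the pattern, not the actual `convert_hf_to_gguf.py` code: the method names `index_tensors`, `get_tensors`, and `model_tensors` come from this description, while the tensor names, the plain-list `Tensor` stand-in, and the dequantizing rewrite at the bottom are assumptions made for the sketch.

```python
# Sketch of splitting get_tensors into an index of lazy tensor callables.
from collections.abc import Callable, Iterator

Tensor = list[float]  # stand-in for torch.Tensor in this self-contained sketch


class ModelBase:
    def __init__(self) -> None:
        # index_tensors() is called once; the resulting dict can be modified
        # (e.g. replacing pre-quantized weights with dequantizing callables)
        # before get_tensors() is iterated.
        self.model_tensors: dict[str, Callable[[], Tensor]] = self.index_tensors()

    def index_tensors(self) -> dict[str, Callable[[], Tensor]]:
        # Lazily-loading callables: nothing is materialized until one is invoked.
        return {
            "model.layers.0.weight": lambda: [1.0, 2.0],  # hypothetical names
            "lm_head.weight": lambda: [3.0, 4.0],
        }

    def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
        # Same signature as before the split: yields (name, tensor) pairs.
        for name, load in self.model_tensors.items():
            yield name, load()


model = ModelBase()
# A pre-quantization handler could rewrite index entries before iteration:
old = model.model_tensors["model.layers.0.weight"]
model.model_tensors["model.layers.0.weight"] = lambda: [x * 0.5 for x in old()]
print(dict(model.get_tensors()))
```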
Notes
- This removes `ModelBase.tensor_names` in favor of `self.model_tensors`, which also allows getting the tensor data (because it's a `dict[str, Callable[[], Tensor]]`).
- There was an exclusion of the `lm_head.weight` tensor from `self.tensor_names`, but I don't see why it's necessary. I've removed it because `self.tensor_names` was also removed.
- Also tested a `--dry-run` of a `--remote` conversion of https://huggingface.co/WisdomShell/CodeShell-7B, which does run the tensor index completeness check. No problem either.

TODO
- … (`--remote`)
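As an illustration of why a modifiable `dict[str, Callable[[], Tensor]]` index helps with pre-quantized models, here is a hedged sketch of rewriting an index entry so that a stored quantized weight and its scale are merged into a single dequantizing callable. The entry names (`w.weight`, `w.weight_scale_inv`), the per-tensor scale, and the helper function are all hypothetical; real fp8/gptq checkpoints use blockwise scales and torch dtypes.

```python
# Sketch: merging a quantized tensor and its (hypothetical) scale entry
# into one dequantizing callable inside a dict[str, Callable[[], Tensor]].
from collections.abc import Callable

Tensor = list[float]  # stand-in for torch.Tensor in this sketch


def dequantize_entry(
    tensors: dict[str, Callable[[], Tensor]], name: str, scale_name: str
) -> None:
    """Replace `name` with a callable returning the dequantized weight,
    and drop the now-consumed scale entry from the index."""
    load_q = tensors[name]
    load_scale = tensors.pop(scale_name)

    def dequant() -> Tensor:
        q = load_q()
        s = load_scale()[0]  # assume a single per-tensor scale here
        return [x * s for x in q]

    tensors[name] = dequant


index: dict[str, Callable[[], Tensor]] = {
    "w.weight": lambda: [2.0, 4.0],        # pretend-quantized values
    "w.weight_scale_inv": lambda: [0.25],  # hypothetical scale tensor name
}
dequantize_entry(index, "w.weight", "w.weight_scale_inv")
print(index["w.weight"]())  # → [0.5, 1.0]
```

Because the rewrite happens on the index before `get_tensors` is iterated, the rest of the conversion pipeline sees only ordinary dequantized tensors.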