
convert : handle pre-quantized models #14810


Open · wants to merge 1 commit into master

Conversation

compilade (Collaborator)

Should fix #14762, and also address #3353.

This roughly implements the idea in #14762 (comment) to allow converting from pre-quantized models, by splitting ModelBase.get_tensors into an intermediate ModelBase.index_tensors, which returns a dict[str, Callable[[], Tensor]] that can be modified before get_tensors is called. get_tensors keeps the same signature (it still returns an Iterator[tuple[str, Tensor]]).
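
A minimal sketch of what that split could look like (only the `index_tensors`/`get_tensors` names and signatures come from this PR; the `tensors` attribute and loader details are assumptions):

```python
from typing import Callable, Iterator

from torch import Tensor


class ModelBase:
    def __init__(self):
        # Hypothetical attribute name: build the index once, so subclasses
        # can rewrite entries (e.g. swap in a dequantizing loader)
        # before any tensor data is actually read.
        self.tensors: dict[str, Callable[[], Tensor]] = self.index_tensors()

    def index_tensors(self) -> dict[str, Callable[[], Tensor]]:
        """Map each tensor name to a zero-argument loader; reads no data yet."""
        raise NotImplementedError  # in practice, walks the model shards

    def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
        # Same signature as before: materialize tensors from the
        # (possibly modified) index.
        for name, load in self.tensors.items():
            yield name, load()
```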

For now, support for these pre-quantizations has been implemented:

  • FP8 (the subject of #14762)
  • GPTQ (the subject of #3353)

The 3-bit variant of GPTQ is more complicated, and so was omitted for now.
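
For illustration, a dequantizing loader for an FP8 checkpoint might look roughly like this (a sketch only: the `weight_scale_inv` companion tensor and the 128×128 scale tiles follow common FP8 checkpoints such as DeepSeek-V3's, not necessarily what this PR does):

```python
import torch


def dequant_fp8(weight: torch.Tensor, scale_inv: torch.Tensor,
                block_size: int = 128) -> torch.Tensor:
    # weight: float8_e4m3fn data; scale_inv: one float scale per
    # (block_size x block_size) tile of the weight matrix.
    w = weight.to(torch.float32)
    # Expand each per-tile scale to cover its tile, then rescale.
    s = scale_inv.repeat_interleave(block_size, dim=0)
    s = s.repeat_interleave(block_size, dim=1)
    return w * s[: w.shape[0], : w.shape[1]]


# In the modified index, the pre-quantized pair would then collapse into a
# single lazy entry (names here are hypothetical):
# tensors["model.layers.0.mlp.up_proj.weight"] = lambda: dequant_fp8(
#     load("...weight"), load("...weight_scale_inv"))
```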

Notes

TODO

  • Test if this causes memory usage regressions
    • Lazy or not, safetensors or not
    • So far it seems good.
  • Test remote conversion (with --remote)


compilade added the enhancement (New feature or request) and python (python script changes) labels on Jul 22, 2025
ggerganov (Member) left a comment:

Perfect!

In case you feel like it, add support for MXFP4 as well. I will be upstreaming a ggml implementation soon, and it would be nice to have HF conversion support. You can use some of the smaller models here: https://huggingface.co/models?sort=created&search=mxfp4 (any one without Hadamard matrices should work).
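
For reference, MXFP4 dequantization is simple on paper: each block of 32 FP4 (e2m1) values shares one E8M0 scale. A sketch under assumed packing (low nibble first; actual checkpoint layouts may differ):

```python
import numpy as np

# The 16 e2m1 (FP4) code points.
FP4_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)


def dequant_mxfp4(blocks: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """blocks: (n, 16) uint8, two FP4 codes per byte (32 values per block);
    scales: (n,) uint8, one shared E8M0 exponent per block."""
    lo = FP4_VALUES[blocks & 0x0F]  # low nibble first: an assumed layout
    hi = FP4_VALUES[blocks >> 4]
    vals = np.empty((blocks.shape[0], 32), dtype=np.float32)
    vals[:, 0::2] = lo
    vals[:, 1::2] = hi
    # E8M0 scale: value = 2 ** (exponent - 127)
    return vals * np.exp2(scales.astype(np.float32) - 127.0)[:, None]
```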

CISC (Collaborator) commented Jul 22, 2025:

In case you were wondering, the workaround was about this line:
https://huggingface.co/WisdomShell/CodeShell-7B-Chat/blob/main/pytorch_model.bin.index.json#L555

Conversion would fail because that tensor doesn't exist.

Edit: I might have used the safetensors version; it could be that it actually works with the pytorch bins.

Successfully merging this pull request may close these issues.

Feature Request: Direct FP8 conversion from convert_hf_to_gguf.py