add svdquant int4 quantization support based on QuantizedLayout #11049
Summary
This PR adds support for SVDQuant INT4 quantization as a new QuantizedLayout in ComfyUI, enabling faster inference and reduced VRAM usage.

Key Changes
1. New Quantization Layouts (comfy/quant_ops.py)
Added two new layout types to support the SVDQuant and AWQ quantization formats:
- SVDQuantLayout: implements W4A4 quantization using SVD decomposition
  - X*W = X * proj_up * proj_down + quantize(X) * quantize(R)
  - int4 and nvfp4 precision modes
- AWQQuantLayout: implements W4A16 quantization (Activation-aware Weight Quantization)
  - awq_gemv_w4a16_cuda for efficient GEMM operations

2. Support Nunchaku-style QKV Merging (comfy/ldm/qwen_image/model.py)
Nunchaku's SVDQuant quantization of the QWen model merges q_proj, k_proj and v_proj into a single tensor, qkv_proj, so the QWen model needs a small modification to support that.
Why we can't keep q, k and v separate to stay compatible with our previous code and model format:
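
For illustration only, here is a minimal NumPy sketch of the two ideas in items 1 and 2: the formula X*W = X * proj_up * proj_down + quantize(X) * quantize(R), applied to a weight in which the q, k and v projections are concatenated into one matrix. It is not the code in comfy/quant_ops.py or the nunchaku kernels. The helper names (quantize_int4, svdquant_decompose, svdquant_linear), the tensor shapes, the rank, and the per-row/per-column scaling are all assumptions made for the example, and the smoothing step (smooth_factor) is left out. Quantization is simulated in floating point rather than run on packed int4 data.

```python
import numpy as np


def quantize_int4(t, axis):
    """Simulated symmetric int4 quantization: round to [-8, 7], then dequantize."""
    scale = np.abs(t).max(axis=axis, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(t / scale), -8, 7)
    return q * scale  # float values restricted to the int4 grid


def svdquant_decompose(w, rank):
    """Split w (in_features, out_features) into a low-rank part plus residual R."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    proj_up = u[:, :rank] * s[:rank]   # (in_features, rank), hypothetical layout
    proj_down = vt[:rank, :]           # (rank, out_features), hypothetical layout
    residual = w - proj_up @ proj_down
    return proj_up, proj_down, residual


def svdquant_linear(x, proj_up, proj_down, residual):
    """X*W ~= X * proj_up * proj_down + quantize(X) * quantize(R)."""
    low_rank = x @ proj_up @ proj_down                        # kept in full precision
    quantized = quantize_int4(x, axis=-1) @ quantize_int4(residual, axis=0)
    return low_rank + quantized


rng = np.random.default_rng(0)
dim = 64

# Separate q/k/v weights, then the merged layout: one (dim, 3*dim) tensor.
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)

x = rng.standard_normal((4, dim))  # 4 tokens

proj_up, proj_down, residual = svdquant_decompose(w_qkv, rank=16)
y = svdquant_linear(x, proj_up, proj_down, residual)

# One matmul produces all three projections; split them afterwards.
q, k, v = np.split(y, 3, axis=-1)

ref = x @ w_qkv
rel_err = np.linalg.norm(y - ref) / np.linalg.norm(ref)
print(f"relative error vs. full precision: {rel_err:.3f}")
```

In this sketch the low-rank factors are computed once for the merged matrix, so a single proj_up and proj_down pair serves all three projections, and the model only has to split the output of one linear layer into q, k and v.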

3. Extended Quantization Parameter Support (comfy/ops.py)
- mixed_precision_ops to handle additional quantization parameters:
  - custom_layer_params_keys to support more complicated quantization formats for SVDQuant-specific tensors (wscales, smooth_factor, proj_down, proj_up, etc.)
- QuantizedTensor in cast_bias_weight for dtype detection

4. Model Converter (comfy/svdquant_converter.py)
Added a converter to transform nunchaku-style checkpoints to ComfyUI's QuantizedLayout format.

Performance Comparison
Tested with this workflow: svdq_qwen_test_workflow.json
Tested on an NVIDIA RTX 4090 (24 GB VRAM) with the Qwen-Image model:
- qwen_image_fp8_e4m3fn.safetensors
- qwen_image_int4_svdq.safetensors

Improvements:
Generated image:
By qwen_image_fp8_e4m3fn:

By qwen_image_int4_svdq:
Pre-converted Model
A converted model is available on Hugging Face:
🤗 lyogavin/QWen-Image_ComfyUI_SVDQ
Testing
Dependencies
Requires the nunchaku library; see its installation instructions.