@lyogavin commented Dec 2, 2025

Summary

This PR adds support for SVDQuant INT4 quantization as a new QuantizedLayout in ComfyUI, enabling faster inference and reduced VRAM usage.

Key Changes

1. New Quantization Layouts (comfy/quant_ops.py)

Added two new layout types to support SVDQuant and AWQ quantization formats:

  • SVDQuantLayout: Implements W4A4 quantization using SVD decomposition

    • Decomposes linear operations as: X*W = X * proj_up * proj_down + quantize(X) * quantize(R), where R = W - proj_up * proj_down is the residual left by the low-rank branch (see the sketch after this list)
    • Supports both int4 and nvfp4 precision modes
    • Integrates with nunchaku CUDA kernels for optimized inference
  • AWQQuantLayout: Implements W4A16 quantization (Activation-aware Weight Quantization)

    • Keeps activations in 16-bit precision while quantizing weights to 4-bit
    • Uses nunchaku's awq_gemv_w4a16_cuda kernel for efficient W4A16 GEMV operations
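
A minimal PyTorch sketch of the SVDQuantLayout decomposition above; fake_quantize is a hypothetical per-tensor stand-in for nunchaku's packed INT4 kernels (which use per-group scales and fused CUDA code):

```python
import torch

def fake_quantize(t: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Hypothetical symmetric per-tensor quantize/dequantize stand-in for
    # nunchaku's packed INT4 kernels, which use per-group scales instead.
    qmax = 2 ** (n_bits - 1) - 1  # 7 for int4
    scale = t.abs().max().clamp(min=1e-8) / qmax
    return (t / scale).round().clamp(-qmax - 1, qmax) * scale

def svdquant_linear(x, proj_up, proj_down, residual):
    low_rank = x @ proj_up @ proj_down                   # 16-bit low-rank branch
    quant = fake_quantize(x) @ fake_quantize(residual)   # W4A4 residual branch
    return low_rank + quant

# Since W is decomposed as proj_up @ proj_down + residual,
# svdquant_linear(x, ...) approximates x @ W, with the quantization
# error confined to the residual.
x = torch.randn(2, 64)
proj_up, proj_down = torch.randn(64, 8), torch.randn(8, 64)
residual = torch.randn(64, 64) * 0.01
print(svdquant_linear(x, proj_up, proj_down, residual).shape)  # torch.Size([2, 64])
```

AWQQuantLayout (W4A16) is the same idea without the low-rank branch and with only the weights quantized, so x stays in 16-bit.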

2. Support for Nunchaku-style QKV Merging (comfy/ldm/qwen_image/model.py)

In Nunchaku's SVDQuant quantization for the Qwen model, q_proj, k_proj and v_proj are merged into a single qkv_proj tensor, so the Qwen model code needs a small modification to support that (see the sketch after the list below).

Why we can't keep q, k, v separate, which would have preserved compatibility with our previous code and model format:

    1. nunchaku's SVDQuant requires data-driven calibration: datasets were prepared to calibrate and generate the quantized model, and the resulting intermediate scales, smooth factors, etc. are entangled across the merged projection and cannot simply be split back into q, k, v.
    2. Merging q, k, v may also yield a performance gain (one larger GEMM instead of three smaller ones).
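
A minimal sketch of the fused projection, assuming equal q/k/v widths (the real layer in comfy/ldm/qwen_image/model.py wraps the quantized qkv_proj weight rather than a plain nn.Linear):

```python
import torch
import torch.nn as nn

class MergedQKV(nn.Module):
    """Fused q/k/v projection matching nunchaku's qkv_proj layout (illustrative)."""
    def __init__(self, dim: int):
        super().__init__()
        # One weight of shape (3*dim, dim) instead of three (dim, dim) weights.
        self.qkv_proj = nn.Linear(dim, 3 * dim)

    def forward(self, x: torch.Tensor):
        # Single GEMM, then split the fused output back into q, k, v.
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        return q, k, v
```

Because the calibration statistics are computed on the fused weight, the checkpoint only makes sense in this merged form.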

3. Extended Quantization Parameter Support (comfy/ops.py)

  • Extended mixed_precision_ops to handle additional quantization parameters:
    • Added custom_layer_params_keys to support more complex quantization formats carrying SVDQuant-specific tensors (wscales, smooth_factor, proj_down, proj_up, etc.); see the key list after this section
    • Proper handling of QuantizedTensor in cast_bias_weight for dtype detection
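
Concretely, these are the extra per-layer tensors an SVDQuant linear carries beyond the usual weight and bias; the names come from this PR's description, while grouping them in a module-level list is only an illustration:

```python
# SVDQuant-specific tensors routed through custom_layer_params_keys
# (illustrative grouping; names as described in this PR):
SVDQUANT_LAYER_PARAM_KEYS = [
    "wscales",        # per-group weight scales for the INT4 residual
    "smooth_factor",  # activation smoothing factors from calibration
    "proj_up",        # low-rank factor applied to the input
    "proj_down",      # low-rank factor producing the output
]
```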

4. Model Converter (comfy/svdquant_converter.py)

Added a converter that transforms nunchaku-style checkpoints into ComfyUI's QuantizedLayout format; a sketch of the flow follows.
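
A minimal sketch of the conversion flow, assuming a hypothetical RENAMES key map (the real converter in comfy/svdquant_converter.py also repacks tensor layouts; this only shows the shape of the operation):

```python
from safetensors.torch import load_file, save_file

# Hypothetical mapping from nunchaku key fragments to ComfyUI's
# QuantizedLayout names; the real table lives in comfy/svdquant_converter.py.
RENAMES = {
    ".lora_down.": ".proj_down.",
    ".lora_up.": ".proj_up.",
}

def convert_nunchaku_checkpoint(src_path: str, dst_path: str) -> None:
    # Load the nunchaku-style checkpoint, rewrite keys, save in ComfyUI form.
    state_dict = load_file(src_path)
    converted = {}
    for key, tensor in state_dict.items():
        for old, new in RENAMES.items():
            key = key.replace(old, new)
        converted[key] = tensor
    save_file(converted, dst_path)
```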

Performance Comparison

Tested with this workflow: svdq_qwen_test_workflow.json

Tested on an NVIDIA RTX 4090 (24GB VRAM) with the Qwen-Image model:

| Model | Runtime (1st / 2nd+) | Max VRAM |
| --- | --- | --- |
| qwen_image_fp8_e4m3fn.safetensors | 75.68s / 56.52s | 21,333 MB |
| qwen_image_int4_svdq.safetensors | 42s / 31s | 13,205 MB |

Improvements:

  • ~45% faster inference (31s vs 56.52s)
  • 💾 ~38% less VRAM (13.2GB vs 21.3GB)

Generated images:

By qwen_image_fp8_e4m3fn: [image: ComfyUI_00106_]

By qwen_image_int4_svdq: [image: ComfyUI_00105_]

Pre-converted Model

A converted model is available on Hugging Face:

🤗 lyogavin/QWen-Image_ComfyUI_SVDQ

Testing

  • Unit tests included for converter functions
  • Tested with ComfyUI workflow to verify end-to-end functionality

Dependencies

Requires the nunchaku library; installation instructions are available here.

@lyogavin requested a review from guill as a code owner on December 2, 2025.
@Kosinkadink added the Core (Core team dependency) label on December 3, 2025.