@lyogavin commented Dec 2, 2025

Summary

This PR adds support for SVDQuant INT4 quantization as a new QuantizedLayout in ComfyUI, enabling faster inference and reduced VRAM usage.

Key Changes

1. New Quantization Layouts (comfy/quant_ops.py)

Added two new layout types to support SVDQuant and AWQ quantization formats:

  • SVDQuantLayout: Implements W4A4 quantization using SVD decomposition

    • Decomposes linear operations as: X*W = X * proj_up * proj_down + quantize(X) * quantize(R), where R = W - proj_up * proj_down is the residual left by the low-rank branch (see the sketch after this list)
    • Supports both int4 and nvfp4 precision modes
    • Integrates with nunchaku CUDA kernels for optimized inference
  • AWQQuantLayout: Implements W4A16 quantization (Activation-aware Weight Quantization)

    • Keeps activations in 16-bit precision while quantizing weights to 4-bit
    • Uses nunchaku's awq_gemv_w4a16_cuda kernel for efficient W4A16 GEMV operations
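
A minimal PyTorch sketch of the SVDQuantLayout decomposition above; fake_quantize is a hypothetical per-tensor stand-in for nunchaku's packed INT4 kernels (which use per-group scales and fused CUDA code):

```python
import torch

def fake_quantize(t: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    # Hypothetical symmetric per-tensor quantize/dequantize stand-in for
    # nunchaku's packed INT4 kernels, which use per-group scales instead.
    qmax = 2 ** (n_bits - 1) - 1  # 7 for int4
    scale = t.abs().max().clamp(min=1e-8) / qmax
    return (t / scale).round().clamp(-qmax - 1, qmax) * scale

def svdquant_linear(x, proj_up, proj_down, residual):
    low_rank = x @ proj_up @ proj_down                   # 16-bit low-rank branch
    quant = fake_quantize(x) @ fake_quantize(residual)   # W4A4 residual branch
    return low_rank + quant

# Since W is decomposed as proj_up @ proj_down + residual,
# svdquant_linear(x, ...) approximates x @ W, with the quantization
# error confined to the residual.
x = torch.randn(2, 64)
proj_up, proj_down = torch.randn(64, 8), torch.randn(8, 64)
residual = torch.randn(64, 64) * 0.01
print(svdquant_linear(x, proj_up, proj_down, residual).shape)  # torch.Size([2, 64])
```

AWQQuantLayout (W4A16) is the same idea without the low-rank branch and with only the weights quantized, so x stays in 16-bit.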

2. Support for Nunchaku-style QKV Merging (comfy/ldm/qwen_image/model.py)

In Nunchaku's SVDQuant quantization for the Qwen model, q_proj, k_proj and v_proj are merged into a single qkv_proj tensor, so the Qwen model code needs a small modification to support that (see the sketch after the list below).

Why we can't keep q, k, v separate, which would have preserved compatibility with our previous code and model format:

    1. nunchaku's SVDQuant requires data-driven calibration: datasets were prepared to calibrate and generate the quantized model, and the resulting intermediate scales, smooth factors, etc. are entangled across the merged projection and cannot simply be split back into q, k, v.
    2. Merging q, k, v may also yield a performance gain (one larger GEMM instead of three smaller ones).
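
A minimal sketch of the fused projection, assuming equal q/k/v widths (the real layer in comfy/ldm/qwen_image/model.py wraps the quantized qkv_proj weight rather than a plain nn.Linear):

```python
import torch
import torch.nn as nn

class MergedQKV(nn.Module):
    """Fused q/k/v projection matching nunchaku's qkv_proj layout (illustrative)."""
    def __init__(self, dim: int):
        super().__init__()
        # One weight of shape (3*dim, dim) instead of three (dim, dim) weights.
        self.qkv_proj = nn.Linear(dim, 3 * dim)

    def forward(self, x: torch.Tensor):
        # Single GEMM, then split the fused output back into q, k, v.
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        return q, k, v
```

Because the calibration statistics are computed on the fused weight, the checkpoint only makes sense in this merged form.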

3. Extended Quantization Parameter Support (comfy/ops.py)

  • Extended mixed_precision_ops to handle additional quantization parameters:
    • Added custom_layer_params_keys to support more complex quantization formats carrying SVDQuant-specific tensors (wscales, smooth_factor, proj_down, proj_up, etc.); see the key list after this section
    • Proper handling of QuantizedTensor in cast_bias_weight for dtype detection
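
Concretely, these are the extra per-layer tensors an SVDQuant linear carries beyond the usual weight and bias; the names come from this PR's description, while grouping them in a module-level list is only an illustration:

```python
# SVDQuant-specific tensors routed through custom_layer_params_keys
# (illustrative grouping; names as described in this PR):
SVDQUANT_LAYER_PARAM_KEYS = [
    "wscales",        # per-group weight scales for the INT4 residual
    "smooth_factor",  # activation smoothing factors from calibration
    "proj_up",        # low-rank factor applied to the input
    "proj_down",      # low-rank factor producing the output
]
```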

4. Model Converter (comfy/svdquant_converter.py)

Added a converter that transforms nunchaku-style checkpoints into ComfyUI's QuantizedLayout format; a sketch of the flow follows.
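
A minimal sketch of the conversion flow, assuming a hypothetical RENAMES key map (the real converter in comfy/svdquant_converter.py also repacks tensor layouts; this only shows the shape of the operation):

```python
from safetensors.torch import load_file, save_file

# Hypothetical mapping from nunchaku key fragments to ComfyUI's
# QuantizedLayout names; the real table lives in comfy/svdquant_converter.py.
RENAMES = {
    ".lora_down.": ".proj_down.",
    ".lora_up.": ".proj_up.",
}

def convert_nunchaku_checkpoint(src_path: str, dst_path: str) -> None:
    # Load the nunchaku-style checkpoint, rewrite keys, save in ComfyUI form.
    state_dict = load_file(src_path)
    converted = {}
    for key, tensor in state_dict.items():
        for old, new in RENAMES.items():
            key = key.replace(old, new)
        converted[key] = tensor
    save_file(converted, dst_path)
```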

Performance Comparison

Tested with this workflow: svdq_qwen_test_workflow.json

Tested on an NVIDIA RTX 4090 (24GB VRAM) with the Qwen-Image model:

| Model | Runtime (1st / 2nd+) | Max VRAM |
| --- | --- | --- |
| qwen_image_fp8_e4m3fn.safetensors | 75.68s / 56.52s | 21,333 MB |
| qwen_image_int4_svdq.safetensors | 42s / 31s | 13,205 MB |

Improvements:

  • ~45% faster inference (31s vs 56.52s)
  • 💾 ~38% less VRAM (13.2GB vs 21.3GB)

Generated images:

By qwen_image_fp8_e4m3fn: [image: ComfyUI_00106_]

By qwen_image_int4_svdq: [image: ComfyUI_00105_]

Pre-converted Model

A converted model is available on Hugging Face:

🤗 lyogavin/QWen-Image_ComfyUI_SVDQ

Testing

  • Unit tests included for converter functions
  • Tested with ComfyUI workflow to verify end-to-end functionality

Dependencies

Requires the nunchaku library; installation instructions are available here.

@lyogavin requested a review from guill as a code owner on December 2, 2025.
@Kosinkadink added the Core (Core team dependency) label on December 3, 2025.