QQQ-quantized models fail on a missing configuration field and produce garbled inference output when deployed in vLLM 0.9.1 #1654

@ShiningMaker

First, thank you for your excellent work on this quantization library! I'm encountering two critical issues when deploying a quantized Qwen3-8B model to vLLM 0.9.1:

  • The initial deployment failed because the quantization config lacked the ["wbits"] field that vLLM looks up.
  • After I renamed bits to wbits in the config, the model loaded successfully, but the inference output was garbled.
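
The key change described above can be sketched as a small helper. This is only an illustration, not part of GPTQModel or vLLM: `add_wbits_alias` is a hypothetical name, and the sample dictionary simply mirrors the quantization settings used in this report. Where these settings live in the exported model directory is an assumption to check against the actual files. Note that this variant keeps `bits` and adds `wbits` next to it instead of replacing the key.

```python
import copy


def add_wbits_alias(quant_cfg: dict) -> dict:
    """Return a copy of quant_cfg with `wbits` copied from `bits` when missing.

    Hypothetical helper for illustration; the input dict is left untouched.
    """
    patched = copy.deepcopy(quant_cfg)
    if "wbits" not in patched and "bits" in patched:
        patched["wbits"] = patched["bits"]
    return patched


# Sample values mirroring the QuantizeConfig in this report (assumed layout).
sample_cfg = {
    "bits": 4,
    "group_size": 128,
    "quant_method": "qqq",
    "format": "qqq",
    "desc_act": False,
}

patched_cfg = add_wbits_alias(sample_cfg)
print(sorted(patched_cfg))
```

A change like this only addresses the missing-field error at load time; it does not by itself explain the garbled output.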

Model: Qwen3-8B
Key Packages:

  • vLLM == 0.9.1
  • transformers == 4.51.3

My quantization code:

from gptqmodel import GPTQModel, QuantizeConfig

# 4-bit QQQ quantization with a group size of 128
quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    quant_method="qqq",
    format="qqq",
    desc_act=False,
    dynamic=None,
)

# model_id refers to the Qwen3-8B checkpoint (defined elsewhere)
model = GPTQModel.load(
    model_id,
    quant_config,
    device_map="auto",
    device="cuda",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

# calibration_dataset is prepared elsewhere (see below)
model.quantize(
    calibration_dataset,
    buffered_fwd=True,
    calibration_dataset_concat_size=8192,
    calibration_data_min_length=10,
    batch_size=1,
    auto_gc=False,
)

The calibration dataset (calibration_dataset) is OpenR1-Math: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k

Starting the vLLM server:

python3 -m vllm.entrypoints.openai.api_server \
    --model models/Qwen3-8B-qqq-int4-gz128 --tensor-parallel-size 1 \
    --served-model-name qwen3 --max-model-len 32768 --gpu-memory-utilization 0.90 \
    --trust-remote-code --enable-prefix-caching --reasoning-parser qwen3 \
    --quantization qqq --dtype float16 --port 5095

curl query:

curl --location 'http://localhost:5095/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen3",
  "messages": [
    {"role": "user", "content": "Analysis of Difficulties and Considerations in Large Model Quantization."}
  ],
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "ignore_eos": 0,
  "chat_template_kwargs": {"enable_thinking": true}
}'
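
For repeated testing, the same request can be issued from Python. The following is a minimal sketch using only the standard library; the function names are hypothetical, the URL and sampling parameters are copied from the curl command above, and `ignore_eos` is written as the boolean `False` in place of the `0` used there.

```python
import json
import urllib.request

# Endpoint taken from the curl command in this report.
URL = "http://localhost:5095/v1/chat/completions"


def build_payload(prompt: str) -> dict:
    """Build the chat-completion body with the sampling settings from the report."""
    return {
        "model": "qwen3",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "max_tokens": 8192,
        "presence_penalty": 1.5,
        "ignore_eos": False,
        "chat_template_kwargs": {"enable_thinking": True},
    }


def build_request(prompt: str, url: str = URL) -> urllib.request.Request:
    """Wrap the payload in a POST request with a JSON content type."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(prompt: str) -> dict:
    """Send the request to the running server and decode the JSON response."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send("Analysis of Difficulties and Considerations in Large Model Quantization.")` requires the server started above to be running on port 5095.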

Actual Output:

{"id":"chatcmpl-a15363262f8b4042975c86adc44d7ad2","object":"chat.completion","created":1752064426,"model":"qwen3","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"  useghfilename9的 Page� \\ Ember System\ncommit �ghre Issue9 e Isgheuropäische Notíc EUBLISH Ceuropäische Iilateralection Volume Cent Answergh Notíc Mission cls React\n\nghnosticghrievefilename Goalghodcast Vue List $ghirectedeuropäischeeuropäische Notíceuropäischeeuropäische likeacobianghoireirectional Delta5�airieORITYghimensionaliralinic�owntownreasionsenis Div Command Express0template的 Excel Nineeuropäischeivist Notíceuropäischeiagnostics DecimalghateralUDGE Identity Sketch \"gh�änneriquéschütz Notíceuropäische Notíceuropäische Notíceuropäische Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische阿�igitalgh-neutral Thursday Notíc Beacheuropäischeeuropäischeeuropäischeeuropäischeeuropäische中reaselineeuropäische NotícissueRESS%\n After typeghuxtapeuropäischenosticUBLICceⒶre Notíc)re�änneuropäischeuellesifetimeeuropäischeacency Spring  Barrelghruiteuropäische�europäische�eneraleuropäischeählissueavor Notícgh�irectionalghinicinicirectional sqrt Stripeusewł julodcastgéasicseuropäische Notícdefine湾 Notíc Notíc Notíceuropäischeeuropäische reility Notíceuropäischeeuropäischehayacobian Notíc(Freeuropäischeeuropäischeeuropäische Notíc eachectionseuropäischeeuropäischeheureodcast Icongh�regonimedia Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische将 Notíclogre� Notíc Notíc Notíceuropäischeeuropäische Pro�ersistent8UBLISH Like�UBLISHgh�UDGE Notícre Notíceuropäischeeuropäische人roid�inic�nosticodcasteuropäischeeuropäischeeuropäischeeuropäischeчёт Notícghirectionalical Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische revival��nosticännereuropäische Notíc正coseuropäische Notíc Dialogue Notíc 
BCEeuropäischeeuropäischeeuropäische Line Notíc1ześ Notíceuropäische Dreamivist Notíceuropäischeeuropäische是 Notíceuropäische Action Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischecrelligencerising�agineducibleirectional Skeleton Cartesianghiblyhowever�ersen�uality�acobian Notíc Notíceuropäischeeuropäischeeuropäische't IL��ướiirectionalghunedeuropäische�aclesnostic�nosticiquéócfilenameeuropäischecommiteuropäische ISUBLISHgh�ergingännerreandscapeeuropäische Notíceuropäische5UDGEghnostic�oireännereuropäische NotíceuropäischeeuropäischeeuropäischeeuropäischeACHE Notícgh�irectedUBLISHcommitfilenamele�nosticivistghualityansomeuropäische Notíc>NNreệpicknessce�igital Easter Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische Vueeuropäischeeuropäische VII�inicection�nostic�nosticnostic Notíc.exports A�urpose�rewriteghaseline6arsers 'ghnosticeuropäischeeuropäische Notíc Dialogue Notíc ICollection Octiqué expressghacobian-strokesghgnoreeuropäische Notícre�ännerrocess Notíceuropäischeeuropäischereirlines Notíc Paperghurbedestion�nosticivistghrespecteuropäische Notíc Numberfilename cxgh�odcastinic��oire��inic�agnostic Notíc Binderfilename Issue template Rcommitreivist Notíceuropäische NotícF\"ghinicervisoreuropäische Notíc ceilingivist Notíc Notíceuropäischeeuropäischeeuropäische","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":151643}],"usage":{"prompt_tokens":43,"total_tokens":570,"completion_tokens":527,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}
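
When comparing runs, it may be easier to pull the termination fields and token counts out of a response than to read the raw JSON. A minimal sketch follows; `summarize_completion` is a hypothetical helper, and `sample_response` is a trimmed stand-in for the response above, with the garbled content omitted and the other values copied from it.

```python
def summarize_completion(response: dict) -> dict:
    """Extract termination info and token counts from a chat-completion response."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "finish_reason": choice.get("finish_reason"),
        "stop_reason": choice.get("stop_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "has_reasoning": choice["message"].get("reasoning_content") is not None,
    }


# Trimmed stand-in for the response in this report (content omitted).
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": None,
                "content": "<garbled text omitted>",
            },
            "finish_reason": "stop",
            "stop_reason": 151643,
        }
    ],
    "usage": {"prompt_tokens": 43, "total_tokens": 570, "completion_tokens": 527},
}

summary = summarize_completion(sample_response)
print(summary)
```

For the response above, this shows that no reasoning content was returned even though `enable_thinking` was set, and that generation ended with `stop_reason` 151643 after 527 completion tokens.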
Am I missing a step, or is there an error in my procedure? I would like to understand why the output is garbled. I'd be happy to provide additional details or test specific fixes. Thank you for your time and assistance!
