First, thank you for your excellent work on this quantization library! I'm encountering two critical issues when deploying a quantized Qwen3-8B model to vLLM 0.9.1:
- The initial deployment failed because the ["wbits"] configuration field was missing.
- After renaming bits to wbits in the configuration, the model loaded successfully, but the inference output was garbled.
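For reference, the rename described in the second point can be sketched as below. This is only an illustration of the step, not the library's or vLLM's own code: the helper name `rename_bits_to_wbits` is hypothetical, and the report does not say which file holds the field, so the sketch works on an already-loaded dictionary. The example values are the ones from the QuantizeConfig in this report.

```python
import copy
import json


def rename_bits_to_wbits(quant_cfg: dict) -> dict:
    """Return a copy of quant_cfg in which the "bits" key is renamed to "wbits".

    Hypothetical helper for illustration. The input dict is left untouched.
    If "wbits" is already present, the copy is returned unchanged.
    """
    patched = copy.deepcopy(quant_cfg)
    if "wbits" in patched:
        return patched
    if "bits" not in patched:
        raise KeyError('neither "bits" nor "wbits" is present in the config')
    patched["wbits"] = patched.pop("bits")
    return patched


# Illustrative values taken from the QuantizeConfig shown in this report.
example_cfg = {"bits": 4, "group_size": 128, "quant_method": "qqq", "format": "qqq"}
print(json.dumps(rename_bits_to_wbits(example_cfg), indent=2))
```

Note that a rename like this only changes the key the loader reads. It does not touch the saved weights, so it cannot by itself show that the checkpoint layout matches what the loader expects.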
Model: Qwen3-8B
Key Packages:
- vLLM == 0.9.1
- transformers == 4.51.3
My quantization code:
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    quant_method="qqq",
    format="qqq",
    desc_act=False,
    dynamic=None,
)
model = GPTQModel.load(model_id, quant_config, device_map="auto", device="cuda",
                       trust_remote_code=True, low_cpu_mem_usage=True)
model.quantize(
    calibration_dataset,
    buffered_fwd=True,
    calibration_dataset_concat_size=8192,
    calibration_data_min_length=10,
    batch_size=1,
    auto_gc=False,
)
The calibration dataset is OpenR1-Math: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
Start the vLLM server:
python3 -m vllm.entrypoints.openai.api_server \
    --model models/Qwen3-8B-qqq-int4-gz128 --tensor-parallel-size 1 \
    --served-model-name qwen3 --max-model-len 32768 --gpu-memory-utilization 0.90 \
    --trust-remote-code --enable-prefix-caching --reasoning-parser qwen3 \
    --quantization qqq --dtype float16 --port 5095
curl query:
curl --location 'http://localhost:5095/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "qwen3",
        "messages": [
            {"role": "user", "content": "Analysis of Difficulties and Considerations in Large Model Quantization."}
        ],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "max_tokens": 8192,
        "presence_penalty": 1.5,
        "ignore_eos": 0,
        "chat_template_kwargs": {"enable_thinking": true}
    }'
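To help anyone reproducing this, the same endpoint can also be queried from Python with a reduced request. The following is a minimal sketch using only the standard library. The helper names `build_payload` and `build_request` are hypothetical, and the reduced settings (temperature 0, a short `max_tokens`, thinking disabled) are assumptions chosen to make runs comparable, not the settings used in the curl query above. The URL and model name are the ones from this report.

```python
import json
import urllib.request


def build_payload(prompt: str, temperature: float = 0.0, max_tokens: int = 64) -> dict:
    """Build a small chat-completions payload (assumed settings, see lead-in)."""
    return {
        "model": "qwen3",  # served model name from the server command in this report
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "chat_template_kwargs": {"enable_thinking": False},
    }


def build_request(
    payload: dict,
    url: str = "http://localhost:5095/v1/chat/completions",
) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request for the chat-completions endpoint."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires the server from this report to be running):
# with urllib.request.urlopen(build_request(build_payload("Hello"))) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If the output is still garbled with a request like this, the sampling parameters in the original query are unlikely to be the cause.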
Actual Output:
{"id":"chatcmpl-a15363262f8b4042975c86adc44d7ad2","object":"chat.completion","created":1752064426,"model":"qwen3","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":" useghfilename9的 Page� \\ Ember System\ncommit �ghre Issue9 e Isgheuropäische Notíc EUBLISH Ceuropäische Iilateralection Volume Cent Answergh Notíc Mission cls React\n\nghnosticghrievefilename Goalghodcast Vue List $ghirectedeuropäischeeuropäische Notíceuropäischeeuropäische likeacobianghoireirectional Delta5�airieORITYghimensionaliralinic�owntownreasionsenis Div Command Express0template的 Excel Nineeuropäischeivist Notíceuropäischeiagnostics DecimalghateralUDGE Identity Sketch \"gh�änneriquéschütz Notíceuropäische Notíceuropäische Notíceuropäische Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische阿�igitalgh-neutral Thursday Notíc Beacheuropäischeeuropäischeeuropäischeeuropäischeeuropäische中reaselineeuropäische NotícissueRESS%\n After typeghuxtapeuropäischenosticUBLICceⒶre Notíc)re�änneuropäischeuellesifetimeeuropäischeacency Spring Barrelghruiteuropäische�europäische�eneraleuropäischeählissueavor Notícgh�irectionalghinicinicirectional sqrt Stripeusewł julodcastgéasicseuropäische Notícdefine湾 Notíc Notíc Notíceuropäischeeuropäische reility Notíceuropäischeeuropäischehayacobian Notíc(Freeuropäischeeuropäischeeuropäische Notíc eachectionseuropäischeeuropäischeheureodcast Icongh�regonimedia Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische将 Notíclogre� Notíc Notíc Notíceuropäischeeuropäische Pro�ersistent8UBLISH Like�UBLISHgh�UDGE Notícre Notíceuropäischeeuropäische人roid�inic�nosticodcasteuropäischeeuropäischeeuropäischeeuropäischeчёт Notícghirectionalical Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische revival��nosticännereuropäische Notíc正coseuropäische Notíc Dialogue Notíc 
BCEeuropäischeeuropäischeeuropäische Line Notíc1ześ Notíceuropäische Dreamivist Notíceuropäischeeuropäische是 Notíceuropäische Action Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischecrelligencerising�agineducibleirectional Skeleton Cartesianghiblyhowever�ersen�uality�acobian Notíc Notíceuropäischeeuropäischeeuropäische't IL��ướiirectionalghunedeuropäische�aclesnostic�nosticiquéócfilenameeuropäischecommiteuropäische ISUBLISHgh�ergingännerreandscapeeuropäische Notíceuropäische5UDGEghnostic�oireännereuropäische NotíceuropäischeeuropäischeeuropäischeeuropäischeACHE Notícgh�irectedUBLISHcommitfilenamele�nosticivistghualityansomeuropäische Notíc>NNreệpicknessce�igital Easter Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische Vueeuropäischeeuropäische VII�inicection�nostic�nosticnostic Notíc.exports A�urpose�rewriteghaseline6arsers 'ghnosticeuropäischeeuropäische Notíc Dialogue Notíc ICollection Octiqué expressghacobian-strokesghgnoreeuropäische Notícre�ännerrocess Notíceuropäischeeuropäischereirlines Notíc Paperghurbedestion�nosticivistghrespecteuropäische Notíc Numberfilename cxgh�odcastinic��oire��inic�agnostic Notíc Binderfilename Issue template Rcommitreivist Notíceuropäische NotícF\"ghinicervisoreuropäische Notíc ceilingivist Notíc Notíceuropäischeeuropäischeeuropäische","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":151643}],"usage":{"prompt_tokens":43,"total_tokens":570,"completion_tokens":527,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}

Is anything missing from my setup, or have I made an operational error? I would like to understand why the output is garbled. I'd be happy to provide additional details or test specific fixes. Thank you for your time and assistance!