### System Info
transformers 4.57.1
CUDA 12.8
```python
import torch
from transformers import AutoConfig, Qwen3VLForConditionalGeneration
from accelerate import init_empty_weights, infer_auto_device_map

# 2. Load the model's config separately
config = AutoConfig.from_pretrained(VLM_MODEL, trust_remote_code=True)

# 3. Build an "empty" model skeleton (no weights loaded)
# This step is very fast and uses no CPU RAM
with init_empty_weights():
    empty_model = Qwen3VLForConditionalGeneration(config)

# (Optional, but recommended) ensure weight tying is consistent
empty_model.tie_weights()

max_memory_map = {
    0: "9GiB",
    1: "10.5GiB",
    2: "10.5GiB",
}

# 4. Compute the device_map from the empty skeleton and the memory limits
# This answers the ??? above:
device_map = infer_auto_device_map(
    empty_model,  # <-- pass in the empty model
    max_memory=max_memory_map,
    # Qwen-VL requires this: tell accelerate which modules must not be split
    no_split_module_classes=empty_model._no_split_modules,
)

# (Optional) print the device_map to inspect it
print("[INFO] Inferred device map:")
print(device_map)

# 5. Load the model; this time it is sharded across devices via device_map
model = Qwen3VLForConditionalGeneration.from_pretrained(
    VLM_MODEL,
    trust_remote_code=True,
    device_map=device_map,  # pass the precomputed map
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    low_cpu_mem_usage=True,  # keep enabled
)
```
```
Traceback (most recent call last):
  File "/media/pc14700k/ssd1tb/vlm/soiling/soiling_test_for_img_tag-hugging-face.py", line 764, in process_images
    model = Qwen3VLForConditionalGeneration.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pc14700k/anaconda3/envs/vlm/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/pc14700k/anaconda3/envs/vlm/lib/python3.12/site-packages/transformers/modeling_utils.py", line 5001, in from_pretrained
    hf_quantizer.preprocess_model(
  File "/home/pc14700k/anaconda3/envs/vlm/lib/python3.12/site-packages/transformers/quantizers/base.py", line 225, in preprocess_model
    return self._process_model_before_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pc14700k/anaconda3/envs/vlm/lib/python3.12/site-packages/transformers/quantizers/quantizer_awq.py", line 119, in _process_model_before_weight_loading
    model, has_been_replaced = replace_with_awq_linear(
                               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pc14700k/anaconda3/envs/vlm/lib/python3.12/site-packages/transformers/integrations/awq.py", line 134, in replace_with_awq_linear
    from awq.modules.linear.gemm import WQLinear_GEMM
ModuleNotFoundError: No module named 'awq.modules'; 'awq' is not a package
[ERROR] Failed to load model QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ: No module named 'awq.modules'; 'awq' is not a package
```
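A note on the message itself, with a small diagnostic sketch (an assumption about the cause, not a confirmed fix): `'awq' is not a package` means Python resolved the name `awq` to a plain module (a single `awq.py` file, or a different single-file distribution) rather than the package directory that `autoawq` installs, so `awq.modules` can never be imported from it. Checking what `awq` resolves to may narrow this down:

```python
import importlib.util

# Check what the name "awq" actually resolves to in this environment.
spec = importlib.util.find_spec("awq")
if spec is None:
    print("awq is not installed; transformers' AWQ integration needs autoawq")
elif spec.submodule_search_locations is None:
    # A plain module, not a package: "from awq.modules..." will always fail.
    print(f"awq is a plain module at {spec.origin} - likely shadowing autoawq")
else:
    print(f"awq is a package at {list(spec.submodule_search_locations)}")
```

If the second branch fires, a stray `awq.py` on the path (or a conflicting install) would explain the exact traceback above.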
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [x] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)
### Reproduction
Loading `QuantTrio/Qwen3-VL-30B-A3B-Instruct-AWQ` with the script above fails with the error log shown; loading the unquantized `Qwen3-VL-30B-A3B-Instruct` works fine.
### Expected behavior
Qwen3-VL AWQ checkpoints should load successfully.