**Describe the bug**
File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/torch.py", line 158, in forward
out = self._forward(x, out_shape)
File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/torch.py", line 164, in _forward
weights = self.dequantize_weight(num_itr=num_itr).to(x.dtype)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/__init__.py", line 441, in dequantize_weight
self.wf_unsqueeze_zero # self.wf.unsqueeze(0),
^^^^^^^^^^^^^^^^^^^^^^
File "/lustre/home/atatjer/src/envs/opensci/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__
raise AttributeError(
f"'{type(self).__name__}' object has no attribute '{name}'"
)
AttributeError: 'TorchQuantLinear' object has no attribute 'wf_unsqueeze_zero'
This bug appears when I quantize `open-sci/open-sci-ref-v0.01-1.7b-nemotron-hq-1T-4096` (passed as `repo_origin`) or `allenai/OLMo-2-1124-7B`; in fact, it appears with every model I have tried. This is how I quantize:
```python
import logging

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# 1024 calibration samples from C4
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train"
).select(range(1024))["text"]

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
)

# cfg holds my run configuration (repo, revision, bit-width, ...)
modelQ = GPTQModel.load(
    cfg.repo_origin,
    quant_config,
    revision='iter_0242000',
    trust_remote_code=True,
    torch_dtype='auto',
    device_map="auto",
    backend="torch",
)

logging.info(f"Quantizing {cfg.repo_origin}+{cfg.revision} to {cfg.q_bits} bits with group_size {cfg.group_size}")

modelQ.quantize(
    calibration_dataset,
    batch_size=16,
    auto_gc=False,
    backend="torch",
)
```
It quantizes and saves properly, but when I run inference on the freshly quantized in-memory model I get the error above.
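For completeness, this is roughly the inference step that triggers the error (the tokenizer loading, prompt, and generation arguments here are illustrative, not copied from my original script):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(cfg.repo_origin, trust_remote_code=True)
inputs = tok("Hello, world", return_tensors="pt").to("cuda")

# Raises: AttributeError: 'TorchQuantLinear' object has no attribute 'wf_unsqueeze_zero'
out = modelQ.generate(**inputs, max_new_tokens=32)
```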
However, when I save and then reload the quantized model (the same artifact that just failed), inference works fine:
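Roughly (the save path is just an example, and `inputs` is from the sketch above):

```python
quant_path = "open-sci-1.7b-gptq-4bit"  # example path
modelQ.save(quant_path)

# Reloading the freshly saved checkpoint and generating works without error
reloaded = GPTQModel.load(quant_path, backend="torch")
out = reloaded.generate(**inputs, max_new_tokens=32)
```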
**GPU Info**

Output of `nvidia-smi`:
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:C7:00.0 Off | 0 |
| N/A 34C P0 63W / 400W | 2MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
```
**Software Info**
Operating System/Version + Python Version
Output of `pip show gptqmodel torch transformers accelerate triton`:
```
Name: accelerate
Version: 1.10.1
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: gptqmodel, llmcompressor
---
Name: gptqmodel
Version: 4.2.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Editable project location: /home/atatjer/ext/GPTQModel
Requires: accelerate, device-smi, hf-transfer, huggingface-hub, logbar, maturin, numpy, packaging, pillow, protobuf, random-word, safetensors, threadpoolctl, tokenicer, torch, transformers, wheel
Required-by:
---
Name: torch
Version: 2.8.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-cufile-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, ai2-olmo, ai2-olmo-core, ai2-olmo-eval, compressed-tensors, gptqmodel, llmcompressor, optimum, torchaudio, torchmetrics, torchvision
---
Name: transformers
Version: 4.56.1
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: ai2-olmo, compressed-tensors, gptqmodel, llmcompressor, optimum, tokenicer
---
Name: triton
Version: 3.4.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: setuptools
Required-by: torch
```
**To Reproduce**
Run the quantization script above, then run inference directly on the in-memory quantized model.
**Expected behavior**
The model is quantized, and the in-memory quantized model can then run inference without errors.
**Model/Datasets**
Models: `open-sci/open-sci-ref-v0.01-1.7b-nemotron-hq-1T-4096` (revision `iter_0242000`) and `allenai/OLMo-2-1124-7B`, both downloadable from HF. Calibration data: `allenai/c4`, file `en/c4-train.00001-of-01024.json.gz`.
**Additional context**
Looking at the traceback, `wf_unsqueeze_zero` appears to be a precomputed buffer equal to `self.wf.unsqueeze(0)` (that is the commented-out expression next to the failing line), and it evidently exists after loading a quantized checkpoint from disk but not on freshly quantized in-memory modules. Below is an unverified workaround sketch; it assumes the missing buffers are (re)created by each quant-linear module's `post_init()`:
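```python
from gptqmodel.nn_modules.qlinear.torch import TorchQuantLinear

# Unverified sketch: IF post_init() is what registers the precomputed
# buffers (wf_unsqueeze_zero etc.), which is an assumption on my part,
# then re-running it on every quantized linear module after quantize()
# may restore them before inference. modelQ.model is assumed to be the
# underlying transformers nn.Module.
for module in modelQ.model.modules():
    if isinstance(module, TorchQuantLinear):
        module.post_init()
```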