
[BUG] AttributeError: 'TorchQuantLinear' object has no attribute 'wf_unsqueeze_zero' #1786

@aldakata

Description

  File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/torch.py", line 158, in forward                                      
    out = self._forward(x, out_shape)                                                                                                        
  File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/torch.py", line 164, in _forward                                     
    weights = self.dequantize_weight(num_itr=num_itr).to(x.dtype)                                                                            
              ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^                                                                                        
  File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/__init__.py", line 441, in dequantize_weight                         
    self.wf_unsqueeze_zero  # self.wf.unsqueeze(0),                                                                                          
    ^^^^^^^^^^^^^^^^^^^^^^                                                                                                                   
  File "/lustre/home/atatjer/src/envs/opensci/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__            
    raise AttributeError(                                                                                                                    
        f"'{type(self).__name__}' object has no attribute '{name}'"                                                                          
    )                                                                                                                                        
AttributeError: 'TorchQuantLinear' object has no attribute 'wf_unsqueeze_zero'  

This bug appears when I quantize open-sci/open-sci-ref-v0.01-1.7b-nemotron-hq-1T-4096 or allenai/OLMo-2-1124-7B; in fact, it happens with every model I have tried. This is how I quantize:

```python
import logging

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# 1024 calibration samples from a single C4 shard
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
)

# cfg holds the run configuration (model repo, revision, bit width, group size)
modelQ = GPTQModel.load(
    cfg.repo_origin,
    quant_config,
    revision="iter_0242000",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    backend="torch",
)
logging.info(f"Quantizing {cfg.repo_origin}+{cfg.revision} to {cfg.q_bits} bits with group_size {cfg.group_size}")
modelQ.quantize(
    calibration_dataset,
    batch_size=16,
    auto_gc=False,
    backend="torch",
)
```

It quantizes and saves properly, but when I run inference on the freshly quantized model (without reloading it) I get the error above.

However, when I reload the saved quantized model from disk (the same one that just failed), inference works fine.

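Roughly, the inference step looks like this; the prompt and save path are placeholders rather than my exact script, but the pattern is the same:

```python
from gptqmodel import GPTQModel

prompt = "The capital of France is"     # placeholder prompt
quant_path = "open-sci-1.7b-gptq-4bit"  # placeholder output directory

# Generate directly on the model returned by quantize() -> AttributeError above
tokens = modelQ.generate(prompt)[0]
print(modelQ.tokenizer.decode(tokens))

# Save the quantized model, reload it from disk, and generate again -> works
modelQ.save(quant_path)
reloaded = GPTQModel.load(quant_path, backend="torch")
tokens = reloaded.generate(prompt)[0]
print(reloaded.tokenizer.decode(tokens))
```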

GPU Info

Show output of:

nvidia-smi
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:C7:00.0 Off |                    0 |
| N/A   34C    P0              63W / 400W |      2MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```

**Software Info**

Operating System/Version + Python Version

Show output of:

pip show gptqmodel torch transformers accelerate triton


```
Name: accelerate
Version: 1.10.1
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: gptqmodel, llmcompressor
---
Name: gptqmodel
Version: 4.2.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Editable project location: /home/atatjer/ext/GPTQModel
Requires: accelerate, device-smi, hf-transfer, huggingface-hub, logbar, maturin, numpy, packaging, pillow, protobuf, random-word, safetensors, threadpoolctl, tokenicer, torch, transformers, wheel
Required-by:
---
Name: torch
Version: 2.8.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-cufile-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, ai2-olmo, ai2-olmo-core, ai2-olmo-eval, compressed-tensors, gptqmodel, llmcompressor, optimum, torchaudio, torchmetrics, torchvision
---
Name: transformers
Version: 4.56.1
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: ai2-olmo, compressed-tensors, gptqmodel, llmcompressor, optimum, tokenicer
---
Name: triton
Version: 3.4.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: setuptools
Required-by: torch
```


**To Reproduce**

Quantize any of the models above with the script shown, using backend="torch", then run generate() on the returned model without reloading it from disk.



**Expected behavior**

The model is quantized and saved, and inference on it then works without errors.

**Model/Datasets**

Models: open-sci/open-sci-ref-v0.01-1.7b-nemotron-hq-1T-4096 (revision iter_0242000) and allenai/OLMo-2-1124-7B; calibration data: allenai/c4 (en/c4-train.00001-of-01024.json.gz). All are available on the Hugging Face Hub.

