
[BUG] AttributeError: 'TorchQuantLinear' object has no attribute 'wf_unsqueeze_zero' #1786

@aldakata

Description

  File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/torch.py", line 158, in forward                                      
    out = self._forward(x, out_shape)                                                                                                        
  File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/torch.py", line 164, in _forward                                     
    weights = self.dequantize_weight(num_itr=num_itr).to(x.dtype)                                                                            
              ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^                                                                                        
  File "/lustre/home/atatjer/ext/GPTQModel/gptqmodel/nn_modules/qlinear/__init__.py", line 441, in dequantize_weight                         
    self.wf_unsqueeze_zero  # self.wf.unsqueeze(0),                                                                                          
    ^^^^^^^^^^^^^^^^^^^^^^                                                                                                                   
  File "/lustre/home/atatjer/src/envs/opensci/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__            
    raise AttributeError(                                                                                                                    
        f"'{type(self).__name__}' object has no attribute '{name}'"                                                                          
    )                                                                                                                                        
AttributeError: 'TorchQuantLinear' object has no attribute 'wf_unsqueeze_zero'  

This bug appears when I quantize open-sci/open-sci-ref-v0.01-1.7b-nemotron-hq-1T-4096 or allenai/OLMo-2-1124-7B; in fact, it happens with every model I have tried. This is how I quantize:

```python
import logging

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# 1024 calibration samples from a single C4 shard
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
)

# cfg holds the run configuration (model repo, revision, bit width, group size)
modelQ = GPTQModel.load(
    cfg.repo_origin,
    quant_config,
    revision="iter_0242000",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    backend="torch",
)
logging.info(f"Quantizing {cfg.repo_origin}+{cfg.revision} to {cfg.q_bits} bits with group_size {cfg.group_size}")
modelQ.quantize(
    calibration_dataset,
    batch_size=16,
    auto_gc=False,
    backend="torch",
)
```

It quantizes and saves properly, but when I run inference on the freshly quantized model (without reloading it) I get the error above.

However, when I reload the saved quantized model from disk (the same one that just failed), inference works fine.

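Roughly, the inference step looks like this; the prompt and save path are placeholders rather than my exact script, but the pattern is the same:

```python
from gptqmodel import GPTQModel

prompt = "The capital of France is"     # placeholder prompt
quant_path = "open-sci-1.7b-gptq-4bit"  # placeholder output directory

# Generate directly on the model returned by quantize() -> AttributeError above
tokens = modelQ.generate(prompt)[0]
print(modelQ.tokenizer.decode(tokens))

# Save the quantized model, reload it from disk, and generate again -> works
modelQ.save(quant_path)
reloaded = GPTQModel.load(quant_path, backend="torch")
tokens = reloaded.generate(prompt)[0]
print(reloaded.tokenizer.decode(tokens))
```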

GPU Info

Show output of:

nvidia-smi
```
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:C7:00.0 Off |                    0 |
| N/A   34C    P0              63W / 400W |      2MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```

**Software Info**

Operating System/Version + Python Version

Show output of:

pip show gptqmodel torch transformers accelerate triton


```
Name: accelerate
Version: 1.10.1
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: gptqmodel, llmcompressor
---
Name: gptqmodel
Version: 4.2.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Editable project location: /home/atatjer/ext/GPTQModel
Requires: accelerate, device-smi, hf-transfer, huggingface-hub, logbar, maturin, numpy, packaging, pillow, protobuf, random-word, safetensors, threadpoolctl, tokenicer, torch, transformers, wheel
Required-by:
---
Name: torch
Version: 2.8.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-cufile-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-cusparselt-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, ai2-olmo, ai2-olmo-core, ai2-olmo-eval, compressed-tensors, gptqmodel, llmcompressor, optimum, torchaudio, torchmetrics, torchvision
---
Name: transformers
Version: 4.56.1
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: ai2-olmo, compressed-tensors, gptqmodel, llmcompressor, optimum, tokenicer
---
Name: triton
Version: 3.4.0
Location: /lustre/home/atatjer/src/scalinglawsquantization/.venv/lib/python3.10/site-packages
Requires: setuptools
Required-by: torch
```


**To Reproduce**

Quantize any of the models above with the script shown, using backend="torch", then run generate() on the returned model without reloading it from disk.



**Expected behavior**

The model is quantized and saved, and inference on it then works without errors.

**Model/Datasets**

Models: open-sci/open-sci-ref-v0.01-1.7b-nemotron-hq-1T-4096 (revision iter_0242000) and allenai/OLMo-2-1124-7B; calibration data: allenai/c4 (en/c4-train.00001-of-01024.json.gz). All are available on the Hugging Face Hub.

