Hey, thanks for providing the gpt-fast project!
I am getting an error when trying to run inference.
I fine-tuned a llama-3.1-70B model with LoRA using torchtune, converted the checkpoint following these instructions, and then ran the command below from gpt-fast, specifying llama-3.1-70b as the model:
python "./scripts/convert_hf_checkpoint.py" --checkpoint_dir "/home/ubuntu/projects/models/trained/llama_31_70b_instruct_gpt_fast/" --model "llama-3.1-70b"`
My directory looks like this:
ll -alht trained/llama_31_70b_instruct_gpt_fast/
total 263G
-rw-rw-r-- 1 ubuntu ubuntu 2.1M Dec 4 13:02 tokenizer.model
drwxrwxr-x 3 ubuntu ubuntu 4.0K Dec 4 13:02 ./
-rw-rw-r-- 1 ubuntu ubuntu 132G Dec 4 13:02 model.pth
-rw-rw-r-- 1 ubuntu ubuntu 2.0G Dec 4 10:05 hf_model_0030_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 10:05 hf_model_0029_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 10:04 hf_model_0028_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 10:02 hf_model_0027_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 10:01 hf_model_0026_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 10:00 hf_model_0025_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:59 hf_model_0024_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:58 hf_model_0023_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:56 hf_model_0022_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:55 hf_model_0021_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:54 hf_model_0020_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:53 hf_model_0019_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:52 hf_model_0018_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:50 hf_model_0017_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:49 hf_model_0016_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:48 hf_model_0015_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:47 hf_model_0014_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:46 hf_model_0013_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:45 hf_model_0012_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:43 hf_model_0011_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:42 hf_model_0010_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:41 hf_model_0009_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:40 hf_model_0008_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:39 hf_model_0007_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:38 hf_model_0006_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:36 hf_model_0005_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:35 hf_model_0004_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.7G Dec 4 09:34 hf_model_0003_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.4G Dec 4 09:33 hf_model_0002_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 4.3G Dec 4 09:32 hf_model_0001_0.pt
-rw-rw-r-- 1 ubuntu ubuntu 46K Dec 3 14:55 pytorch_model.bin.index.json
drwxrwxr-x 5 ubuntu ubuntu 4.0K Dec 3 14:55 ../
drwxrwxr-x 2 ubuntu ubuntu 4.0K Nov 29 06:34 original/
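For reference, here is a quick way (my own sketch, not part of gpt-fast) to check the tensor shapes that actually ended up in the converted model.pth:
# Minimal sketch: print the embedding/output shapes stored in the converted
# checkpoint, to see whether the 128256-token vocab made it into model.pth.
import torch
ckpt = "/home/ubuntu/projects/models/trained/llama_31_70b_instruct_gpt_fast/model.pth"
state_dict = torch.load(ckpt, map_location="cpu", mmap=True)
for name in ("tok_embeddings.weight", "output.weight"):
    if name in state_dict:
        print(name, tuple(state_dict[name].shape))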
Then when running this:
ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=8 \
generate.py --compile \
--checkpoint_path "/home/ubuntu/projects/models/trained/llama_31_70b_instruct_gpt_fast/model.pth"
I am getting the following error:
W1204 13:04:16.879000 31658 torch/distributed/run.py:792]
W1204 13:04:16.879000 31658 torch/distributed/run.py:792] *****************************************
W1204 13:04:16.879000 31658 torch/distributed/run.py:792] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1204 13:04:16.879000 31658 torch/distributed/run.py:792] *****************************************
Using device=cuda
Loading model ...
[rank6]: Traceback (most recent call last):
[rank6]: File "/home/ubuntu/projects/gpt-fast/generate.py", line 466, in <module>
[rank6]: main(
[rank6]: File "/home/ubuntu/projects/gpt-fast/generate.py", line 311, in main
[rank6]: model = _load_model(checkpoint_path, device, precision, use_tp)
[rank6]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank6]: File "/home/ubuntu/projects/gpt-fast/generate.py", line 241, in _load_model
[rank6]: model.load_state_dict(checkpoint, assign=True)
[rank6]: File "/home/ubuntu/projects/models/venvg/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2583, in load_state_dict
[rank6]: raise RuntimeError(
[rank6]: RuntimeError: Error(s) in loading state_dict for Transformer:
[rank6]: size mismatch for tok_embeddings.weight: copying a param with shape torch.Size([128256, 8192]) from checkpoint, the shape in current model is torch.Size([32000, 8192]).
[rank6]: size mismatch for output.weight: copying a param with shape torch.Size([128256, 8192]) from checkpoint, the shape in current model is torch.Size([32000, 8192]).
(Ranks 0, 1, 2, 3, 4, 5, and 7 report the identical traceback and size-mismatch errors as rank 6, so I've omitted them here.)
[rank0]:[W1204 13:04:21.231147925 ProcessGroupNCCL.cpp:1437] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
W1204 13:04:21.890000 31658 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 31730 closing signal SIGTERM
W1204 13:04:21.891000 31658 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 31731 closing signal SIGTERM
W1204 13:04:21.892000 31658 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 31732 closing signal SIGTERM
W1204 13:04:21.893000 31658 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 31733 closing signal SIGTERM
W1204 13:04:21.894000 31658 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 31734 closing signal SIGTERM
W1204 13:04:21.894000 31658 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 31735 closing signal SIGTERM
W1204 13:04:21.895000 31658 torch/distributed/elastic/multiprocessing/api.py:897] Sending process 31737 closing signal SIGTERM
E1204 13:04:22.288000 31658 torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 6 (pid: 31736) of binary: /home/ubuntu/projects/models/venvg/bin/python
Traceback (most recent call last):
File "/home/ubuntu/projects/models/venvg/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/ubuntu/projects/models/venvg/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/projects/models/venvg/lib/python3.12/site-packages/torch/distributed/run.py", line 918, in main
run(args)
File "/home/ubuntu/projects/models/venvg/lib/python3.12/site-packages/torch/distributed/run.py", line 909, in run
elastic_launch(
File "/home/ubuntu/projects/models/venvg/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/projects/models/venvg/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
generate.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-12-04_13:04:21
host : ip-172-31-12-154.us-west-2.compute.internal
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 31736)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
It looks like the model is being built with the wrong layer sizes, or maybe the tokenizer isn't loaded right?
I see here that the transformer_config entry for llama-3.1-70b looks correct, but for some reason it isn't the config that gets used. Should I also copy the config files from the original Meta Llama weights?
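For what it's worth, my current guess (from reading generate.py and model.py, so please correct me if I'm off): _load_model builds the model from the checkpoint's parent directory name, and the fuzzy match for my folder name llama_31_70b_instruct_gpt_fast may be resolving to a different config (one with the default 32000-token vocab) instead of llama-3.1-70b. A small sketch, assuming from_name works the way I think, to see which config gets picked:
# Sketch (my assumption about how config selection works, based on model.py):
# the config is fuzzy-matched from the checkpoint's parent directory name,
# so a non-standard folder name can resolve to the wrong entry.
from model import ModelArgs  # gpt-fast's model.py
config = ModelArgs.from_name("llama_31_70b_instruct_gpt_fast")
print(config)  # does this show vocab_size=128256 (llama-3.1) or the 32000 default?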