Open
Description
When I run
ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=2 generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model.pth
it fails with the following error:
RuntimeError: get_group_info: no group info associated with the group name

Detailed error information:
W0609 20:18:37.249000 1431440 torch/distributed/run.py:766] *****************************************
W0609 20:18:37.249000 1431440 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0609 20:18:37.249000 1431440 torch/distributed/run.py:766] *****************************************
Using device=cuda
Loading model ...
Applying tensor parallel to model ...
Time to load model: 10.60 seconds
/root/serve/gpt-fast/tp.py:139: FutureWarning: The combination of ranks + tag as process group identifier has been deprecated. Please switch to using ProcessGroup, DeviceMesh, or group name instead.
attn.register_forward_hook(lambda _module, _input, output: funcol.all_reduce(
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/serve/gpt-fast/generate.py", line 480, in <module>
[rank0]: main(
[rank0]: File "/root/serve/gpt-fast/generate.py", line 401, in main
[rank0]: y, metrics = generate(
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/root/serve/gpt-fast/generate.py", line 194, in generate
[rank0]: next_token = prefill(model, prompt.view(batch_size, -1), input_pos, **sampling_kwargs).clone()
[rank0]: File "/root/serve/gpt-fast/generate.py", line 71, in prefill
[rank0]: logits = model(mask, x, input_pos)
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/serve/gpt-fast/model.py", line 156, in forward
[rank0]: x = layer(x, input_pos, freqs_cis, mask)
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/root/serve/gpt-fast/model.py", line 175, in forward
[rank0]: h = x + self.attention(self.attention_norm(x), freqs_cis, mask, input_pos)
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1857, in _call_impl
[rank0]: return inner()
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1818, in inner
[rank0]: hook_result = hook(self, args, result)
[rank0]: File "/root/serve/gpt-fast/tp.py", line 139, in <lambda>
[rank0]: attn.register_forward_hook(lambda _module, _input, output: funcol.all_reduce(
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/distributed/_functional_collectives.py", line 176, in all_reduce
[rank0]: tensor = torch.ops._c10d_functional.all_reduce(self, reduceOp.lower(), group_name)
[rank0]: File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/_ops.py", line 1158, in __call__
[rank0]: return self._op(*args, **(kwargs or {}))
[rank0]: RuntimeError: get_group_info: no group info associated with the group name

Environment: uv venv, torch version 2.7.1+cu126
nband