I am running the following:
pytorch-triton 3.4.0+git11ec6354
torch 2.9.0.dev20250723+cu128
torchaudio 2.8.0.dev20250723+cu128
torchvision 0.24.0.dev20250723+cu128
and am on an H100 GPU.
The following command produces some weird output:
python generate.py --checkpoint_path checkpoints/meta-llama/Llama-3.1-8B/model.pth --prompt "The capital of France is:" --num_samples=1 --temperature 0.8 --max_new_tokens 100
<|begin_of_text|>The capital of France is: London
The capital of France is: Paris
The capital of France is: Rome
The capital of France is: Athens
The capital of France is: Berne
The capital of France is: Bern
The capital of France is: Lisbon
The capital of France is: Madrid
The capital of France is: Oslo
The capital of France is: Stockholm
The capital of France is: Helsinki
The capital of France is: Berlin
The capital of France is: Vienna
The
Time for inference 1: 12.70 sec total, 7.87 tokens/sec
Bandwidth achieved: 118.16 GB/s
FLOPS achieved: 0.13 TF/s
Other temperatures and larger numbers of samples lead to saner results, but this one seemed concerning to me.
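For reference, temperature only rescales the logits before sampling, and 0.8 actually sharpens the distribution slightly relative to 1.0, which is why output this far off surprised me. A minimal sketch of how temperature sampling works in general (my own illustration, not gpt-fast's actual sampling code; the function name is hypothetical):

import torch

def sample_with_temperature(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    # Scale logits by the temperature: values < 1.0 sharpen the
    # distribution toward the argmax, values > 1.0 flatten it.
    probs = torch.softmax(logits / max(temperature, 1e-5), dim=-1)
    # Draw one token id from the resulting categorical distribution.
    return torch.multinomial(probs, num_samples=1)

# Example with a tiny 5-token vocabulary:
logits = torch.tensor([2.0, 1.0, 0.5, 0.1, -1.0])
print(sample_with_temperature(logits, temperature=0.8))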