Run a one-off benchmark for gpt-oss #63
Signed-off-by: Huy Do <huydhn@gmail.com>
    # Not sure why this is needed on ROCm
    pushd gpt_oss
    # Low
    OPENAI_API_KEY="" python3 -mevals --base-url http://localhost:8000/v1 \
what's this eval library?
That's just `gpt_oss.evals` being called from inside the `gpt_oss` directory. I couldn't understand why I need to do that on ROCm, and I don't have SSH access to the runners to check (they are on the AMD side). My ROCm devgpu doesn't have Docker, so I can't try this out locally.
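For reference, a minimal sketch of how the two invocations relate, assuming `gpt_oss` is a directory in the checkout that contains the `evals` package. `evals_module` is a hypothetical helper for illustration only, not something in the workflow, and the command is printed rather than executed.

```shell
# Hypothetical helper: choose the module name to pass to `python3 -m`.
# Assumption: inside the gpt_oss directory the package resolves as plain
# `evals`; elsewhere the fully qualified `gpt_oss.evals` is expected to work
# when gpt_oss is importable.
evals_module() {
  if [ "$(basename "${1:-$PWD}")" = "gpt_oss" ]; then
    echo "evals"
  else
    echo "gpt_oss.evals"
  fi
}

# Dry run: print the command instead of running it.
# Only --base-url comes from the workflow excerpt; other flags are left out.
echo "python3 -m $(evals_module "$PWD") --base-url http://localhost:8000/v1"
```

This does not explain the ROCm behavior; it only shows that the `pushd gpt_oss` form and the `gpt_oss.evals` form target the same module.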
#74 has landed, adding gpt-oss to the OSS benchmark dashboard.
On 8xH100 and 8xB200, here are the current numbers for the two models:
gpt-oss-20b
gpt-oss-20b perf numbers
gpt-oss-120b
gpt-oss-120b perf numbers
Overall, the accuracy scores on H100 are on par with https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#accuracy-evaluation-panels, but the `aime25` results on B200 look like there are issues there.

Here are the links to download the raw results:
- `gpqa` accuracy: https://github.com/pytorch/pytorch-integration-testing/actions/runs/16823595241
- `aime25` accuracy: https://github.com/pytorch/pytorch-integration-testing/actions/runs/16839593742
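If you prefer the command line, here is a rough sketch using the GitHub CLI, assuming `gh` is installed and authenticated and that the run artifacts have not expired. The run IDs and repository name are copied from the URLs above; the `results/` output directory and the `download_cmd` helper are hypothetical. The commands are printed rather than executed.

```shell
# Repository taken from the run URLs above.
REPO="pytorch/pytorch-integration-testing"

# Hypothetical helper: print the gh command for one workflow run.
# $1: label used for the output directory, $2: Actions run ID
download_cmd() {
  echo "gh run download $2 --repo $REPO --dir results/$1"
}

download_cmd gpqa 16823595241
download_cmd aime25 16839593742
```

Drop the `echo` inside `download_cmd` to actually fetch the artifacts.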