


@huydhn huydhn commented Aug 6, 2025

On 8xH100 and 8xB200, here are the current accuracy numbers for the two models (MI325X results are not available):

gpt-oss-20b

+-------------------------+-------+-------+--------+
| eval (reasoning effort) | B200  | H100  | MI325X |
+-------------------------+-------+-------+--------+
| gpqa (low)              | 0.568 | 0.580 | n/a    |
| gpqa (medium)           | 0.662 | 0.670 | n/a    |
| gpqa (high)             | 0.729 | 0.735 | n/a    |
+-------------------------+-------+-------+--------+
| aime25 (low)            | 0.333 | 0.342 | n/a    |
| aime25 (medium)         | 0.717 | 0.737 | n/a    |
| aime25 (high)           | 0.858 | 0.842 | n/a    |
+-------------------------+-------+-------+--------+

gpt-oss-20b perf numbers

gpt-oss-120b

+-------------------------+-------+-------+--------+
| eval (reasoning effort) | B200  | H100  | MI325X |
+-------------------------+-------+-------+--------+
| gpqa (low)              | 0.653 | 0.654 | n/a    |
| gpqa (medium)           | 0.718 | 0.713 | n/a    |
| gpqa (high)             | 0.785 | 0.786 | n/a    |
+-------------------------+-------+-------+--------+
| aime25 (low)            | 0.512 | 0.500 | n/a    |
| aime25 (medium)         | 0.754 | 0.758 | n/a    |
| aime25 (high)           | 0.883 | 0.921 | n/a    |
+-------------------------+-------+-------+--------+

gpt-oss-120b perf numbers

Overall, the H100 accuracy scores are on par with https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#accuracy-evaluation-panels, but the aime25 results on B200 suggest there may be an issue there.
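To make the B200 vs. H100 comparison concrete, here is a small shell sketch that recomputes the per-eval B200 minus H100 deltas from the two tables and flags any absolute gap of at least 0.015. The scores are copied from the tables; the scores/deltas helper names and the 0.015 threshold are hypothetical choices for illustration, not part of the benchmark setup.

```shell
#!/usr/bin/env bash
# Sketch: recompute B200 - H100 accuracy deltas from the gpt-oss tables.
# Scores are copied from the tables; the 0.015 threshold is a hypothetical
# cut-off chosen only to make the larger gaps easy to spot.
set -eu

scores() {
  # columns: model eval effort b200 h100
  cat <<'EOF'
20b gpqa low 0.568 0.580
20b gpqa medium 0.662 0.670
20b gpqa high 0.729 0.735
20b aime25 low 0.333 0.342
20b aime25 medium 0.717 0.737
20b aime25 high 0.858 0.842
120b gpqa low 0.653 0.654
120b gpqa medium 0.718 0.713
120b gpqa high 0.785 0.786
120b aime25 low 0.512 0.500
120b aime25 medium 0.754 0.758
120b aime25 high 0.883 0.921
EOF
}

deltas() {
  scores | awk 'BEGIN { threshold = 0.015 }
  {
    d = $4 - $5                      # B200 minus H100
    ad = (d < 0) ? -d : d            # absolute difference
    flag = (ad >= threshold) ? "  <-- gap" : ""
    printf "%-5s %-7s %-7s %+.3f%s\n", $1, $2, $3, d, flag
  }'
}

deltas
```

With this threshold only aime25 rows are flagged: 20b medium (-0.020), 20b high (+0.016) and 120b high (-0.038). Every gpqa delta stays within 0.012 in absolute value, which is consistent with the B200 discrepancy showing up in aime25.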

Here are the links to download the raw results:

huydhn added 3 commits August 6, 2025 00:37
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
@meta-cla meta-cla bot added the cla signed label Aug 6, 2025
@huydhn huydhn requested a deployment to pytorch-x-vllm August 6, 2025 08:20 with GitHub Actions (in progress)
Signed-off-by: Huy Do <huydhn@gmail.com>
# Not sure why this is needed on ROCm
pushd gpt_oss
# Low
OPENAI_API_KEY="" python3 -mevals --base-url http://localhost:8000/v1 \


What's this eval library?

Contributor Author


That's just gpt_oss.evals, invoked from inside the gpt_oss directory. I couldn't understand why I need to do that on ROCm, and I don't have SSH access to the runners to check why (they are on AMD's side). My ROCm devgpu doesn't have Docker, so I can't try this out locally either.
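What the pushd changes can be illustrated with a small sketch, assuming the relevant mechanism is ordinary Python module resolution: python3 -m prepends the current directory to sys.path, so a bare evals only resolves when run from inside gpt_oss, while the dotted gpt_oss.evals resolves from the parent directory. The demo builds a throwaway stand-in package in a temp directory (try_run and demo are hypothetical helper names, and the real gpt_oss code and its flags are not used). It shows what the pushd does, not why the ROCm runners need the bare-name form.

```shell
#!/usr/bin/env bash
# Hypothetical demo of why "python3 -mevals" needs the pushd into gpt_oss.
# It builds a stand-in package named gpt_oss with an evals subpackage in a
# temp dir; it does NOT touch the real gpt_oss code.
set -eu

try_run() {
  # $1: directory to run from, $2: module name passed to "python3 -m".
  # "python3 -m" prepends the current directory to sys.path, so the
  # module name is resolved relative to wherever the command is run.
  (cd "$1" && python3 -m "$2" 2>/dev/null) || echo "cannot resolve $2 here"
}

demo() {
  root="$(mktemp -d)"
  mkdir -p "$root/gpt_oss/evals"
  touch "$root/gpt_oss/__init__.py" "$root/gpt_oss/evals/__init__.py"
  # __main__.py is what "python3 -m <package>" executes for a package.
  printf 'print("ran " + __spec__.name)\n' > "$root/gpt_oss/evals/__main__.py"

  try_run "$root" gpt_oss.evals    # from the parent dir: dotted name works
  try_run "$root" evals            # from the parent dir: bare "evals" fails
  try_run "$root/gpt_oss" evals    # after cd/pushd into gpt_oss: it works
  rm -rf "$root"
}

demo
```

Running it prints "ran gpt_oss.evals.__main__", then "cannot resolve evals here", then "ran evals.__main__".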

Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>

huydhn commented Sep 4, 2025

#74 has landed, which adds gpt-oss to the OSS benchmark dashboard.

@huydhn huydhn closed this Sep 4, 2025
