Conversation

@LinjianMa (Contributor)
This PR introduces the reference implementation for DLRMv3.
Instructions for running the benchmark are provided in the README. Please note: The dataset setup is still pending. The README will be updated with download instructions once the dataset becomes available.

@LinjianMa LinjianMa requested a review from a team as a code owner December 12, 2025 00:55
@github-actions (bot) commented Dec 12, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@nvzhihanj (Contributor)
@tanvi-mlcommons @pgmpablo157321 please take a look (you can ignore the HSTU submodules) and help resolve the CLA issues for Linjian.
@zihaok to help review as well

@LinjianMa (Contributor, Author)
Hi @mrmhodak, could you help review this PR? All of @nvzhihanj's comments have been addressed.

```
python accuracy.py --path path/to/mlperf_log_accuracy.json
```
We use normalized entropy (NE), accuracy, and AUC as the metrics to evaluate model quality. The accuracy of the reference implementation, evaluated on 34,996 requests across 10 inference timestamps, is listed below:
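For intuition, the three metrics can be sketched in plain Python. This is an illustrative computation only, not the actual `accuracy.py`; the function names, the 0.5 classification threshold, and the rank-based AUC formulation are assumptions.

```python
import math

def normalized_entropy(preds, labels):
    # Cross-entropy of the predictions, normalized by the entropy of the
    # empirical positive rate. 1.0 matches a constant baseline; lower is better.
    eps = 1e-12
    n = len(labels)
    ll = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
              for p, y in zip(preds, labels)) / n
    base = sum(labels) / n
    base_ll = -(base * math.log(base + eps) + (1 - base) * math.log(1 - base + eps))
    return ll / base_ll

def accuracy(preds, labels, threshold=0.5):
    # Fraction of predictions on the correct side of the threshold.
    return sum((p >= threshold) == bool(y) for p, y in zip(preds, labels)) / len(labels)

def auc(preds, labels):
    # Rank-based AUC: probability that a random positive example
    # scores above a random negative one (ties count half).
    pos = [p for p, y in zip(preds, labels) if y]
    neg = [p for p, y in zip(preds, labels) if not y]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch a perfect ranking yields AUC = 1.0, and an NE below 1.0 means the model beats the constant-rate baseline.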
@nvzhihanj (Contributor) commented Dec 17, 2025
Are we using all 3 scores for the 99% threshold, or just AUC? (If it's just AUC, please point that out.)
Also, it seems loadgen/mlperf.conf is missing (needed for the latency threshold and sample count).

@LinjianMa (Contributor, Author)
Yes, we need all 3 scores to be within 99%; I will clarify this in the README. I will also add the latency threshold to mlperf.conf.

Regarding sample count, could you explain when we will use it? I'm a bit confused, as I think the performance_sample_count_override settings

```
# Set performance_sample_count for each model.
# User can optionally set this to higher values in user.conf.
resnet50.*.performance_sample_count_override = 1024
ssd-mobilenet.*.performance_sample_count_override = 256
retinanet.*.performance_sample_count_override = 64
bert.*.performance_sample_count_override = 10833
dlrm.*.performance_sample_count_override = 204800
dlrm-v2.*.performance_sample_count_override = 204800
rnnt.*.performance_sample_count_override = 2513
gptj.*.performance_sample_count_override = 13368
mixtral-8x7b.*.performance_sample_count_override = 15000
llama2-70b.*.performance_sample_count_override = 24576
llama2-70b-interactive.*.performance_sample_count_override = 24576
llama3_1-405b.*.performance_sample_count_override = 8313
llama3_1-405b-interactive.*.performance_sample_count_override = 8313
llama3_1-8b.*.performance_sample_count_override = 13368
llama3_1-8b-edge.*.performance_sample_count_override = 5000
llama3_1-8b-interactive.*.performance_sample_count_override = 13368
stable-diffusion-xl.*.performance_sample_count_override = 5000
rgat.*.performance_sample_count_override = 788379
pointpainting.*.performance_sample_count_override = 1024
deepseek-r1.*.performance_sample_count_override = 4388
whisper.*.performance_sample_count_override = 1633
# set to 0 to let entire sample set to be performance sample
3d-unet.*.performance_sample_count_override = 0
```

are never used. In particular, performance_sample_count at https://github.com/mlcommons/inference/blob/8999c4d686f6e4a180da14597c97063fce7c9f33/loadgen/test_settings_internal.cc#L123C3-L123C27 is only used when performance_issue_unique is True, and I don't see a case in the existing benchmarks (or in DLRMv3) where this flag needs to be True.
My understanding is that submitters are allowed to change the sample count, and as long as the minimum duration requirement for the benchmark is met, this should be acceptable.
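For illustration, a submitter-side override along the lines described above might look like the following user.conf fragment. The dlrm-v3 values here are hypothetical (made up for this sketch, not taken from this PR); `performance_sample_count_override` and `min_duration` are existing loadgen config keys.

```
# user.conf (hypothetical values, for illustration only)
dlrm-v3.Server.performance_sample_count_override = 204800
# min_duration is in milliseconds; the run must still satisfy
# the benchmark's minimum-duration requirement
dlrm-v3.Server.min_duration = 600000
```

Under this reading, the run is judged by the minimum-duration requirement rather than by the sample count itself.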

@LinjianMa (Contributor, Author) commented Dec 19, 2025
Updated the README and added the latency threshold to loadgen/mlperf.conf.

@zihaok commented Jan 5, 2026

Hi @pgmpablo157321, can we get this merged? Thanks!

@nvzhihanj (Contributor)

@mrmhodak @hanyunfan FYI to merge

```
bert.Server.target_latency = 130
dlrm.Server.target_latency = 60
dlrm-v2.Server.target_latency = 60
dlrm-v3.Server.target_latency = 80
```
@LinjianMa (Contributor, Author)

Hi @cfRod, the target latency for DLRM-v3 is 80 ms. I think we need to update the submission checker accordingly.
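For context, the submission checker enforces per-model Server latency constraints. A minimal Python sketch of such a check is below; the table name, model keys, and helper function are hypothetical (only the 60 ms and 80 ms targets come from the config in this thread), and the real checker stores constraints in nanoseconds internally.

```python
# Hypothetical sketch of a per-model Server latency table (nanoseconds),
# mirroring how a submission checker could validate the target_latency
# values from mlperf.conf. Names are illustrative, not the checker's own.
LATENCY_CONSTRAINT_NS = {
    "dlrm-v2-99": {"Server": 60_000_000},  # 60 ms
    "dlrm-v3-99": {"Server": 80_000_000},  # 80 ms, per this PR thread
}

def server_latency_ok(model, measured_ns):
    # A Server run passes if its measured latency percentile
    # is within the model's target.
    return measured_ns <= LATENCY_CONSTRAINT_NS[model]["Server"]
```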

@hanyunfan (Contributor)

Adding @arjunsuresh to help take a look at the failing MLC checks.

(screenshot: failing MLC checks)

@arjunsuresh arjunsuresh merged commit 7a9fc82 into mlcommons:master Jan 6, 2026
32 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 6, 2026