[benchmarks][vllm] Unified Attention benchmark (paged attention) #5348

Egor-Krivov · 2025-10-20T15:20:39Z

I also started reporting GB/s to the database, because many of the benchmarks are memory-bound.
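
For context, a minimal sketch of how the two reported metrics relate to a measured kernel time. The function names and the millisecond time unit are assumptions for illustration, not the helpers used in this PR.

```python
def gbps(bytes_moved: float, time_ms: float) -> float:
    """Effective memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    seconds = time_ms / 1e3
    return bytes_moved / seconds / 1e9


def tflops(flop_count: float, time_ms: float) -> float:
    """Achieved compute throughput in TFLOP/s (1 TFLOP = 1e12 FLOPs)."""
    seconds = time_ms / 1e3
    return flop_count / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical kernel: moves 4 GB and performs 1 TFLOP in 10 ms.
    print(f"{gbps(4e9, 10.0):.1f} GB/s, {tflops(1e12, 10.0):.1f} TFLOP/s")
```

For a memory-bound kernel the GB/s figure is the one to compare against the device's peak bandwidth; the TFLOP/s figure stays far below peak regardless of how well the kernel is tuned.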

.github/workflows/third-party-benchmarks.yml

benchmarks/third_party/vllm/transform_results.py

benchmarks/third_party/vllm/unified_attention_benchmark.py

Copilot

Pull Request Overview

This PR adds a paged attention benchmark for the vLLM library, implementing both 2D and 3D unified attention kernels with tensor descriptor optimizations. The benchmark compares performance against PyTorch reference implementations and reports both throughput (GB/s) and compute (TFlops) metrics.

Key changes:

Implementation of unified attention benchmark with paged KV cache support
Enhanced memory bandwidth calculations accounting for actual token usage
Extended result transformation to report GB/s metrics alongside TFlops
CI/CD workflow updates to run the new benchmark
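
As an illustration of bandwidth accounting based on actual token usage, the sketch below estimates the bytes a paged attention call has to move. It counts only the K/V entries for tokens that exist in each sequence, not the full allocated cache. The function name, the parameters, and the choice to ignore block tables and other metadata are assumptions; the benchmark's own formula may differ.

```python
from typing import Sequence


def attention_traffic_bytes(
    query_lens: Sequence[int],
    kv_lens: Sequence[int],
    num_query_heads: int,
    num_kv_heads: int,
    head_size: int,
    elem_bytes: int = 2,  # assumed 16-bit elements (fp16/bf16)
) -> int:
    """Estimate bytes read and written by one attention call.

    Counts Q reads and output writes for every query token, plus K and V
    reads for every token actually present in the KV cache.
    """
    q_tokens = sum(query_lens)
    kv_tokens = sum(kv_lens)
    # Q is read once and the output is written once: same shape, so factor 2.
    q_and_out = 2 * q_tokens * num_query_heads * head_size * elem_bytes
    # K and V are each read once for the tokens in use: factor 2.
    k_and_v = 2 * kv_tokens * num_kv_heads * head_size * elem_bytes
    return q_and_out + k_and_v


if __name__ == "__main__":
    # Two decode requests (one query token each) with 10 and 6 cached tokens.
    print(attention_traffic_bytes([1, 1], [10, 6], 8, 2, 4))
```

Dividing this estimate by the measured kernel time gives an effective GB/s that does not reward over-allocated cache blocks.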

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
benchmarks/third_party/vllm/unified_attention_benchmark.py	New comprehensive benchmark for vLLM's unified attention with 2D/3D kernels, supporting various model configurations and attention features
benchmarks/third_party/vllm/transform_results.py	Enhanced to handle non-integer parameter values and report both TFlops and GB/s metrics
benchmarks/third_party/vllm/batched_moe_benchmark.py	Improved memory bandwidth calculation to account for actual activated experts and token usage
.github/workflows/third-party-benchmarks.yml	Added unified attention benchmark to CI workflow and improved command formatting
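
One way non-integer parameter values could be handled when transforming results is sketched below. `parse_param` is a hypothetical helper written for illustration; it is not taken from `transform_results.py`.

```python
from typing import Union


def parse_param(raw: str) -> Union[int, float, str]:
    """Convert a raw parameter string to int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


if __name__ == "__main__":
    for value in ("128", "0.5", "bf16"):
        parsed = parse_param(value)
        print(value, type(parsed).__name__)
```

Trying `int` before `float` keeps integer-valued parameters as integers, so existing rows keep their type while fractional or textual parameters are no longer rejected.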



Egor-Krivov added 11 commits October 20, 2025 15:18

Benchmark POC

b0f98c1

Merge remote-tracking branch 'origin/main' into egor/vllm_attn

8bebe5d

Merge remote-tracking branch 'origin/main' into egor/vllm_attn

ea57e4f

Merge remote-tracking branch 'origin/main' into egor/vllm_attn

3c63755

Add some comments original version

a816892

First version

40e7d13

Improved perf

9f9e622

3d version

6ccb444

Fixed data processing

44ae08f

Fixed gbps calculation

b8bcc56

Cleaned up the benchmark

705aae1

Egor-Krivov changed the title ~~[vllm][benchmarks][WIP] Attention benchmark~~ [benchmarks][vllm] Paged attention benchmark Nov 12, 2025

Egor-Krivov marked this pull request as ready for review November 12, 2025 14:38

Fixed a bug

9f5d320

etiotto requested review from anmyachev and vlad-penkin and removed request for vlad-penkin November 13, 2025 14:57

Egor-Krivov added 3 commits November 13, 2025 16:16

Fixed path

8552daa

Fixed arguments, moved to gbps metrics

b47c269

Fix

5f691a2

anmyachev approved these changes Nov 14, 2025

.github/workflows/third-party-benchmarks.yml (resolved)

benchmarks/third_party/vllm/transform_results.py (resolved)

benchmarks/third_party/vllm/unified_attention_benchmark.py (resolved)

Egor-Krivov added 5 commits November 14, 2025 13:17

Clean up noisy comments

fdb2e7d

Separated vllm install

80f56e9

Enable llama 70B config

bb57fcd

Report both tflops and gbps

61c2112

Better brackets

dd5a6b9

Egor-Krivov requested a review from Copilot November 14, 2025 14:12

Copilot AI reviewed Nov 14, 2025

benchmarks/third_party/vllm/unified_attention_benchmark.py (resolved)

benchmarks/third_party/vllm/unified_attention_benchmark.py (resolved)

Egor-Krivov enabled auto-merge (squash) November 14, 2025 15:29

Egor-Krivov changed the title ~~[benchmarks][vllm] Paged attention benchmark~~ [benchmarks][vllm] Unified Attention benchmark (paged attention) Nov 14, 2025

Egor-Krivov merged commit 43d761e into main Nov 14, 2025
23 of 25 checks passed

Egor-Krivov deleted the egor/vllm_attn branch November 14, 2025 16:13
