[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel #28306
Open: jvlunteren wants to merge 28 commits into vllm-project:main from jvlunteren:jvl-triton-attn-upd2
+140 −40
28 commits
8126fa1 remove prefill support from 3d kernel (jvlunteren)
4a54c08 formatting (jvlunteren)
f1f58cc Merge branch 'main' into jvl-triton-attn-upd1 (jvlunteren)
84c5cd7 adapt 3D kernel for full CUDA Graph support (jvlunteren)
39f52b4 formatting (jvlunteren)
3102959 update unit test (jvlunteren)
9bcc1fb corrected comment (jvlunteren)
f3fdb32 Merge branch 'main' into jvl-triton-attn-upd2 (jvlunteren)
53d7b8b added check for empty cudagraph_capture_sizes (jvlunteren)
a70bf68 allocate softmax buffers with padded head dimension (jvlunteren)
96576d8 Merge branch 'main' into jvl-triton-attn-upd2 (jvlunteren)
a62aa11 fix failing ruff check (jvlunteren)
5d4921f Merge branch 'main' into jvl-triton-attn-upd2 (jvlunteren)
5a4173f Merge branch 'vllm-project:main' into jvl-triton-attn-upd2 (jvlunteren)
90e746a remove dependencies on other PRs (jvlunteren)
5f67875 use math utility for computing next power of 2 (jvlunteren)
721b319 add comment to explain threshold computation (jvlunteren)
acf43b8 use next_power_of_2 from vllm.utils.math_utils (jvlunteren)
e1b0a81 add assert to ensure capture sizes are set for CUDA Graphs (jvlunteren)
c214d7e Merge branch 'main' into jvl-triton-attn-upd2 (jvlunteren)
b0d42fa remove superfluous check (jvlunteren)
c9a9aee Update vllm/v1/attention/backends/triton_attn.py (jvlunteren)
92ea4a4 made new unified_attention() arguments optional to preserve backward … (jvlunteren)
c5e317c updated comment (jvlunteren)
924d36e make additonal new argument optional (jvlunteren)
633a319 Merge branch 'main' into jvl-triton-attn-upd2 (jvlunteren)
2112153 bugfix and modification (jvlunteren)
741cf4e Merge branch 'main' into jvl-triton-attn-upd2 (tdoublep)