Ideally, we should enable CUDA graph benchmarking (triton.testing.do_bench_cudagraph) by default. However, many operators currently fail when CUDA graphs are enabled.
There are two common errors:
- "torch.AcceleratorError: CUDA error: operation would make the legacy stream depend on a capturing blocking stream" (swiglu, softmax, etc.)
- "torch.AcceleratorError: CUDA error: operation failed due to a previous error during capture"
We should take a deeper look at these errors and determine whether they can be fixed at the benchmark harness level.
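As a first triage step, the failures from a run could be bucketed by which of the two messages above they contain, so that we can see which operators hit which error. This is a minimal sketch under an assumption: the harness already collects each failing operator's error string into a dict. The helpers classify_capture_error and group_failures and the bucket names are hypothetical. They are not part of tritonbench.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical bucket names mapped to the two error messages quoted in this issue.
KNOWN_CAPTURE_ERRORS = {
    "legacy_stream_dependency": "operation would make the legacy stream depend on a capturing blocking stream",
    "prior_capture_error": "operation failed due to a previous error during capture",
}


def classify_capture_error(message: str) -> str:
    """Return the bucket whose message is a substring of `message`, else 'other'."""
    for bucket, needle in KNOWN_CAPTURE_ERRORS.items():
        if needle in message:
            return bucket
    return "other"


def group_failures(failures: Dict[str, str]) -> Dict[str, List[str]]:
    """Group operator names by error bucket.

    `failures` maps operator name -> error message (assumed harness output).
    Operator names within each bucket are sorted alphabetically.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for op, msg in sorted(failures.items()):
        groups[classify_capture_error(msg)].append(op)
    return dict(groups)
```

For example, if swiglu and softmax both report the legacy-stream message, group_failures places them together under "legacy_stream_dependency". Any message that matches neither quoted error falls into "other" and needs to be inspected by hand.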
For more details, check out https://github.com/meta-pytorch/tritonbench/actions/runs/17133354930/job/48603081591?pr=348.
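Until the root causes are understood, one possible harness-level workaround is to try triton.testing.do_bench_cudagraph for each operator and fall back to triton.testing.do_bench when capture raises an error. This is a sketch, not a fix. The helper name bench_prefer_cudagraph is hypothetical. The two bench functions are injectable, so the fallback logic can be exercised without a GPU. The sketch also assumes that torch.AcceleratorError subclasses RuntimeError, which holds for recent PyTorch releases as far as we know. Please verify this against the installed version.

```python
import warnings
from typing import Callable, Optional, Tuple

# A benchmark function takes a zero-arg callable and returns a latency in ms.
BenchFn = Callable[[Callable[[], object]], float]


def bench_prefer_cudagraph(
    fn: Callable[[], object],
    cudagraph_bench: Optional[BenchFn] = None,
    eager_bench: Optional[BenchFn] = None,
) -> Tuple[float, bool]:
    """Time `fn` with CUDA graphs if capture succeeds, otherwise without them.

    Returns (latency_ms, used_cudagraph). Hypothetical helper, not tritonbench API.
    """
    if cudagraph_bench is None or eager_bench is None:
        # Lazy import: passing both bench functions explicitly avoids needing
        # Triton (and a GPU) when exercising the fallback logic.
        import triton.testing

        cudagraph_bench = cudagraph_bench or triton.testing.do_bench_cudagraph
        eager_bench = eager_bench or triton.testing.do_bench

    try:
        return cudagraph_bench(fn), True
    except RuntimeError as err:
        # Assumption: torch.AcceleratorError, raised for the capture errors in
        # this issue, is a RuntimeError subclass. Any RuntimeError triggers the
        # fallback here, which may be broader than desired.
        warnings.warn(f"CUDA graph benchmarking failed, falling back to do_bench: {err}")
        return eager_bench(fn), False
```

Returning the used_cudagraph flag keeps fallbacks visible in reports. This avoids silently mixing graph and non-graph timings across operators. The sketch does not address whether the process is still in a clean state after a failed capture. The second error message suggests that failures can cascade, so running each operator in a fresh process may be safer.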