Closed
Labels
- ci-failure: Issue about an unexpected test failure in CI
- multi-modality: Related to multi-modality (#4194)
- rocm: Related to AMD ROCm
Description
Name of failing test
entrypoints/openai/test_transcription_validation.py::test_basic_audio[openai/whisper-large-v3-turbo]
Basic information
- Flaky test
- Can reproduce locally
- Caused by external libraries (e.g. bug in transformers)
🧪 Describe the failing test
Summary
Encoder-decoder models (Whisper, T5, BART, vision-language models) fail on AMD ROCm with NotImplementedError because every ROCm-specific attention backend supports decoder-only attention and nothing else.
As a result, every test that uses these models fails on AMD CI (example), mainly in Entrypoints Integration Test (API Server) and Entrypoints Integration Test (Pooling).
Affected Models
- Speech-to-Text: Whisper family (openai/whisper-large-v3-turbo, mistralai/Voxtral-Mini-3B-2507)
- Vision-Language: microsoft/Phi-3.5-vision-instruct
- Translation: T5, BART, MarianMT models
- Any model with cross-attention: Encoder-decoder architectures
Failure Mode
Error:
NotImplementedError: Encoder self-attention and encoder/decoder cross-attention
are not implemented for TritonAttentionImpl
Root Cause: All ROCm-Specific Backends are Decoder-Only
| Backend | Location |
|---|---|
| TritonAttentionBackend (default) | vllm/v1/attention/backends/triton_attn.py |
| RocmAttentionBackend | vllm/v1/attention/backends/rocm_attn.py |
| RocmAiterFABackend | vllm/v1/attention/backends/rocm_aiter_fa.py |
| RocmAiterMLABackend | vllm/v1/attention/backends/mla/rocm_aiter_mla.py |
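The table can be read as a capability map: if every backend only accepts decoder attention, there is nothing to fall back to for encoder or cross-attention layers. The sketch below is a stand-alone model of that situation, not vLLM code. The `AttentionType` enum here is a local stand-in, and `ROCM_BACKENDS` and `backends_supporting` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class AttentionType(Enum):
    """Local stand-in for the attention types discussed in this issue."""
    DECODER = "decoder"
    ENCODER = "encoder"
    ENCODER_DECODER = "encoder_decoder"


# Assumption: per this issue, each ROCm-specific backend is decoder-only.
ROCM_BACKENDS = {
    "TritonAttentionBackend": {AttentionType.DECODER},
    "RocmAttentionBackend": {AttentionType.DECODER},
    "RocmAiterFABackend": {AttentionType.DECODER},
    "RocmAiterMLABackend": {AttentionType.DECODER},
}


def backends_supporting(attn_type: AttentionType) -> list[str]:
    """Return the names of the backends that accept the given attention type."""
    return [
        name
        for name, supported in ROCM_BACKENDS.items()
        if attn_type in supported
    ]


# Decoder attention has four candidates; cross-attention has none,
# which is why encoder-decoder models cannot run on ROCm today.
decoder_candidates = backends_supporting(AttentionType.DECODER)
cross_attn_candidates = backends_supporting(AttentionType.ENCODER_DECODER)
```

Under these assumptions, `cross_attn_candidates` is empty, so switching to another ROCm backend would not avoid the error.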
e.g. TritonAttentionImpl:

    if attn_type != AttentionType.DECODER:
        raise NotImplementedError(
            "Encoder self-attention and "
            "encoder/decoder cross-attention "
            "are not implemented for "
            "TritonAttentionImpl"
        )

Proposal
- [short-term] Mark encoder-decoder tests with @pytest.mark.encoder_decoder, and skip them when running on AMD.
- [long-term] Support flash_attn on ROCm, which is missing here. It will also require supporting kv bindings here.
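The short-term option could look roughly like the `conftest.py` sketch below. It assumes the marker is spelled `pytest.mark.encoder_decoder` (custom pytest markers live under `pytest.mark`) and that a ROCm build of PyTorch reports a non-`None` `torch.version.hip`. The helper names `running_on_rocm` and `should_skip` are hypothetical, and the real change may detect the platform differently.

```python
# conftest.py (sketch): skip tests marked `encoder_decoder` on ROCm.


def running_on_rocm() -> bool:
    """Best-effort ROCm detection (assumption: torch.version.hip is set on ROCm builds)."""
    try:
        import torch
    except ImportError:
        return False
    return getattr(torch.version, "hip", None) is not None


def should_skip(marker_names: set[str], on_rocm: bool) -> bool:
    """Pure decision helper: skip only encoder-decoder tests, and only on ROCm."""
    return on_rocm and "encoder_decoder" in marker_names


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers",
        "encoder_decoder: test needs encoder self-attention or cross-attention",
    )


def pytest_collection_modifyitems(config, items):
    # Imported here so the helpers above stay importable without pytest.
    import pytest

    on_rocm = running_on_rocm()
    skip = pytest.mark.skip(
        reason="encoder-decoder attention is not implemented on ROCm backends"
    )
    for item in items:
        names = {marker.name for marker in item.iter_markers()}
        if should_skip(names, on_rocm):
            item.add_marker(skip)
```

A test would then opt in with `@pytest.mark.encoder_decoder` above its definition. Keeping the decision in `should_skip` makes the skip rule easy to check without a GPU.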
📝 History of failing test
https://buildkite.com/vllm/ci/builds/35670#019a04ed-a2f7-4e7e-85ad-4e2118f76898
CC List.
@DarkLight1337 @LucasWilkinson @simon-mo @yeqcharlotte @Alexei-V-Ivanov-AMD @gshtras