[Bug]: encoder_decoder models (e.g. Whisper) are not working in vLLM 0.11 with ROCm #28184

@lucaschr21

Description

Your current environment

The output of python collect_env.py (image: rocm/vllm-dev:nightly):

==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version                : 20.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-7.0.0 25314 f4087f6b428f0e6f575ebac8a8a724dab123d06e)
CMake version                : version 3.31.6
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0a0+git1c57644
Is debug build               : False
CUDA used to build PyTorch   : N/A
ROCM used to build PyTorch   : 7.0.51831-a3e329ad8

==============================
      Python Environment
==============================
Python version               : 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-6.14.0-33-generic-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   : 
GPU models and configuration :  (gfx1101)
Nvidia driver version        : Could not collect
cuDNN version                : Could not collect
HIP runtime version          : 7.0.51831
MIOpen runtime version       : 3.5.0
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  12
On-line CPU(s) list:                     0-11
Vendor ID:                               AuthenticAMD
BIOS Vendor ID:                          Advanced Micro Devices, Inc.
Model name:                              AMD Ryzen 5 5600 6-Core Processor
BIOS Model name:                         AMD Ryzen 5 5600 6-Core Processor              
CPU family:                              25
Model:                                   33
Thread(s) per core:                      2
Core(s) per socket:                      6
Socket(s):                               1
Stepping:                                2
Frequency boost:                         enabled
CPU max MHz:                             4470.0000
CPU min MHz:                             550.0000
BogoMIPS:                                7000.46
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization:                          AMD-V
L1d cache:                               192 KiB (6 instances)
L1i cache:                               192 KiB (6 instances)
L2 cache:                                3 MiB (6 instances)
L3 cache:                                32 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-11
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Mitigation; Safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsx async abort:           Not affected

==============================
Versions of relevant libraries
==============================
[pip3] conch-triton-kernels==1.2.1
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0a0+git1c57644
[pip3] torchvision==0.23.0a0+824e8c8
[pip3] transformers==4.57.1
[pip3] triton==3.4.0
[pip3] triton_kernels==1.0.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : 7.0.51831-a3e329ad8
vLLM Version                 : 0.11.1rc6.dev141+g0b8e871e5 (git sha: 0b8e871e5)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  ============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
       GPU0         
GPU0   0            

================================= Hops between two GPUs ==================================
       GPU0         
GPU0   0            

=============================== Link Type between two GPUs ===============================
       GPU0         
GPU0   0            

======================================= Numa Nodes =======================================
GPU[0]          : (Topology) Numa Node: 0
GPU[0]          : (Topology) Numa Affinity: -1
================================== End of ROCm SMI Log ===================================

==============================
     Environment Variables
==============================
PYTORCH_TUNABLEOP_TUNING=0
PYTORCH_TUNABLEOP_ENABLED=1
PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx950;gfx1100;gfx1101;gfx1200;gfx1201;gfx1150;gfx1151
LD_LIBRARY_PATH=/opt/rocm/lib:/usr/local/lib:
VLLM_LOGGING_LEVEL=DEBUG
PYTORCH_TUNABLEOP_FILENAME=/app/afo_tune_device_%d_full.csv
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

The output of python collect_env.py (image: rocm/vllm:rocm6.4.1_vllm_0.10.1):

==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version                : 19.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.4.1 25184 c87081df219c42dc27c5b6d86c0525bc7d01f727)
CMake version                : version 3.31.6
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.0+gitf717b2a
Is debug build               : False
CUDA used to build PyTorch   : N/A
ROCM used to build PyTorch   : 6.4.43483-a187df25c

==============================
      Python Environment
==============================
Python version               : 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-6.14.0-33-generic-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : AMD Radeon RX 7700 XT (gfx1101)
Nvidia driver version        : Could not collect
cuDNN version                : Could not collect
HIP runtime version          : 6.4.43483
MIOpen runtime version       : 3.4.0
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  12
On-line CPU(s) list:                     0-11
Vendor ID:                               AuthenticAMD
BIOS Vendor ID:                          Advanced Micro Devices, Inc.
Model name:                              AMD Ryzen 5 5600 6-Core Processor
BIOS Model name:                         AMD Ryzen 5 5600 6-Core Processor              
CPU family:                              25
Model:                                   33
Thread(s) per core:                      2
Core(s) per socket:                      6
Socket(s):                               1
Stepping:                                2
Frequency boost:                         enabled
CPU max MHz:                             4470.0000
CPU min MHz:                             550.0000
BogoMIPS:                                7000.46
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization:                          AMD-V
L1d cache:                               192 KiB (6 instances)
L1i cache:                               192 KiB (6 instances)
L2 cache:                                3 MiB (6 instances)
L3 cache:                                32 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-11
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Mitigation; Safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsx async abort:           Not affected

==============================
Versions of relevant libraries
==============================
[pip3] conch-triton-kernels==1.2.1
[pip3] numpy==2.2.6
[pip3] pyzmq==27.0.2
[pip3] torch==2.7.0+gitf717b2a
[pip3] torchvision==0.21.0+7af6987
[pip3] transformers==4.56.1
[pip3] triton==3.2.0+gite5be006a
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : 6.4.43483-a187df25c
vLLM Version                 : 0.10.1rc2.dev410+g6663000a3 (git sha: 6663000a3)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  ============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
       GPU0         
GPU0   0            

================================= Hops between two GPUs ==================================
       GPU0         
GPU0   0            

=============================== Link Type between two GPUs ===============================
       GPU0         
GPU0   0            

======================================= Numa Nodes =======================================
GPU[0]          : (Topology) Numa Node: 0
GPU[0]          : (Topology) Numa Affinity: -1
================================== End of ROCm SMI Log ===================================

==============================
     Environment Variables
==============================
PYTORCH_TUNABLEOP_TUNING=0
PYTORCH_TUNABLEOP_ENABLED=1
PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx1100;gfx1101;gfx1200;gfx1201
LD_LIBRARY_PATH=/opt/rocm/lib:/usr/local/lib:
VLLM_LOGGING_LEVEL=DEBUG
PYTORCH_TUNABLEOP_FILENAME=/app/afo_tune_device_%d_full.csv
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY

🐛 Describe the bug

Description:

I downloaded the rocm/vllm-dev image, version 0.11.1_20251105 with ROCm 7, to test whether Whisper could perform timestamped transcription with segments in this version, since this functionality was not available in the version I was using before (0.10.1).
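
For reference, this is the kind of request I wanted to test once the server was up. It is only a sketch against the OpenAI-compatible transcription endpoint: whether verbose_json (the format that carries segment timestamps) is actually supported depends on the vLLM version, and audio.wav is a placeholder file.

# Sketch of a transcription request with segment-level output (parameter
# support may vary by vLLM version; audio.wav is a placeholder).
curl http://localhost:8000/v1/audio/transcriptions \
  -F model="openai/whisper-tiny" \
  -F file="@audio.wav" \
  -F response_format="verbose_json"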

Then, when I ran the container, this error occurred:

Attaching to vllm_container
vllm_container  | DEBUG 11-06 05:01:05 [plugins/__init__.py:32] No plugins for group vllm.platform_plugins found.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:36] Checking if TPU platform is available.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:55] TPU platform is not available because: No module named 'libtpu'
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:61] Checking if CUDA platform is available.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:88] Exception happens when checking CUDA platform: NVML Shared Library Not Found
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:105] CUDA platform is not available because: NVML Shared Library Not Found
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:112] Checking if ROCm platform is available.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:120] Confirmed ROCm platform is available.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:133] Checking if XPU platform is available.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:153] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:160] Checking if CPU platform is available.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:112] Checking if ROCm platform is available.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:120] Confirmed ROCm platform is available.
vllm_container  | DEBUG 11-06 05:01:05 [platforms/__init__.py:225] Automatically detected platform rocm.
vllm_container  | DEBUG 11-06 05:01:08 [utils/flashinfer.py:44] FlashInfer unavailable since package was not found
vllm_container  | DEBUG 11-06 05:01:10 [entrypoints/utils.py:175] Setting VLLM_WORKER_MULTIPROC_METHOD to 'spawn'
vllm_container  | DEBUG 11-06 05:01:10 [plugins/__init__.py:40] Available plugins for group vllm.general_plugins:
vllm_container  | DEBUG 11-06 05:01:10 [plugins/__init__.py:42] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
vllm_container  | DEBUG 11-06 05:01:10 [plugins/__init__.py:45] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
vllm_container  | DEBUG 11-06 05:01:10 [config/parallel.py:595] Disabled the custom all-reduce kernel because it is not supported on current platform.
vllm_container  | (APIServer pid=1) INFO 11-06 05:01:10 [entrypoints/openai/api_server.py:1961] vLLM API server version 0.11.1rc6.dev141+g0b8e871e5
vllm_container  | (APIServer pid=1) INFO 11-06 05:01:10 [entrypoints/utils.py:253] non-default args: {'model_tag': 'openai/whisper-tiny', 'host': '0.0.0.0', 'model': 'openai/whisper-tiny'}
config.json: 1.98kB [00:00, 16.5MB/s]
preprocessor_config.json: 185kB [00:00, 50.9MB/s]
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:12 [model_executor/models/registry.py:598] Cached model info file for class vllm.model_executor.models.whisper.WhisperForConditionalGeneration not found
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:12 [model_executor/models/registry.py:658] Cache model info for class vllm.model_executor.models.whisper.WhisperForConditionalGeneration miss. Loading model instead.
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:17 [model_executor/models/registry.py:668] Loaded model info for class vllm.model_executor.models.whisper.WhisperForConditionalGeneration
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:17 [logging_utils/log_time.py:29] Registry inspect model class: Elapsed time 5.1163223 secs
vllm_container  | (APIServer pid=1) INFO 11-06 05:01:17 [config/model.py:630] Resolved architecture: WhisperForConditionalGeneration
vllm_container  | (APIServer pid=1) INFO 11-06 05:01:17 [config/model.py:1951] Downcasting torch.float32 to torch.bfloat16.
vllm_container  | (APIServer pid=1) INFO 11-06 05:01:17 [config/model.py:1728] Using max model len 448
vllm_container  | (APIServer pid=1) WARNING 11-06 05:01:17 [config/model.py:1067] CUDA graph is not supported for whisper on ROCm yet, fallback to eager mode.
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:17 [engine/arg_utils.py:1933] Setting max_num_batched_tokens to 2048 for OPENAI_API_SERVER usage context.
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:17 [engine/arg_utils.py:1945] Setting max_num_seqs to 256 for OPENAI_API_SERVER usage context.
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:17 [config/parallel.py:595] Disabled the custom all-reduce kernel because it is not supported on current platform.
vllm_container  | (APIServer pid=1) INFO 11-06 05:01:17 [config/scheduler.py:186] Encoder-decoder models do not support chunked prefill nor prefix caching; disabling both.
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:17 [config/parallel.py:595] Disabled the custom all-reduce kernel because it is not supported on current platform.
vllm_container  | (APIServer pid=1) INFO 11-06 05:01:17 [config/vllm.py:453] Cudagraph is disabled under eager mode
tokenizer_config.json: 283kB [00:00, 74.3MB/s]
vocab.json: 836kB [00:00, 8.56MB/s] 
tokenizer.json: 2.48MB [00:00, 22.0MB/s]
merges.txt: 494kB [00:00, 16.6MB/s] 
normalizer.json: 52.7kB [00:00, 26.1MB/s]
added_tokens.json: 34.6kB [00:00, 22.4MB/s]
special_tokens_map.json: 2.19kB [00:00, 4.21MB/s]
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:25 [config/vllm.py:503] Encoder-decoder model detected: setting `max_num_encoder_input_tokens` to encoder length (1500)
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:25 [plugins/__init__.py:32] No plugins for group vllm.stat_logger_plugins found.
generation_config.json: 3.75kB [00:00, 3.97MB/s]
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:25 [plugins/io_processors/__init__.py:33] No IOProcessor plugins requested by the model
vllm_container  | DEBUG 11-06 05:01:27 [plugins/__init__.py:32] No plugins for group vllm.platform_plugins found.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:36] Checking if TPU platform is available.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:55] TPU platform is not available because: No module named 'libtpu'
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:61] Checking if CUDA platform is available.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:88] Exception happens when checking CUDA platform: NVML Shared Library Not Found
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:105] CUDA platform is not available because: NVML Shared Library Not Found
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:112] Checking if ROCm platform is available.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:120] Confirmed ROCm platform is available.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:133] Checking if XPU platform is available.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:153] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:160] Checking if CPU platform is available.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:112] Checking if ROCm platform is available.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:120] Confirmed ROCm platform is available.
vllm_container  | DEBUG 11-06 05:01:27 [platforms/__init__.py:225] Automatically detected platform rocm.
vllm_container  | DEBUG 11-06 05:01:29 [utils/flashinfer.py:44] FlashInfer unavailable since package was not found
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [v1/engine/core.py:780] Waiting for init message from front-end.
vllm_container  | (APIServer pid=1) DEBUG 11-06 05:01:31 [v1/engine/utils.py:1058] HELLO from local core engine process 0.
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [v1/engine/core.py:791] Received init message: EngineHandshakeMetadata(addresses=EngineZmqAddresses(inputs=['ipc:///tmp/bdc47621-e9d3-4f68-a4fc-6e2aafa2b3ab'], outputs=['ipc:///tmp/e63da8dd-3497-465e-9338-6929ee96328c'], coordinator_input=None, coordinator_output=None, frontend_stats_publish_address=None), parallel_config={'data_parallel_master_ip': '127.0.0.1', 'data_parallel_master_port': 0, '_data_parallel_master_port_list': [], 'data_parallel_size': 1}, parallel_config_hash=None)
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [v1/engine/core.py:588] Has DP Coordinator: False, stats publish address: None
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [plugins/__init__.py:40] Available plugins for group vllm.general_plugins:
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [plugins/__init__.py:42] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [plugins/__init__.py:45] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
vllm_container  | (EngineCore_DP0 pid=74) INFO 11-06 05:01:31 [v1/engine/core.py:93] Initializing a V1 LLM engine (v0.11.1rc6.dev141+g0b8e871e5) with config: model='openai/whisper-tiny', speculative_config=None, tokenizer='openai/whisper-tiny', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=448, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=openai/whisper-tiny, enable_prefix_caching=False, chunked_prefill_enabled=False, pooler_config=None, compilation_config={'level': None, 'mode': 0, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': None, 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'use_cudagraph': False, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'full_cuda_graph': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': 0, 'local_cache_dir': None}
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [compilation/decorators.py:184] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.deepseek_v2.DeepseekV2Model'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [compilation/decorators.py:184] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama.LlamaModel'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [compilation/decorators.py:184] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama_eagle3.LlamaModel'>: ['input_ids', 'positions', 'hidden_states', 'input_embeds']
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:32 [utils/__init__.py:105] Methods get_cache_block_size_bytes not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7307f4717680>
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:32 [distributed/parallel_state.py:1135] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.18.0.2:60557 backend=nccl
vllm_container  | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:32 [distributed/parallel_state.py:1200] Detected 1 nodes in the distributed environment
vllm_container  | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container  | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container  | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container  | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container  | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container  | (EngineCore_DP0 pid=74) INFO 11-06 05:01:32 [distributed/parallel_state.py:1325] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
vllm_container  | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:32 [v1/sample/logits_processor/__init__.py:63] No logitsprocs plugins installed (group vllm.logits_processors).
vllm_container  | (EngineCore_DP0 pid=74) INFO 11-06 05:01:37 [v1/worker/gpu_model_runner.py:2933] Starting to load model openai/whisper-tiny...
vllm_container  | (EngineCore_DP0 pid=74) INFO 11-06 05:01:38 [attention/layer.py:563] MultiHeadAttention attn_backend: _Backend.TORCH_SDPA, use_upstream_fa: False
vllm_container  | (EngineCore_DP0 pid=74) INFO 11-06 05:01:38 [platforms/rocm.py:288] Using Triton Attention backend.
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] EngineCore failed to start.
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] Traceback (most recent call last):
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in run_engine_core
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     engine_core = EngineCoreProc(*args, **kwargs)
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 602, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     super().__init__(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 102, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.model_executor = executor_class(vllm_config)
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self._init_executor()
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.driver_worker.load_model()
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 263, in load_model
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2950, in load_model
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.model = model_loader.load_model(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                  ^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     model = initialize_model(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]             ^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 55, in initialize_model
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     return model_class(vllm_config=vllm_config, prefix=prefix)
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 892, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.model = WhisperModel(vllm_config=vllm_config, prefix=prefix)
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 597, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.decoder = WhisperDecoder(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                    ^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 559, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.start_layer, self.end_layer, self.layers = make_layers(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                                                     ^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 646, in make_layers
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 561, in <lambda>
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     lambda prefix: WhisperDecoderLayer(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                    ^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 446, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.encoder_attn = WhisperCrossAttention(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                         ^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 298, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     super().__init__(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 235, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.attn = CrossAttention(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                 ^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layers/cross_attention.py", line 162, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     super().__init__(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 299, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     self.impl = impl_cls(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]                 ^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/triton_attn.py", line 248, in __init__
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843]     raise NotImplementedError(
vllm_container  | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] NotImplementedError: Encoder self-attention and encoder/decoder cross-attention are not implemented for TritonAttentionImpl
vllm_container  | (EngineCore_DP0 pid=74) Process EngineCore_DP0:
vllm_container  | (EngineCore_DP0 pid=74) Traceback (most recent call last):
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
vllm_container  | (EngineCore_DP0 pid=74)     self.run()
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
vllm_container  | (EngineCore_DP0 pid=74)     self._target(*self._args, **self._kwargs)
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 847, in run_engine_core
vllm_container  | (EngineCore_DP0 pid=74)     raise e
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in run_engine_core
vllm_container  | (EngineCore_DP0 pid=74)     engine_core = EngineCoreProc(*args, **kwargs)
vllm_container  | (EngineCore_DP0 pid=74)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 602, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     super().__init__(
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 102, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     self.model_executor = executor_class(vllm_config)
vllm_container  | (EngineCore_DP0 pid=74)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     self._init_executor()
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
vllm_container  | (EngineCore_DP0 pid=74)     self.driver_worker.load_model()
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 263, in load_model
vllm_container  | (EngineCore_DP0 pid=74)     self.model_runner.load_model(eep_scale_up=eep_scale_up)
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2950, in load_model
vllm_container  | (EngineCore_DP0 pid=74)     self.model = model_loader.load_model(
vllm_container  | (EngineCore_DP0 pid=74)                  ^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
vllm_container  | (EngineCore_DP0 pid=74)     model = initialize_model(
vllm_container  | (EngineCore_DP0 pid=74)             ^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 55, in initialize_model
vllm_container  | (EngineCore_DP0 pid=74)     return model_class(vllm_config=vllm_config, prefix=prefix)
vllm_container  | (EngineCore_DP0 pid=74)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 892, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     self.model = WhisperModel(vllm_config=vllm_config, prefix=prefix)
vllm_container  | (EngineCore_DP0 pid=74)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 597, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     self.decoder = WhisperDecoder(
vllm_container  | (EngineCore_DP0 pid=74)                    ^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 559, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     self.start_layer, self.end_layer, self.layers = make_layers(
vllm_container  | (EngineCore_DP0 pid=74)                                                     ^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 646, in make_layers
vllm_container  | (EngineCore_DP0 pid=74)     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
vllm_container  | (EngineCore_DP0 pid=74)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 561, in <lambda>
vllm_container  | (EngineCore_DP0 pid=74)     lambda prefix: WhisperDecoderLayer(
vllm_container  | (EngineCore_DP0 pid=74)                    ^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 446, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     self.encoder_attn = WhisperCrossAttention(
vllm_container  | (EngineCore_DP0 pid=74)                         ^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 298, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     super().__init__(
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 235, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     self.attn = CrossAttention(
vllm_container  | (EngineCore_DP0 pid=74)                 ^^^^^^^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layers/cross_attention.py", line 162, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     super().__init__(
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 299, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     self.impl = impl_cls(
vllm_container  | (EngineCore_DP0 pid=74)                 ^^^^^^^^^
vllm_container  | (EngineCore_DP0 pid=74)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/triton_attn.py", line 248, in __init__
vllm_container  | (EngineCore_DP0 pid=74)     raise NotImplementedError(
vllm_container  | (EngineCore_DP0 pid=74) NotImplementedError: Encoder self-attention and encoder/decoder cross-attention are not implemented for TritonAttentionImpl
vllm_container  | [rank0]:[W1106 05:01:39.094069486 ProcessGroupNCCL.cpp:1522] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
vllm_container  | (APIServer pid=1) Traceback (most recent call last):
vllm_container  | (APIServer pid=1)   File "/usr/local/bin/vllm", line 7, in <module>
vllm_container  | (APIServer pid=1)     sys.exit(main())
vllm_container  | (APIServer pid=1)              ^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
vllm_container  | (APIServer pid=1)     args.dispatch_function(args)
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 59, in cmd
vllm_container  | (APIServer pid=1)     uvloop.run(run_server(args))
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
vllm_container  | (APIServer pid=1)     return __asyncio.run(
vllm_container  | (APIServer pid=1)            ^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
vllm_container  | (APIServer pid=1)     return runner.run(main)
vllm_container  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
vllm_container  | (APIServer pid=1)     return self._loop.run_until_complete(task)
vllm_container  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
vllm_container  | (APIServer pid=1)     return await main
vllm_container  | (APIServer pid=1)            ^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2008, in run_server
vllm_container  | (APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2027, in run_server_worker
vllm_container  | (APIServer pid=1)     async with build_async_engine_client(
vllm_container  | (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm_container  | (APIServer pid=1)     return await anext(self.gen)
vllm_container  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
vllm_container  | (APIServer pid=1)     async with build_async_engine_client_from_engine_args(
vllm_container  | (APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm_container  | (APIServer pid=1)     return await anext(self.gen)
vllm_container  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 236, in build_async_engine_client_from_engine_args
vllm_container  | (APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
vllm_container  | (APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/func_utils.py", line 116, in inner
vllm_container  | (APIServer pid=1)     return fn(*args, **kwargs)
vllm_container  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 202, in from_vllm_config
vllm_container  | (APIServer pid=1)     return cls(
vllm_container  | (APIServer pid=1)            ^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 132, in __init__
vllm_container  | (APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
vllm_container  | (APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
vllm_container  | (APIServer pid=1)     return AsyncMPClient(*client_args)
vllm_container  | (APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 808, in __init__
vllm_container  | (APIServer pid=1)     super().__init__(
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 469, in __init__
vllm_container  | (APIServer pid=1)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
vllm_container  | (APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container  | (APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
vllm_container  | (APIServer pid=1)     next(self.gen)
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 898, in launch_core_engines
vllm_container  | (APIServer pid=1)     wait_for_engine_startup(
vllm_container  | (APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 955, in wait_for_engine_startup
vllm_container  | (APIServer pid=1)     raise RuntimeError(
vllm_container  | (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
vllm_container exited with code 1
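
The failure is raised while constructing Whisper's cross-attention layers: the Triton attention backend that vLLM auto-selects on ROCm ("Using Triton Attention backend." in the log above) does not implement encoder self-attention or encoder/decoder cross-attention. As an untested workaround sketch, one could try forcing a different attention backend through the VLLM_ATTENTION_BACKEND environment variable; the backend name below is an assumption, since the accepted values differ between vLLM builds:

# Untested sketch: force a non-Triton attention backend before serving.
# FLASH_ATTN is an assumed value; check which backend names your build accepts.
VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve openai/whisper-tiny --host 0.0.0.0 --port 8000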

This is my Dockerfile with vLLM 0.11.1:

FROM rocm/vllm-dev:nightly

RUN pip install --upgrade pip && \
    pip install "vllm[audio]"

CMD ["vllm", "serve", "openai/whisper-tiny", "--host", "0.0.0.0", "--port", "8000"]

This is my Dockerfile with vLLM 0.10.1:

FROM rocm/vllm-dev:rocm6.4.1_vllm_0.10.1_20250909

RUN pip install --upgrade pip && \
    pip install "vllm[audio]"

CMD ["vllm", "serve", "openai/whisper-tiny", "--host", "0.0.0.0", "--port", "8000"]

This is my docker-compose.yml (I use it with both images):

services:
  vllm:
    build: .
    container_name: vllm_container
    environment:
      - VLLM_LOGGING_LEVEL=DEBUG
    ports:
      - "8000:8000"
    volumes:
      - /workspace
    working_dir: /workspace
    tty: true
    stdin_open: true
    devices:
      - /dev/kfd
      - /dev/dri
      - /dev/mem
    group_add:
      - video
      - render
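
For reference, the stack above can be built and started with:

docker compose up --build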

In version 0.10.1, Whisper works normally.
