Your current environment
The output of python collect_env.py (vLLM 0.11.1 / ROCm 7 image)
==============================
System Info
==============================
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version : 20.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-7.0.0 25314 f4087f6b428f0e6f575ebac8a8a724dab123d06e)
CMake version : version 3.31.6
Libc version : glibc-2.35
==============================
PyTorch Info
==============================
PyTorch version : 2.9.0a0+git1c57644
Is debug build : False
CUDA used to build PyTorch : N/A
ROCM used to build PyTorch : 7.0.51831-a3e329ad8
==============================
Python Environment
==============================
Python version : 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0] (64-bit runtime)
Python platform : Linux-6.14.0-33-generic-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : Could not collect
CUDA_MODULE_LOADING set to :
GPU models and configuration : (gfx1101)
Nvidia driver version : Could not collect
cuDNN version : Could not collect
HIP runtime version : 7.0.51831
MIOpen runtime version : 3.5.0
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
Model name: AMD Ryzen 5 5600 6-Core Processor
BIOS Model name: AMD Ryzen 5 5600 6-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU max MHz: 4470.0000
CPU min MHz: 550.0000
BogoMIPS: 7000.46
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization: AMD-V
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 3 MiB (6 instances)
L3 cache: 32 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
==============================
Versions of relevant libraries
==============================
[pip3] conch-triton-kernels==1.2.1
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0a0+git1c57644
[pip3] torchvision==0.23.0a0+824e8c8
[pip3] transformers==4.57.1
[pip3] triton==3.4.0
[pip3] triton_kernels==1.0.0
[conda] Could not collect
==============================
vLLM Info
==============================
ROCM Version : 7.0.51831-a3e329ad8
vLLM Version : 0.11.1rc6.dev141+g0b8e871e5 (git sha: 0b8e871e5)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
GPU0
GPU0 0
================================= Hops between two GPUs ==================================
GPU0
GPU0 0
=============================== Link Type between two GPUs ===============================
GPU0
GPU0 0
======================================= Numa Nodes =======================================
GPU[0] : (Topology) Numa Node: 0
GPU[0] : (Topology) Numa Affinity: -1
================================== End of ROCm SMI Log ===================================
==============================
Environment Variables
==============================
PYTORCH_TUNABLEOP_TUNING=0
PYTORCH_TUNABLEOP_ENABLED=1
PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx950;gfx1100;gfx1101;gfx1200;gfx1201;gfx1150;gfx1151
LD_LIBRARY_PATH=/opt/rocm/lib:/usr/local/lib:
VLLM_LOGGING_LEVEL=DEBUG
PYTORCH_TUNABLEOP_FILENAME=/app/afo_tune_device_%d_full.csv
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
The output of python collect_env.py (vLLM 0.10.1 / ROCm 6.4 image)
==============================
System Info
==============================
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version : 19.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.4.1 25184 c87081df219c42dc27c5b6d86c0525bc7d01f727)
CMake version : version 3.31.6
Libc version : glibc-2.35
==============================
PyTorch Info
==============================
PyTorch version : 2.7.0+gitf717b2a
Is debug build : False
CUDA used to build PyTorch : N/A
ROCM used to build PyTorch : 6.4.43483-a187df25c
==============================
Python Environment
==============================
Python version : 3.12.11 (main, Jun 4 2025, 08:56:18) [GCC 11.4.0] (64-bit runtime)
Python platform : Linux-6.14.0-33-generic-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : Could not collect
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration : AMD Radeon RX 7700 XT (gfx1101)
Nvidia driver version : Could not collect
cuDNN version : Could not collect
HIP runtime version : 6.4.43483
MIOpen runtime version : 3.4.0
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
Model name: AMD Ryzen 5 5600 6-Core Processor
BIOS Model name: AMD Ryzen 5 5600 6-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU max MHz: 4470.0000
CPU min MHz: 550.0000
BogoMIPS: 7000.46
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization: AMD-V
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 3 MiB (6 instances)
L3 cache: 32 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
==============================
Versions of relevant libraries
==============================
[pip3] conch-triton-kernels==1.2.1
[pip3] numpy==2.2.6
[pip3] pyzmq==27.0.2
[pip3] torch==2.7.0+gitf717b2a
[pip3] torchvision==0.21.0+7af6987
[pip3] transformers==4.56.1
[pip3] triton==3.2.0+gite5be006a
[conda] Could not collect
==============================
vLLM Info
==============================
ROCM Version : 6.4.43483-a187df25c
vLLM Version : 0.10.1rc2.dev410+g6663000a3 (git sha: 6663000a3)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
GPU0
GPU0 0
================================= Hops between two GPUs ==================================
GPU0
GPU0 0
=============================== Link Type between two GPUs ===============================
GPU0
GPU0 0
======================================= Numa Nodes =======================================
GPU[0] : (Topology) Numa Node: 0
GPU[0] : (Topology) Numa Affinity: -1
================================== End of ROCm SMI Log ===================================
==============================
Environment Variables
==============================
PYTORCH_TUNABLEOP_TUNING=0
PYTORCH_TUNABLEOP_ENABLED=1
PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx1100;gfx1101;gfx1200;gfx1201
LD_LIBRARY_PATH=/opt/rocm/lib:/usr/local/lib:
VLLM_LOGGING_LEVEL=DEBUG
PYTORCH_TUNABLEOP_FILENAME=/app/afo_tune_device_%d_full.csv
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
Description:
I downloaded the rocm/vllm-dev image, version 0.11.1_20251105 with ROCm 7, to test whether Whisper could perform timestamp transcription with segments in this version, since that functionality was not available in the version I was using before (0.10.1).
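For context, this is roughly the request I intended to send once the server was up. It is a minimal sketch using the OpenAI Python client: the audio file name and api_key value are placeholders, and I have not verified which of the response_format / timestamp_granularities parameters vLLM actually honors on this endpoint.
from openai import OpenAI

# OpenAI-compatible endpoint exposed by the container (port 8000 comes from my docker-compose.yml below).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # api_key is a placeholder

# "sample.wav" is a placeholder file; the parameters follow the OpenAI transcription API,
# which the vLLM endpoint aims to be compatible with (not verified here).
with open("sample.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="openai/whisper-tiny",
        file=audio,
        response_format="verbose_json",
        timestamp_granularities=["segment"],
    )

print(result.segments)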
Then, when I ran the container, this error occurred:
Attaching to vllm_container
vllm_container | DEBUG 11-06 05:01:05 [plugins/__init__.py:32] No plugins for group vllm.platform_plugins found.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:36] Checking if TPU platform is available.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:55] TPU platform is not available because: No module named 'libtpu'
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:61] Checking if CUDA platform is available.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:88] Exception happens when checking CUDA platform: NVML Shared Library Not Found
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:105] CUDA platform is not available because: NVML Shared Library Not Found
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:112] Checking if ROCm platform is available.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:120] Confirmed ROCm platform is available.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:133] Checking if XPU platform is available.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:153] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:160] Checking if CPU platform is available.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:112] Checking if ROCm platform is available.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:120] Confirmed ROCm platform is available.
vllm_container | DEBUG 11-06 05:01:05 [platforms/__init__.py:225] Automatically detected platform rocm.
vllm_container | DEBUG 11-06 05:01:08 [utils/flashinfer.py:44] FlashInfer unavailable since package was not found
vllm_container | DEBUG 11-06 05:01:10 [entrypoints/utils.py:175] Setting VLLM_WORKER_MULTIPROC_METHOD to 'spawn'
vllm_container | DEBUG 11-06 05:01:10 [plugins/__init__.py:40] Available plugins for group vllm.general_plugins:
vllm_container | DEBUG 11-06 05:01:10 [plugins/__init__.py:42] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
vllm_container | DEBUG 11-06 05:01:10 [plugins/__init__.py:45] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
vllm_container | DEBUG 11-06 05:01:10 [config/parallel.py:595] Disabled the custom all-reduce kernel because it is not supported on current platform.
vllm_container | (APIServer pid=1) INFO 11-06 05:01:10 [entrypoints/openai/api_server.py:1961] vLLM API server version 0.11.1rc6.dev141+g0b8e871e5
vllm_container | (APIServer pid=1) INFO 11-06 05:01:10 [entrypoints/utils.py:253] non-default args: {'model_tag': 'openai/whisper-tiny', 'host': '0.0.0.0', 'model': 'openai/whisper-tiny'}
config.json: 1.98kB [00:00, 16.5MB/s]
preprocessor_config.json: 185kB [00:00, 50.9MB/s]
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:12 [model_executor/models/registry.py:598] Cached model info file for class vllm.model_executor.models.whisper.WhisperForConditionalGeneration not found
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:12 [model_executor/models/registry.py:658] Cache model info for class vllm.model_executor.models.whisper.WhisperForConditionalGeneration miss. Loading model instead.
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:17 [model_executor/models/registry.py:668] Loaded model info for class vllm.model_executor.models.whisper.WhisperForConditionalGeneration
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:17 [logging_utils/log_time.py:29] Registry inspect model class: Elapsed time 5.1163223 secs
vllm_container | (APIServer pid=1) INFO 11-06 05:01:17 [config/model.py:630] Resolved architecture: WhisperForConditionalGeneration
vllm_container | (APIServer pid=1) INFO 11-06 05:01:17 [config/model.py:1951] Downcasting torch.float32 to torch.bfloat16.
vllm_container | (APIServer pid=1) INFO 11-06 05:01:17 [config/model.py:1728] Using max model len 448
vllm_container | (APIServer pid=1) WARNING 11-06 05:01:17 [config/model.py:1067] CUDA graph is not supported for whisper on ROCm yet, fallback to eager mode.
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:17 [engine/arg_utils.py:1933] Setting max_num_batched_tokens to 2048 for OPENAI_API_SERVER usage context.
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:17 [engine/arg_utils.py:1945] Setting max_num_seqs to 256 for OPENAI_API_SERVER usage context.
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:17 [config/parallel.py:595] Disabled the custom all-reduce kernel because it is not supported on current platform.
vllm_container | (APIServer pid=1) INFO 11-06 05:01:17 [config/scheduler.py:186] Encoder-decoder models do not support chunked prefill nor prefix caching; disabling both.
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:17 [config/parallel.py:595] Disabled the custom all-reduce kernel because it is not supported on current platform.
vllm_container | (APIServer pid=1) INFO 11-06 05:01:17 [config/vllm.py:453] Cudagraph is disabled under eager mode
tokenizer_config.json: 283kB [00:00, 74.3MB/s]
vocab.json: 836kB [00:00, 8.56MB/s]
tokenizer.json: 2.48MB [00:00, 22.0MB/s]
merges.txt: 494kB [00:00, 16.6MB/s]
normalizer.json: 52.7kB [00:00, 26.1MB/s]
added_tokens.json: 34.6kB [00:00, 22.4MB/s]
special_tokens_map.json: 2.19kB [00:00, 4.21MB/s]
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:25 [config/vllm.py:503] Encoder-decoder model detected: setting `max_num_encoder_input_tokens` to encoder length (1500)
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:25 [plugins/__init__.py:32] No plugins for group vllm.stat_logger_plugins found.
generation_config.json: 3.75kB [00:00, 3.97MB/s]
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:25 [plugins/io_processors/__init__.py:33] No IOProcessor plugins requested by the model
vllm_container | DEBUG 11-06 05:01:27 [plugins/__init__.py:32] No plugins for group vllm.platform_plugins found.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:36] Checking if TPU platform is available.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:55] TPU platform is not available because: No module named 'libtpu'
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:61] Checking if CUDA platform is available.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:88] Exception happens when checking CUDA platform: NVML Shared Library Not Found
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:105] CUDA platform is not available because: NVML Shared Library Not Found
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:112] Checking if ROCm platform is available.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:120] Confirmed ROCm platform is available.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:133] Checking if XPU platform is available.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:153] XPU platform is not available because: No module named 'intel_extension_for_pytorch'
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:160] Checking if CPU platform is available.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:112] Checking if ROCm platform is available.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:120] Confirmed ROCm platform is available.
vllm_container | DEBUG 11-06 05:01:27 [platforms/__init__.py:225] Automatically detected platform rocm.
vllm_container | DEBUG 11-06 05:01:29 [utils/flashinfer.py:44] FlashInfer unavailable since package was not found
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [v1/engine/core.py:780] Waiting for init message from front-end.
vllm_container | (APIServer pid=1) DEBUG 11-06 05:01:31 [v1/engine/utils.py:1058] HELLO from local core engine process 0.
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [v1/engine/core.py:791] Received init message: EngineHandshakeMetadata(addresses=EngineZmqAddresses(inputs=['ipc:///tmp/bdc47621-e9d3-4f68-a4fc-6e2aafa2b3ab'], outputs=['ipc:///tmp/e63da8dd-3497-465e-9338-6929ee96328c'], coordinator_input=None, coordinator_output=None, frontend_stats_publish_address=None), parallel_config={'data_parallel_master_ip': '127.0.0.1', 'data_parallel_master_port': 0, '_data_parallel_master_port_list': [], 'data_parallel_size': 1}, parallel_config_hash=None)
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [v1/engine/core.py:588] Has DP Coordinator: False, stats publish address: None
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [plugins/__init__.py:40] Available plugins for group vllm.general_plugins:
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [plugins/__init__.py:42] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [plugins/__init__.py:45] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
vllm_container | (EngineCore_DP0 pid=74) INFO 11-06 05:01:31 [v1/engine/core.py:93] Initializing a V1 LLM engine (v0.11.1rc6.dev141+g0b8e871e5) with config: model='openai/whisper-tiny', speculative_config=None, tokenizer='openai/whisper-tiny', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=448, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=openai/whisper-tiny, enable_prefix_caching=False, chunked_prefill_enabled=False, pooler_config=None, compilation_config={'level': None, 'mode': 0, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': None, 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'use_cudagraph': False, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'full_cuda_graph': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': 0, 'local_cache_dir': None}
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [compilation/decorators.py:184] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.deepseek_v2.DeepseekV2Model'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [compilation/decorators.py:184] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama.LlamaModel'>: ['input_ids', 'positions', 'intermediate_tensors', 'inputs_embeds']
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:31 [compilation/decorators.py:184] Inferred dynamic dimensions for forward method of <class 'vllm.model_executor.models.llama_eagle3.LlamaModel'>: ['input_ids', 'positions', 'hidden_states', 'input_embeds']
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:32 [utils/__init__.py:105] Methods get_cache_block_size_bytes not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7307f4717680>
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:32 [distributed/parallel_state.py:1135] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.18.0.2:60557 backend=nccl
vllm_container | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:32 [distributed/parallel_state.py:1200] Detected 1 nodes in the distributed environment
vllm_container | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container | [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
vllm_container | (EngineCore_DP0 pid=74) INFO 11-06 05:01:32 [distributed/parallel_state.py:1325] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
vllm_container | (EngineCore_DP0 pid=74) DEBUG 11-06 05:01:32 [v1/sample/logits_processor/__init__.py:63] No logitsprocs plugins installed (group vllm.logits_processors).
vllm_container | (EngineCore_DP0 pid=74) INFO 11-06 05:01:37 [v1/worker/gpu_model_runner.py:2933] Starting to load model openai/whisper-tiny...
vllm_container | (EngineCore_DP0 pid=74) INFO 11-06 05:01:38 [attention/layer.py:563] MultiHeadAttention attn_backend: _Backend.TORCH_SDPA, use_upstream_fa: False
vllm_container | (EngineCore_DP0 pid=74) INFO 11-06 05:01:38 [platforms/rocm.py:288] Using Triton Attention backend.
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] EngineCore failed to start.
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] Traceback (most recent call last):
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in run_engine_core
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] engine_core = EngineCoreProc(*args, **kwargs)
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 602, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] super().__init__(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 102, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.model_executor = executor_class(vllm_config)
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self._init_executor()
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.driver_worker.load_model()
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 263, in load_model
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.model_runner.load_model(eep_scale_up=eep_scale_up)
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2950, in load_model
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.model = model_loader.load_model(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] model = initialize_model(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 55, in initialize_model
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] return model_class(vllm_config=vllm_config, prefix=prefix)
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 892, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.model = WhisperModel(vllm_config=vllm_config, prefix=prefix)
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 597, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.decoder = WhisperDecoder(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 559, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.start_layer, self.end_layer, self.layers = make_layers(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 646, in make_layers
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 561, in <lambda>
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] lambda prefix: WhisperDecoderLayer(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 446, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.encoder_attn = WhisperCrossAttention(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 298, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] super().__init__(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 235, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.attn = CrossAttention(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layers/cross_attention.py", line 162, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] super().__init__(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 299, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] self.impl = impl_cls(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] ^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/triton_attn.py", line 248, in __init__
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] raise NotImplementedError(
vllm_container | (EngineCore_DP0 pid=74) ERROR 11-06 05:01:38 [v1/engine/core.py:843] NotImplementedError: Encoder self-attention and encoder/decoder cross-attention are not implemented for TritonAttentionImpl
vllm_container | (EngineCore_DP0 pid=74) Process EngineCore_DP0:
vllm_container | (EngineCore_DP0 pid=74) Traceback (most recent call last):
vllm_container | (EngineCore_DP0 pid=74) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
vllm_container | (EngineCore_DP0 pid=74) self.run()
vllm_container | (EngineCore_DP0 pid=74) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
vllm_container | (EngineCore_DP0 pid=74) self._target(*self._args, **self._kwargs)
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 847, in run_engine_core
vllm_container | (EngineCore_DP0 pid=74) raise e
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in run_engine_core
vllm_container | (EngineCore_DP0 pid=74) engine_core = EngineCoreProc(*args, **kwargs)
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 602, in __init__
vllm_container | (EngineCore_DP0 pid=74) super().__init__(
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 102, in __init__
vllm_container | (EngineCore_DP0 pid=74) self.model_executor = executor_class(vllm_config)
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 101, in __init__
vllm_container | (EngineCore_DP0 pid=74) self._init_executor()
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 48, in _init_executor
vllm_container | (EngineCore_DP0 pid=74) self.driver_worker.load_model()
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 263, in load_model
vllm_container | (EngineCore_DP0 pid=74) self.model_runner.load_model(eep_scale_up=eep_scale_up)
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2950, in load_model
vllm_container | (EngineCore_DP0 pid=74) self.model = model_loader.load_model(
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
vllm_container | (EngineCore_DP0 pid=74) model = initialize_model(
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 55, in initialize_model
vllm_container | (EngineCore_DP0 pid=74) return model_class(vllm_config=vllm_config, prefix=prefix)
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 892, in __init__
vllm_container | (EngineCore_DP0 pid=74) self.model = WhisperModel(vllm_config=vllm_config, prefix=prefix)
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 597, in __init__
vllm_container | (EngineCore_DP0 pid=74) self.decoder = WhisperDecoder(
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 559, in __init__
vllm_container | (EngineCore_DP0 pid=74) self.start_layer, self.end_layer, self.layers = make_layers(
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 646, in make_layers
vllm_container | (EngineCore_DP0 pid=74) maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 561, in <lambda>
vllm_container | (EngineCore_DP0 pid=74) lambda prefix: WhisperDecoderLayer(
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 446, in __init__
vllm_container | (EngineCore_DP0 pid=74) self.encoder_attn = WhisperCrossAttention(
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 298, in __init__
vllm_container | (EngineCore_DP0 pid=74) super().__init__(
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 235, in __init__
vllm_container | (EngineCore_DP0 pid=74) self.attn = CrossAttention(
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layers/cross_attention.py", line 162, in __init__
vllm_container | (EngineCore_DP0 pid=74) super().__init__(
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 299, in __init__
vllm_container | (EngineCore_DP0 pid=74) self.impl = impl_cls(
vllm_container | (EngineCore_DP0 pid=74) ^^^^^^^^^
vllm_container | (EngineCore_DP0 pid=74) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/attention/backends/triton_attn.py", line 248, in __init__
vllm_container | (EngineCore_DP0 pid=74) raise NotImplementedError(
vllm_container | (EngineCore_DP0 pid=74) NotImplementedError: Encoder self-attention and encoder/decoder cross-attention are not implemented for TritonAttentionImpl
vllm_container | [rank0]:[W1106 05:01:39.094069486 ProcessGroupNCCL.cpp:1522] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
vllm_container | (APIServer pid=1) Traceback (most recent call last):
vllm_container | (APIServer pid=1) File "/usr/local/bin/vllm", line 7, in <module>
vllm_container | (APIServer pid=1) sys.exit(main())
vllm_container | (APIServer pid=1) ^^^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
vllm_container | (APIServer pid=1) args.dispatch_function(args)
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 59, in cmd
vllm_container | (APIServer pid=1) uvloop.run(run_server(args))
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
vllm_container | (APIServer pid=1) return __asyncio.run(
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
vllm_container | (APIServer pid=1) return runner.run(main)
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
vllm_container | (APIServer pid=1) return self._loop.run_until_complete(task)
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
vllm_container | (APIServer pid=1) return await main
vllm_container | (APIServer pid=1) ^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2008, in run_server
vllm_container | (APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2027, in run_server_worker
vllm_container | (APIServer pid=1) async with build_async_engine_client(
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm_container | (APIServer pid=1) return await anext(self.gen)
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
vllm_container | (APIServer pid=1) async with build_async_engine_client_from_engine_args(
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
vllm_container | (APIServer pid=1) return await anext(self.gen)
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 236, in build_async_engine_client_from_engine_args
vllm_container | (APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/func_utils.py", line 116, in inner
vllm_container | (APIServer pid=1) return fn(*args, **kwargs)
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 202, in from_vllm_config
vllm_container | (APIServer pid=1) return cls(
vllm_container | (APIServer pid=1) ^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 132, in __init__
vllm_container | (APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
vllm_container | (APIServer pid=1) return AsyncMPClient(*client_args)
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 808, in __init__
vllm_container | (APIServer pid=1) super().__init__(
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 469, in __init__
vllm_container | (APIServer pid=1) with launch_core_engines(vllm_config, executor_class, log_stats) as (
vllm_container | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm_container | (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
vllm_container | (APIServer pid=1) next(self.gen)
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 898, in launch_core_engines
vllm_container | (APIServer pid=1) wait_for_engine_startup(
vllm_container | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 955, in wait_for_engine_startup
vllm_container | (APIServer pid=1) raise RuntimeError(
vllm_container | (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
vllm_container exited with code 1
This is my Dockerfile with vLLM 0.11.1:
FROM rocm/vllm-dev:nightly
RUN pip install --upgrade pip && \
    pip install "vllm[audio]"
CMD ["vllm", "serve", "openai/whisper-tiny", "--host", "0.0.0.0", "--port", "8000"]
This is my Dockerfile with vLLM 0.10.1:
FROM rocm/vllm-dev:rocm6.4.1_vllm_0.10.1_20250909
RUN pip install --upgrade pip && \
    pip install "vllm[audio]"
CMD ["vllm", "serve", "openai/whisper-tiny", "--host", "0.0.0.0", "--port", "8000"]
This is my docker-compose.yml (I use it with both images):
services:
  vllm:
    build: .
    container_name: vllm_container
    environment:
      - VLLM_LOGGING_LEVEL=DEBUG
    ports:
      - "8000:8000"
    volumes:
      - /workspace
    working_dir: /workspace
    tty: true
    stdin_open: true
    devices:
      - /dev/kfd
      - /dev/dri
      - /dev/mem
    group_add:
      - video
      - render
In version 0.10.1, Whisper works normally.
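For completeness, this is roughly how I check that the 0.10.1 container is up and serving the model, via the OpenAI-compatible /v1/models endpoint (a minimal sketch with the requests library; the URL assumes the 8000:8000 port mapping from the compose file above).
import requests

# List the models served by the vLLM OpenAI-compatible server (port 8000 mapped in docker-compose.yml).
resp = requests.get("http://localhost:8000/v1/models", timeout=10)
resp.raise_for_status()
print([model["id"] for model in resp.json()["data"]])  # expected to include 'openai/whisper-tiny'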
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.