Open
Changes from 9 commits
41 commits
5d3294e
[ROCm][CI] Installation test modifications and improvements
AndreasKaratzas Nov 17, 2025
65e6376
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 18, 2025
6364378
[ROCm][CI] fix for pytorch/pytorch standalone tests
AndreasKaratzas Nov 19, 2025
d3ff04b
Merge upstream/main into akaratza_ci
AndreasKaratzas Nov 19, 2025
16ebdd0
[ROCm][CI] Merged reviews
AndreasKaratzas Nov 19, 2025
42f4b6c
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 19, 2025
2722576
[ROCm][CI] Fixed assertion condition for prebuilt wheels
AndreasKaratzas Nov 19, 2025
a5b0106
[ROCm][CI] Merged reviews
AndreasKaratzas Nov 19, 2025
0754972
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 19, 2025
b1f171f
ROCm CI fixes: FlexAttention backend support and test adjustments
AndreasKaratzas Nov 20, 2025
1a55574
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 20, 2025
f23cf89
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 20, 2025
72e3a0f
ROCm CI fixes: LoRA related adjustments and whisper test fixes.
AndreasKaratzas Nov 21, 2025
d7776cf
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 21, 2025
0e62494
Renamed properly the attn backend fixtures
AndreasKaratzas Nov 21, 2025
2a4c027
[ROCm][CI] Changed to flex attention for cross-attention
AndreasKaratzas Nov 21, 2025
aa2a7f7
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 21, 2025
e8334d9
[ROCm][CI] Keeping AITER FA attention for whisper pending #28376
AndreasKaratzas Nov 21, 2025
eda5676
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 21, 2025
694f8f4
[ROCm][CI] Increased timeout window for video tests
AndreasKaratzas Nov 21, 2025
4913b2d
[ROCm][CI] Vision tests were not tailored for ROCm backend
AndreasKaratzas Nov 21, 2025
14c82d4
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 21, 2025
e48274a
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 21, 2025
315c44e
[ROCm][CI] Resolved
AndreasKaratzas Nov 22, 2025
dc94057
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 22, 2025
32a944b
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 22, 2025
84f899b
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 23, 2025
75f7a93
[ROCm][CI] Resolved
AndreasKaratzas Nov 23, 2025
702a498
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 23, 2025
f6b2fdb
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 24, 2025
2ce0caf
Added Triton encoder only self attention support
AndreasKaratzas Nov 24, 2025
788765c
Merge remote-tracking branch 'upstream/main' into akaratza_encoder_at…
AndreasKaratzas Nov 24, 2025
b31f035
Added FlexAttention logic
AndreasKaratzas Nov 24, 2025
a7a09cb
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 24, 2025
d93f049
Fixes and other entrypoint tests on ROCm
AndreasKaratzas Nov 24, 2025
146856c
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 24, 2025
3990acd
Synced Triton kernel with upstream and slightly modified versioning f…
AndreasKaratzas Nov 25, 2025
9ad6c9d
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 25, 2025
099d05f
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 25, 2025
c10401f
[Bugfix] Both AITER and AITER unified attention need to be set
AndreasKaratzas Nov 25, 2025
18e3c68
Merge remote-tracking branch 'upstream/main' into akaratza_ci
AndreasKaratzas Nov 25, 2025
3 changes: 3 additions & 0 deletions .buildkite/test-amd.yaml
@@ -46,6 +46,9 @@ steps:
   source_file_dependencies:
   - requirements/nightly_torch_test.txt
   commands:
+  # NOTE: We are going to skip this test on ROCm platform
+  # as we don't use pytorch nightly builds on ROCm. We
+  # only use stable PyTorch releases built with ROCm support.
   - bash standalone_tests/pytorch_nightly_dependency.sh

 - label: Async Engine, Inputs, Utils, Worker Test # 10min
15 changes: 14 additions & 1 deletion docker/Dockerfile.rocm
@@ -67,6 +67,7 @@ COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/.buildkite /.buildkite
 # -----------------------
 # Test vLLM image
 FROM base AS test
+ARG PYTHON_VERSION=3.12
Review comment (Collaborator):
This is being set in the base Docker image. A better way would be to set it as an ENV there and inherit it in this image.


 RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*

@@ -86,10 +87,22 @@ COPY --from=build_vllm ${COMMON_WORKDIR}/vllm /vllm-workspace

 # install development dependencies (for testing)
 RUN cd /vllm-workspace \
     && rm -rf vllm \
     && python3 -m pip install -e tests/vllm_test_utils \
     && python3 -m pip install pytest-shard

+# enable fast downloads from hf (for testing)
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv pip install --system hf_transfer
+ENV HF_HUB_ENABLE_HF_TRANSFER=1
+
+# Copy in the v1 package for testing (it isn't distributed yet)
+COPY vllm/v1 /usr/local/lib/python${PYTHON_VERSION}/dist-packages/vllm/v1
Review comment (Collaborator):
This looks kind of hacky. Why do we need it for tests but not for the normal distribution?


+# Source code is used in the `python_only_compile.sh` test
+# We hide it inside `src/` so that this source code
+# will not be imported by other tests
+RUN mkdir src && mv vllm src/vllm
+
 # -----------------------
 # Final vLLM image
 FROM base AS final
23 changes: 20 additions & 3 deletions docker/Dockerfile.rocm_base
@@ -5,6 +5,8 @@ ARG PYTORCH_BRANCH="1c57644d"
 ARG PYTORCH_VISION_BRANCH="v0.23.0"
 ARG PYTORCH_REPO="https://github.com/ROCm/pytorch.git"
 ARG PYTORCH_VISION_REPO="https://github.com/pytorch/vision.git"
+ARG PYTORCH_AUDIO_BRANCH="v2.9.0"
+ARG PYTORCH_AUDIO_REPO="https://github.com/pytorch/audio.git"
 ARG FA_BRANCH="0e60e394"
 ARG FA_REPO="https://github.com/Dao-AILab/flash-attention.git"
 ARG AITER_BRANCH="59bd8ff2"
@@ -45,6 +47,7 @@ RUN apt-get update -y \
     && python3 --version && python3 -m pip --version

 RUN pip install -U packaging 'cmake<4' ninja wheel 'setuptools<80' pybind11 Cython
+RUN apt-get update && apt-get install -y libjpeg-dev libsox-dev libsox-fmt-all sox && rm -rf /var/lib/apt/lists/*

 FROM base AS build_triton
 ARG TRITON_BRANCH
@@ -66,20 +69,30 @@ RUN mkdir -p /app/install && cp /opt/rocm/share/amd_smi/dist/*.whl /app/install
 FROM base AS build_pytorch
 ARG PYTORCH_BRANCH
 ARG PYTORCH_VISION_BRANCH
+ARG PYTORCH_AUDIO_BRANCH
 ARG PYTORCH_REPO
 ARG PYTORCH_VISION_REPO
+ARG PYTORCH_AUDIO_REPO
+
 RUN git clone ${PYTORCH_REPO} pytorch
-RUN cd pytorch && git checkout ${PYTORCH_BRANCH} && \
-    pip install -r requirements.txt && git submodule update --init --recursive \
+RUN cd pytorch && git checkout ${PYTORCH_BRANCH} \
+    && pip install -r requirements.txt && git submodule update --init --recursive \
     && python3 tools/amd_build/build_amd.py \
     && CMAKE_PREFIX_PATH=$(python3 -c 'import sys; print(sys.prefix)') python3 setup.py bdist_wheel --dist-dir=dist \
     && pip install dist/*.whl
 RUN git clone ${PYTORCH_VISION_REPO} vision
 RUN cd vision && git checkout ${PYTORCH_VISION_BRANCH} \
     && python3 setup.py bdist_wheel --dist-dir=dist \
     && pip install dist/*.whl
+RUN git clone ${PYTORCH_AUDIO_REPO} audio
+RUN cd audio && git checkout ${PYTORCH_AUDIO_BRANCH} \
+    && git submodule update --init --recursive \
+    && pip install -r requirements.txt \
+    && python3 setup.py bdist_wheel --dist-dir=dist \
+    && pip install dist/*.whl
Review comment (Contributor, priority: high) on lines +88 to +93:

To reduce the number of Docker image layers and improve build efficiency, it's recommended to combine the git clone and the subsequent build commands for torchaudio into a single RUN instruction.

RUN git clone ${PYTORCH_AUDIO_REPO} audio && cd audio \
    && git checkout ${PYTORCH_AUDIO_BRANCH} \
    && git submodule update --init --recursive \
    && pip install -r requirements.txt \
    && python3 setup.py bdist_wheel --dist-dir=dist \
    && pip install dist/*.whl

 RUN mkdir -p /app/install && cp /app/pytorch/dist/*.whl /app/install \
-    && cp /app/vision/dist/*.whl /app/install
+    && cp /app/vision/dist/*.whl /app/install \
+    && cp /app/audio/dist/*.whl /app/install

 FROM base AS build_fa
 ARG FA_BRANCH
@@ -130,6 +143,8 @@ ARG PYTORCH_BRANCH
 ARG PYTORCH_VISION_BRANCH
 ARG PYTORCH_REPO
 ARG PYTORCH_VISION_REPO
+ARG PYTORCH_AUDIO_BRANCH
+ARG PYTORCH_AUDIO_REPO
 ARG FA_BRANCH
 ARG FA_REPO
 ARG AITER_BRANCH
@@ -141,6 +156,8 @@ RUN echo "BASE_IMAGE: ${BASE_IMAGE}" > /app/versions.txt \
     && echo "PYTORCH_VISION_BRANCH: ${PYTORCH_VISION_BRANCH}" >> /app/versions.txt \
     && echo "PYTORCH_REPO: ${PYTORCH_REPO}" >> /app/versions.txt \
     && echo "PYTORCH_VISION_REPO: ${PYTORCH_VISION_REPO}" >> /app/versions.txt \
+    && echo "PYTORCH_AUDIO_BRANCH: ${PYTORCH_AUDIO_BRANCH}" >> /app/versions.txt \
+    && echo "PYTORCH_AUDIO_REPO: ${PYTORCH_AUDIO_REPO}" >> /app/versions.txt \
     && echo "FA_BRANCH: ${FA_BRANCH}" >> /app/versions.txt \
     && echo "FA_REPO: ${FA_REPO}" >> /app/versions.txt \
     && echo "AITER_BRANCH: ${AITER_BRANCH}" >> /app/versions.txt \
6 changes: 5 additions & 1 deletion requirements/rocm-test.txt
@@ -5,7 +5,6 @@ bm25s==0.2.13
 pystemmer==3.0.0

 # Entrypoints test
-# librosa==0.10.2.post1 # required by audio tests in entrypoints/openai
 audioread==3.0.1
 cffi==1.17.1
 decorator==5.2.1
@@ -16,6 +15,8 @@ pooch==1.8.2
 soundfile==0.13.1
 soxr==0.5.0.post1
 librosa==0.10.2.post1
+num2words==0.5.14
+pqdm==0.2.0

 # Entrypoints test
 #vllm[video] # required by entrypoints/openai/test_video.py
@@ -28,6 +29,9 @@ sentence-transformers==3.4.1
 # Basic Models Test
 matplotlib==3.10.3

+# Datasets and Evaluate Test
+multiprocess==0.70.16
+
 # Multi-Modal Models Test (Extended) 3
 blobfile==3.0.0

68 changes: 40 additions & 28 deletions setup.py
@@ -49,15 +49,15 @@ def load_module_from_path(module_name, path):
         sys.platform,
     )
     VLLM_TARGET_DEVICE = "empty"
-elif (
-    sys.platform.startswith("linux")
-    and torch.version.cuda is None
-    and os.getenv("VLLM_TARGET_DEVICE") is None
-    and torch.version.hip is None
-):
-    # if cuda or hip is not available and VLLM_TARGET_DEVICE is not set,
-    # fallback to cpu
-    VLLM_TARGET_DEVICE = "cpu"
+elif sys.platform.startswith("linux") and os.getenv("VLLM_TARGET_DEVICE") is None:
+    if torch.version.hip is not None:
+        VLLM_TARGET_DEVICE = "rocm"
+        logger.info("Auto-detected ROCm")
+    elif torch.version.cuda is not None:
+        VLLM_TARGET_DEVICE = "cuda"
+        logger.info("Auto-detected CUDA")
+    else:
+        VLLM_TARGET_DEVICE = "cpu"


 def is_sccache_available() -> bool:
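
Editorial note: the new Linux branch in the hunk above can be read as a small decision function. The sketch below is illustrative only. `detect_target_device` and its parameters are hypothetical names that do not appear in `setup.py`, and the two version arguments stand in for `torch.version.hip` and `torch.version.cuda`.

```python
from __future__ import annotations


def detect_target_device(
    platform: str,
    env_value: str | None,
    hip_version: str | None,
    cuda_version: str | None,
) -> str | None:
    """Return the auto-detected device, or None when this branch does not apply.

    Hypothetical helper: it mirrors only the Linux auto-detection branch
    shown in the hunk, not the macOS or unsupported-platform branches.
    """
    if not platform.startswith("linux") or env_value is not None:
        # Other platforms, or an explicit VLLM_TARGET_DEVICE, are handled
        # elsewhere, so nothing is auto-detected here.
        return None
    if hip_version is not None:  # stands in for torch.version.hip
        return "rocm"
    if cuda_version is not None:  # stands in for torch.version.cuda
        return "cuda"
    return "cpu"  # neither HIP nor CUDA is present in the torch build
```

Under these assumptions, a torch build that reports a HIP version resolves to `"rocm"`, a build that reports only a CUDA version resolves to `"cuda"`, and an explicitly set `VLLM_TARGET_DEVICE` bypasses detection entirely.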
@@ -115,20 +115,26 @@ def compute_num_jobs(self):
                 num_jobs = os.cpu_count()

         nvcc_threads = None
-        if _is_cuda() and get_nvcc_cuda_version() >= Version("11.2"):
-            # `nvcc_threads` is either the value of the NVCC_THREADS
-            # environment variable (if defined) or 1.
-            # when it is set, we reduce `num_jobs` to avoid
-            # overloading the system.
-            nvcc_threads = envs.NVCC_THREADS
-            if nvcc_threads is not None:
-                nvcc_threads = int(nvcc_threads)
-                logger.info(
-                    "Using NVCC_THREADS=%d as the number of nvcc threads.", nvcc_threads
-                )
-            else:
-                nvcc_threads = 1
-            num_jobs = max(1, num_jobs // nvcc_threads)
+        if _is_cuda() and CUDA_HOME is not None:
+            try:
+                nvcc_version = get_nvcc_cuda_version()
+                if nvcc_version >= Version("11.2"):
+                    # `nvcc_threads` is either the value of the NVCC_THREADS
+                    # environment variable (if defined) or 1.
+                    # when it is set, we reduce `num_jobs` to avoid
+                    # overloading the system.
+                    nvcc_threads = envs.NVCC_THREADS
+                    if nvcc_threads is not None:
+                        nvcc_threads = int(nvcc_threads)
+                        logger.info(
+                            "Using NVCC_THREADS=%d as the number of nvcc threads.",
+                            nvcc_threads,
+                        )
+                    else:
+                        nvcc_threads = 1
+                    num_jobs = max(1, num_jobs // nvcc_threads)
+        except Exception as e:
+            logger.warning("Failed to get NVCC version: %s", e)

         return num_jobs, nvcc_threads
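
Editorial note: the arithmetic in the hunk above divides the job count by the per-job nvcc thread count so that total parallelism stays roughly constant. The sketch below isolates that arithmetic under stated assumptions. `split_jobs` is a hypothetical name, and `nvcc_usable` stands in for the whole guard in the diff (a CUDA build, `CUDA_HOME` set, and an nvcc version of at least 11.2 that was read without raising).

```python
from __future__ import annotations


def split_jobs(
    num_jobs: int,
    nvcc_threads_env: str | None,
    nvcc_usable: bool,
) -> tuple[int, int | None]:
    """Return (num_jobs, nvcc_threads) following the arithmetic in the hunk.

    Hypothetical helper, not part of setup.py.
    """
    if not nvcc_usable:
        # ROCm/CPU builds, a missing CUDA_HOME, or a failed version probe:
        # the job count is left untouched and no nvcc thread count is set.
        return num_jobs, None
    # NVCC_THREADS, when defined, arrives as a string; the default is 1.
    nvcc_threads = int(nvcc_threads_env) if nvcc_threads_env is not None else 1
    # Fewer parallel jobs when each nvcc invocation uses several threads,
    # but never fewer than one job.
    return max(1, num_jobs // nvcc_threads), nvcc_threads
```

For instance, 16 jobs with `NVCC_THREADS=4` become 4 jobs of 4 threads each, while the same inputs on a build without a usable nvcc keep all 16 jobs.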

@@ -206,9 +212,9 @@ def configure(self, ext: CMakeExtension) -> None:
             # Default build tool to whatever cmake picks.
             build_tool = []
         # Make sure we use the nvcc from CUDA_HOME
-        if _is_cuda():
+        if _is_cuda() and CUDA_HOME is not None:
             cmake_args += [f"-DCMAKE_CUDA_COMPILER={CUDA_HOME}/bin/nvcc"]
-        elif _is_hip():
+        elif _is_hip() and ROCM_HOME is not None:
             cmake_args += [f"-DROCM_PATH={ROCM_HOME}"]

         other_cmake_args = os.environ.get("CMAKE_ARGS")
@@ -318,7 +324,9 @@ class precompiled_build_ext(build_ext):
     """Disables extension building when using precompiled binaries."""

     def run(self) -> None:
-        assert _is_cuda(), "VLLM_USE_PRECOMPILED is only supported for CUDA builds"
+        assert _is_cuda() or _is_hip(), (
+            "VLLM_USE_PRECOMPILED is only supported for CUDA or ROCm builds."
+        )

     def build_extensions(self) -> None:
         print("Skipping build_ext: using precompiled extensions.")
@@ -490,6 +498,8 @@ def get_rocm_version():
     # Get the Rocm version from the ROCM_HOME/bin/librocm-core.so
     # see https://github.com/ROCm/rocm-core/blob/d11f5c20d500f729c393680a01fa902ebf92094b/rocm_version.cpp#L21
     try:
+        if ROCM_HOME is None:
+            return None
         librocm_core_file = Path(ROCM_HOME) / "lib" / "librocm-core.so"
         if not librocm_core_file.is_file():
             return None
@@ -656,7 +666,9 @@ def _read_requirements(filename: str) -> list[str]:

 if _is_cuda():
     ext_modules.append(CMakeExtension(name="vllm.vllm_flash_attn._vllm_fa2_C"))
-    if envs.VLLM_USE_PRECOMPILED or get_nvcc_cuda_version() >= Version("12.3"):
+    if envs.VLLM_USE_PRECOMPILED or (
+        CUDA_HOME and get_nvcc_cuda_version() >= Version("12.3")
+    ):
         # FA3 requires CUDA 12.3 or later
         ext_modules.append(CMakeExtension(name="vllm.vllm_flash_attn._vllm_fa3_C"))
         # Optional since this doesn't get built (produce an .so file) when
@@ -679,7 +691,7 @@ def _read_requirements(filename: str) -> list[str]:

 # If using precompiled, extract and patch package_data (in advance of setup)
 if envs.VLLM_USE_PRECOMPILED:
-    assert _is_cuda(), "VLLM_USE_PRECOMPILED is only supported for CUDA builds"
+    assert _is_cuda(), "VLLM_USE_PRECOMPILED is only supported for CUDA builds."
     wheel_location = os.getenv("VLLM_PRECOMPILED_WHEEL_LOCATION", None)
     if wheel_location is not None:
         wheel_url = wheel_location

7 changes: 6 additions & 1 deletion tests/standalone_tests/pytorch_nightly_dependency.sh
@@ -4,6 +4,11 @@
 set -e
 set -x

+if command -v rocminfo >/dev/null 2>&1; then
+    echo "Skipping test for ROCm platform"
+    exit 0
+fi
+
 cd /vllm-workspace/

 rm -rf .venv
@@ -36,7 +41,7 @@ if diff before.txt after.txt; then
     echo "torch version not overridden."
 else
     echo "torch version overridden by nightly_torch_test.txt, \
-    if the dependency is not triggered by the pytroch nightly test,\
+    if the dependency is not triggered by the pytorch nightly test,\
     please add the dependency to the list 'white_list' in tools/pre_commit/generate_nightly_torch_test.py"
     exit 1
 fi
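
Editorial note: the guard added at the top of this script treats the presence of the `rocminfo` executable on `PATH` as the signal that the job runs on ROCm. A Python-based test helper could apply the same heuristic, as in the sketch below. `running_on_rocm` is a hypothetical name that does not appear in the PR, and the injectable `which` parameter exists only so the check can be exercised without ROCm installed.

```python
import shutil
from typing import Callable, Optional


def running_on_rocm(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Return True when a `rocminfo` executable is found on PATH.

    Hypothetical helper mirroring `command -v rocminfo` from the shell
    script. Finding the binary does not prove that a usable GPU is present.
    """
    return which("rocminfo") is not None
```

As in the shell version, this is a coarse check: it reports that the ROCm tooling is installed, not that the device is healthy.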