example_lapack_getrs_usm throws SYCL exception on NVIDIA GPU when built with support for AMD GPU backends #708

@sidarth-narayanan

Description

Summary

example_lapack_getrs_usm throws a SYCL exception on an NVIDIA GPU when oneMath is built with support for all three GPU vendor backends (Intel, NVIDIA, and AMD).

The example works when oneMath is built with only the CUDA backend, or with the CUDA and Intel backends.

Version

v0.8

Environment

  • HW used: NVIDIA GeForce GTX 1660 Ti
  • Backend Version: CUDA toolkit version 12.4 / Driver Version: 550.90.07
  • OS: Rocky Linux 9.4
  • Compiler version: ICPX 2025.1.1
  • CMake output log:
-- CMAKE_BUILD_TYPE: None, set to Release by default
-- C compiler: icx was found in PATH, using icx
-- The CXX compiler identification is IntelLLVM 2025.1.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/people/shared/tools/intel/oneapi/compiler/2025.1/bin/icpx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- TARGET_DOMAINS: blas;lapack;rng;dft;sparse_blas
-- MKL_VERSION: 2025.1.0
-- MKL_ROOT: /usr/people/shared/tools/intel/oneapi/mkl/2025.1
-- MKL_ARCH: intel64
-- MKL_SYCL_LINK: dynamic
-- MKL_LINK: dynamic
-- MKL_SYCL_INTERFACE_FULL: intel_ilp64
-- MKL_INTERFACE_FULL: intel_ilp64
-- MKL_SYCL_THREADING: tbb_thread
-- MKL_THREADING: tbb_thread
-- MKL_MPI: None, set to ` intelmpi` by default
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_scalapack_ilp64.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_cdft_core.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_intel_ilp64.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_tbb_thread.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_core.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_blacs_intelmpi_ilp64.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_sycl_blas.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_sycl_lapack.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_sycl_dft.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_sycl_sparse.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_sycl_data_fitting.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_sycl_rng.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_sycl_stats.so
-- Found /usr/people/shared/tools/intel/oneapi/mkl/2025.1/lib/libmkl_sycl_vm.so
-- Looking for dpc++
-- Performing Test is_dpcpp
-- Performing Test is_dpcpp - Success
-- Performing Test dpcpp_supports_nvptx64
-- Performing Test dpcpp_supports_nvptx64 - Success
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /usr/people/shared/tools/rocky/8/cuda/cuda-12.6 (found suitable version "12.6", minimum required is "10.0")
-- Found cuBLAS: /usr/people/shared/tools/rocky/8/cuda/cuda-12.6/include
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
src/blas/backends/rocblas/CMakeLists.txt:24 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
/usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/rocblas/rocblas-config.cmake:90 (find_dependency)
src/blas/backends/rocblas/CMakeLists.txt:25 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
-- Found cuSOLVER: /usr/people/shared/tools/rocky/8/cuda/cuda-12.6/include
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
src/lapack/backends/rocsolver/CMakeLists.txt:24 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
/usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/rocsolver/rocsolver-config.cmake:90 (find_dependency)
src/lapack/backends/rocsolver/CMakeLists.txt:25 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
-- Found cuRAND: /usr/people/shared/tools/rocky/8/cuda/cuda-12.6/include
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
src/rng/backends/rocrand/CMakeLists.txt:57 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
/usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/rocrand/rocrand-config.cmake:90 (find_dependency)
src/rng/backends/rocrand/CMakeLists.txt:58 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
-- Performing Test IS_SUPPORTED
-- Performing Test IS_SUPPORTED - Success
-- Found CUDAToolkit: /usr/people/shared/tools/rocky/8/cuda/cuda-12.6/include (found version "12.6.85")
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
src/dft/backends/rocfft/CMakeLists.txt:45 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
/usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/rocfft/rocfft-config.cmake:90 (find_dependency)
src/dft/backends/rocfft/CMakeLists.txt:47 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
-- Found CUDAToolkit: /usr/people/shared/tools/rocky/8/cuda/cuda-12.6/include (found suitable version "12.6.85", minimum required is "12.2")
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
src/sparse_blas/backends/rocsparse/CMakeLists.txt:46 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
CMake Warning (dev) at /usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config-amd.cmake:86 (message):
amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/hip/hip-config.cmake:149 (include)
/usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/usr/people/shared/tools/rocky/9/rocm/6.2.0/lib/cmake/rocsparse/rocsparse-config.cmake:90 (find_dependency)
src/sparse_blas/backends/rocsparse/CMakeLists.txt:47 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is
-- ONEAPI_DEVICE_SELECTOR will be set to the following value(s): [opencl:cpu;level_zero:gpu;cuda:gpu;hip:gpu] for run-time dispatching examples
-- ONEAPI_DEVICE_SELECTOR will be set to the following value(s): [opencl:cpu;level_zero:gpu;cuda:gpu;hip:gpu] for run-time dispatching examples
-- ONEAPI_DEVICE_SELECTOR will be set to the following value(s): [opencl:cpu;level_zero:gpu;cuda:gpu;hip:gpu] for run-time dispatching examples
-- ONEAPI_DEVICE_SELECTOR will be set to the following value(s): [opencl:cpu;level_zero:gpu;cuda:gpu;hip:gpu] for run-time dispatching examples
-- ONEAPI_DEVICE_SELECTOR will be set to the following value(s): [level_zero:gpu;opencl:cpu;cuda:gpu;hip:gpu] for run-time dispatching examples
-- ONEAPI_DEVICE_SELECTOR will be set to the following value(s): [opencl:cpu;level_zero:gpu;cuda:gpu;hip:gpu] for run-time dispatching examples
-- Configuring done (4.0s)
-- Generating done (1.3s)
-- Build files have been written to: /work/snarayanan/tasks/10641_sundails/oneMath_v0.8/build2

Steps to reproduce

CMake command:

cmake /mounts/work/snarayanan/tasks/10641_sundails/oneMath_v0.8/oneMath-0.8 -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=$CXX_COMPILE \
-DENABLE_MKLCPU_BACKEND=True -DENABLE_MKLGPU_BACKEND=True \
-DENABLE_CUBLAS_BACKEND=True -DENABLE_CUSOLVER_BACKEND=True -DENABLE_CUFFT_BACKEND=True -DENABLE_CURAND_BACKEND=True -DENABLE_CUSPARSE_BACKEND=True \
-DENABLE_ROCBLAS_BACKEND=True -DENABLE_ROCFFT_BACKEND=True -DENABLE_ROCSOLVER_BACKEND=True -DENABLE_ROCRAND_BACKEND=True -DENABLE_ROCSPARSE_BACKEND=True -DHIP_TARGETS=gfx1100 \
-DBUILD_FUNCTIONAL_TESTS=False -DBUILD_EXAMPLES=True

Build command:

cmake --build . -j

Observed behavior

snarayanan@crane:bin$ ./example_lapack_getrs_usm

########################################################################
# LU Factorization and Solve Example:
#
# Computes LU Factorization A = P * L * U
# and uses it to solve for X in a system of linear equations:
#   AX = B
# where A is a general dense matrix and B is a matrix whose columns
# are the right-hand sides for the systems of equations.
#
# Using apis:
#   getrf and getrs
#
# Using single precision (float) data type
#
# Device will be selected during runtime.
# The environment variable ONEAPI_DEVICE_SELECTOR can be used to specify
# available devices
#
########################################################################

Running LAPACK getrs example on GPU device.
Device name is: NVIDIA GeForce GTX 1660 Ti
Running with single precision real data type:
Caught synchronous SYCL exception:
No kernel named _ZTSZZN6oneapi4math6lapack8cusolver5getrfIPF16cusolverStatus_tP17cusolverDnContextiiPfiS7_PiS8_EfEEN4sycl3_V15eventEPKcT_RNSC_5queueEllPT0_lPlSK_lRKSt6vectorISD_SaISD_EEENKUlRNSC_7handlerEE0_clESS_EUlNSC_2idILi1EEEE_ was found
SYCL error code: 1
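
The kernel name in the exception is an Itanium-ABI mangled symbol, which makes the message hard to read. The readable fragments (`oneapi`, `math`, `lapack`, `cusolver`, `getrf`) suggest the missing kernel is one submitted from the cuSOLVER `getrf` wrapper. As a reading aid only, here is a minimal sketch of a helper that demangles such a name with `abi::__cxa_demangle`. The function name `demangle_or_original` is hypothetical and is not part of oneMath or SYCL; the sketch assumes a GCC/Clang-compatible toolchain on Linux.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <memory>    // std::unique_ptr
#include <string>

// Hypothetical helper (not part of oneMath or SYCL): returns the demangled
// form of an Itanium-ABI symbol, or the input unchanged if demangling fails.
std::string demangle_or_original(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so release it with free.
    std::unique_ptr<char, decltype(&std::free)> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    // status == 0 means success; any other value means the name was not
    // demangled, in which case the original string is returned as-is.
    return (status == 0 && text) ? std::string(text.get()) : mangled;
}
```

Passing the kernel name from the exception message to this helper and printing the result should give a more readable description of the lambda whose kernel could not be found. Piping the name through the `c++filt` command-line tool is an equivalent alternative.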
