
Properly handle cuda arch for unsupported function #1853

Open
wants to merge 3 commits into
base: develop
from

Conversation

yhmtsai
Member

@yhmtsai yhmtsai commented May 28, 2025

For example, bfloat16 is natively supported only from compute capability (CC) 80 onward, so on older architectures we need to either throw an exception or avoid a failed configuration.
When the user compiles for only one CUDA architecture, this is mostly handled by the CMake option.
CUDA also allows compiling a library for several architectures at once.
However, `__CUDA_ARCH__` is only defined in device code, not in host code, so checking the macro on the host side has no effect.
There is one host-side macro, but it only gives the entire list of architectures being compiled for.
In this case we can only rely on runtime dispatch on the CC and throw an exception when the feature is not available.
To achieve that, we need to provide a working fallback version of atomic add whose only purpose is to let the code compile.
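
To make the two pieces above concrete, here is a minimal host-side C++ sketch. It is not the code in this PR. All names (`require_native_bf16`, `atomic_add_bf16`, `bf16_native_cc`) are hypothetical, bfloat16 is modeled as the raw upper 16 bits of a float with truncating conversion, and `std::atomic` stands in for a device-side compare-and-swap such as `atomicCAS`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Assumption from the description: bfloat16 is native from CC 80 onward.
constexpr int bf16_native_cc = 80;

// Hypothetical runtime guard. `cc` is major * 10 + minor as queried from the
// device at runtime, since the host cannot rely on __CUDA_ARCH__.
inline void require_native_bf16(int cc)
{
    if (cc < bf16_native_cc) {
        throw std::runtime_error(
            "bfloat16 is not natively supported on CC " + std::to_string(cc));
    }
}

// Model bfloat16 as the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(std::uint16_t b)
{
    std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Simplification: truncate instead of rounding to nearest even.
inline std::uint16_t float_to_bf16(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Fallback atomic add built from a compare-and-swap loop: read the current
// value, compute the sum, and retry until no other thread changed the target
// in between. On failure, `expected` is refreshed with the current value.
inline void atomic_add_bf16(std::atomic<std::uint16_t>& target,
                            std::uint16_t value)
{
    std::uint16_t expected = target.load();
    std::uint16_t desired;
    do {
        desired =
            float_to_bf16(bf16_to_float(expected) + bf16_to_float(value));
    } while (!target.compare_exchange_weak(expected, desired));
}
```

In this sketch the guard runs on the host before a kernel launch, so the fallback only has to exist so that every compiled architecture has a definition; whether it is ever executed depends on how the dispatch is wired up.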

Side note: there is still an issue where compiling a bfloat16 kernel inside a templated lambda with CUDA versions after 12.2, on an architecture that does not natively support bfloat16, leads to an unknown device kernel at runtime. Duplicating the kernel with a full specialization works, however. This requires further investigation.

@yhmtsai yhmtsai self-assigned this May 28, 2025
@ginkgo-bot ginkgo-bot added mod:cuda This is related to the CUDA module. type:solver This is related to the solvers type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. labels May 28, 2025
@yhmtsai yhmtsai requested a review from a team May 28, 2025 16:17
@yhmtsai yhmtsai force-pushed the properly_handle_cuda_arch branch from 66748e1 to 432fe5a Compare July 30, 2025 12:45
@yhmtsai yhmtsai requested a review from pratikvn July 30, 2025 12:46
Labels
mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. type:matrix-format This is related to the Matrix formats type:solver This is related to the solvers
3 participants