Properly handle cuda arch for unsupported function #1853
For example, bfloat16 is natively supported only from compute capability 8.0 onwards; on older architectures we need to either throw an exception or avoid a failing configuration. This is mostly handled by a CMake option when the user compiles for only one CUDA arch.
CUDA, however, also allows compiling a library for several architectures at once. In that case `__CUDA_ARCH__` does not help: it is only defined in device code, not host code, so guarding with this macro on the host side actually has no effect. There is a host-side macro (`__CUDA_ARCH_LIST__`), but it only expands to the entire list of architectures the binary was built for.
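A small illustration of that behaviour (not part of this PR, just the pattern being described):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void report_arch() {
  // Each device compilation pass sees its own __CUDA_ARCH__ value.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
  printf("this cubin was compiled for CC >= 8.0\n");
#else
  printf("this cubin was compiled for CC < 8.0\n");
#endif
}

int main() {
  // The host pass never defines __CUDA_ARCH__, so a preprocessor guard here
  // cannot decide whether the GPU we will run on supports bfloat16.
#if defined(__CUDA_ARCH__)
  printf("never compiled into the host binary\n");
#endif
  report_arch<<<1, 1>>>();
  cudaDeviceSynchronize();
  return 0;
}
```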
In this case we can only rely on runtime dispatch on the compute capability and throw an exception when the required features are not available.
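A rough sketch of that host-side check, assuming a helper such as `require_bf16_support` (the name and error message are illustrative, not the PR's actual code):

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

// Query the compute capability at runtime and refuse to launch bf16 kernels
// on devices older than CC 8.0.
inline void require_bf16_support(int device) {
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, device);
  if (prop.major < 8) {
    throw std::runtime_error(
        "bfloat16 atomics require compute capability >= 8.0, but device " +
        std::to_string(device) + " is " + std::to_string(prop.major) + "." +
        std::to_string(prop.minor));
  }
}
```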
To achieve that, we also need to provide a version of the bfloat16 atomic add that exists just so compilation succeeds on the older architectures.
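For example, a guarded wrapper along these lines (the wrapper name is hypothetical; the stub body only has to satisfy the compiler, since the runtime check above means it is never executed on CC < 8.0):

```cuda
#include <cuda_bf16.h>

// Hypothetical wrapper around the bf16 atomicAdd overload.
__device__ inline __nv_bfloat16 atomic_add_bf16(__nv_bfloat16* addr,
                                                __nv_bfloat16 val) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
  // The native bf16 atomicAdd overload exists only on CC >= 8.0.
  return atomicAdd(addr, val);
#else
  // Dummy body so the kernel still compiles for older architectures.
  // Unreachable in practice: the host-side check throws before any launch.
  __trap();
  return __float2bfloat16(0.f);
#endif
}
```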
Side note: there is still an issue where compiling a bfloat16 kernel inside a templated lambda with CUDA 12.2 or later, for an architecture without native bfloat16 support, leads to an "unknown device kernel" error at runtime, whereas duplicating the kernel with a full specialization works. This requires further investigation.