
Unsupported operation when using CUDA #713

@salbert83

Description


I previously raised this issue as FluxML/Zygote.jl#1532, but was advised that it would be more appropriate here.

My environment (I have seen the same issue on Linux machines):

```
Julia Version 1.11.0
Commit 501a4f25c2 (2024-10-07 11:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, icelake-client)
  Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

Status C:\Users\salbe\OneDrive\Documents\Research\JuliaBugs\Project.toml
  [052768ef] CUDA v5.5.2
  [7073ff75] IJulia v1.25.0
  [e88e6eb3] Zygote v0.6.71
```

The example:

```julia
using CUDA, Zygote, LinearAlgebra

f₁(x) = sum(abs2, exp.(log.(x) .* (1:length(x))))
f₂(x) = sum(abs2, x.^(1:length(x)))
x = randn(ComplexF64, 5);
z = CuArray{ComplexF64}(x);

# Check that the gradient calculations are consistent between the two functions
test₁ = Zygote.gradient(f₁, x)[1]
test₂ = Zygote.gradient(f₂, x)[1]
norm(test₁ - test₂) / norm(test₁)
# Output: 2.2530284453414604e-16 <-- This is reasonable

# Check the calculation using CUDA
test₃ = Zygote.gradient(f₁, z)[1];
norm(test₁ - Array(test₃)) / norm(test₁)
# Output: 2.0454901873585542e-16

# However, using f₂ throws an exception
test₄ = Zygote.gradient(f₂, z)
```

Output:

```
InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#34#36")(::CUDA.CuKernelContext, ::CuDeviceVector{GPUArrays.BrokenBroadcast{Union{}}, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, Zygote.var"#1409#1410"{typeof(^)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{ComplexF64, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{UnitRange{Int64}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to ≺(a, b) @ ForwardDiff C:\Users\salbe\.julia\packages\ForwardDiff\PcZ48\src\dual.jl:54)
Stacktrace:
 [1] promote_rule
   @ C:\Users\salbe\.julia\packages\ForwardDiff\PcZ48\src\dual.jl:407
 [2] promote_type
   @ .\promotion.jl:318
 [3] ^
   @ .\complex.jl:886
 [4] #1409
   @ C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:276
 [5] _broadcast_getindex_evalf
   @ .\broadcast.jl:673
 [6] _broadcast_getindex
   @ .\broadcast.jl:646
 [7] getindex
   @ .\broadcast.jl:605
 [8] #34
   @ C:\Users\salbe\.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:59
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl

Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\validation.jl:147
  [2] macro expansion
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:382 [inlined]
  [3] macro expansion
    @ C:\Users\salbe\.julia\packages\TimerOutputs\NRdsv\src\TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:381 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\utils.jl:108
  [6] emit_llvm
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\utils.jl:106 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, strip::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:100
  [8] codegen
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:82 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:79
 [10] compile
    @ C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:74 [inlined]
 [11] #1145
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:250 [inlined]
 [12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:34
 [13] JuliaContext(f::Function)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\driver.jl:25
 [14] compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\compilation.jl:249
 [15] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\execution.jl:237
 [16] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler C:\Users\salbe\.julia\packages\GPUCompiler\2CW9L\src\execution.jl:151
 [17] macro expansion
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:380 [inlined]
 [18] macro expansion
    @ .\lock.jl:273 [inlined]
 [19] cufunction(f::GPUArrays.var"#34#36", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{GPUArrays.BrokenBroadcast{Union{}}, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, Zygote.var"#1409#1410"{typeof(^)}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{ComplexF64, 1}, Tuple{Bool}, Tuple{Int64}}, Base.Broadcast.Extruded{UnitRange{Int64}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::@Kwargs{})
    @ CUDA C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:375
 [20] cufunction
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:372 [inlined]
 [21] macro expansion
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\compiler\execution.jl:112 [inlined]
 [22] #launch_heuristic#1200
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\gpuarrays.jl:17 [inlined]
 [23] launch_heuristic
    @ C:\Users\salbe\.julia\packages\CUDA\2kjXI\src\gpuarrays.jl:15 [inlined]
 [24] _copyto!
    @ C:\Users\salbe\.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:78 [inlined]
 [25] copyto!
    @ C:\Users\salbe\.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:44 [inlined]
 [26] copy
    @ C:\Users\salbe\.julia\packages\GPUArrays\qt4ax\src\host\broadcast.jl:29 [inlined]
 [27] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Nothing, Zygote.var"#1409#1410"{typeof(^)}, Tuple{CuArray{ComplexF64, 1, CUDA.DeviceMemory}, UnitRange{Int64}}})
    @ Base.Broadcast .\broadcast.jl:867
 [28] broadcast_forward(::Function, ::CuArray{ComplexF64, 1, CUDA.DeviceMemory}, ::UnitRange{Int64})
    @ Zygote C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:282
 [29] adjoint
    @ C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\lib\broadcast.jl:361 [inlined]
 [30] _pullback(::Zygote.Context{false}, ::typeof(Base.Broadcast.broadcasted), ::CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, ::Function, ::CuArray{ComplexF64, 1, CUDA.DeviceMemory}, ::UnitRange{Int64})
    @ Zygote C:\Users\salbe\.julia\packages\ZygoteRules\M4xmc\src\adjoint.jl:67
 [31] _apply(::Function, ::Vararg{Any})
    @ Core .\boot.jl:946
 [32] adjoint
    @ C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\lib\lib.jl:203 [inlined]
 [33] _pullback
    @ C:\Users\salbe\.julia\packages\ZygoteRules\M4xmc\src\adjoint.jl:67 [inlined]
 [34] broadcasted
    @ .\broadcast.jl:1326 [inlined]
 [35] f₂
    @ .\In[3]:2 [inlined]
 [36] _pullback(ctx::Zygote.Context{false}, f::typeof(f₂), args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
    @ Zygote C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\compiler\interface2.jl:0
 [37] pullback(f::Function, cx::Zygote.Context{false}, args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
    @ Zygote C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:90
 [38] pullback
    @ C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:88 [inlined]
 [39] gradient(f::Function, args::CuArray{ComplexF64, 1, CUDA.DeviceMemory})
    @ Zygote C:\Users\salbe\.julia\packages\Zygote\Tt5Gx\src\compiler\interface.jl:147
 [40] top-level scope
    @ In[7]:1
```
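As a stopgap (my assumption, not a confirmed fix): the error surfaces only in the `x .^ (1:length(x))` broadcast, while the mathematically equivalent `exp.(log.(x) .* p)` form used by `f₁` differentiates fine on the GPU, as shown above. So one possible workaround is to route the elementwise power through that identity; the helper name `cpow` below is mine, and this assumes nonzero complex entries (principal-branch `log`):

```julia
using CUDA, Zygote, LinearAlgebra

# Hypothetical workaround: express the elementwise power via exp/log,
# the same identity f₁ uses, so the failing `^` broadcast kernel is
# never compiled. Valid for nonzero complex x (principal branch of log).
cpow(x, p) = exp.(log.(x) .* p)

g(x) = sum(abs2, cpow(x, 1:length(x)))

z = CuArray{ComplexF64}(randn(ComplexF64, 5))
Zygote.gradient(g, z)[1]  # takes the same code path as f₁, which succeeds
```

On the CPU this matches `f₂`'s gradient to roundoff (the `norm(test₁ - test₂) / norm(test₁)` check above), so it is only a restatement of the identity already used in the report, not a new method.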
