[SPIRV] convert i128 allocas to <2 x i64> #734

simeonschaub · 2025-11-04T12:06:46Z

It seems like the issue is that codegen hard codes MAX_ALIGN based on
the host platform ABI and assumes that if the host supports i128
allocas the target will support it as well. For now just handle this by
converting i128 allocas to <2 x i64> allocas. Discovered while
working on JuliaGPU/OpenCL.jl#379

To reproduce the issue:

julia> using OpenCL, SIMD

julia> OpenCL.code_llvm(NTuple{2, Vec{8, Float32}}) do x...
           @noinline +(x...)
       end
;  @ REPL[7]:2 within `#11`
define void @julia__11_16515(ptr noalias nocapture noundef nonnull sret([1 x <8 x float>]) align 16 dereferenceable(32) %sret_return, ptr nocapture noundef nonnull readonly align 16 dereferenceable(32) %"x[1]::Vec", ptr nocapture noundef nonnull readonly align 16 dereferenceable(32) %"x[2]::Vec") local_unnamed_addr {
top:
  %"new::Tuple" = alloca [2 x [1 x <8 x float>]], align 16
  %sret_box = alloca [2 x i128], align 16
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %"new::Tuple", ptr noundef nonnull align 16 dereferenceable(32) %"x[1]::Vec", i64 32, i1 false)
  %0 = getelementptr inbounds i8, ptr %"new::Tuple", i64 32
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %0, ptr noundef nonnull align 16 dereferenceable(32) %"x[2]::Vec", i64 32, i1 false)
  call fastcc void @julia___16519(ptr noalias nocapture noundef sret([1 x <8 x float>]) %sret_box, ptr nocapture readonly %"new::Tuple", ptr nocapture readonly %0)
  call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %sret_return, ptr noundef nonnull align 16 dereferenceable(32) %sret_box, i64 32, i1 false)
  ret void
}

A similar workaround might be needed for Metal, but I don't have a Mac
to test

@noinline

It seems like the issue is that codegen hard codes `MAX_ALIGN` based on the host platform ABI and assumes that if the host supports `i128` allocas the target will support it as well. For now just handle this by converting `i128` allocas to `<2 x i64>` allocas. Discovered while working on JuliaGPU/OpenCL.jl#379 To reproduce the issue: ```julia-repl julia> using OpenCL, SIMD julia> OpenCL.code_llvm(NTuple{2, Vec{8, Float32}}) do x... @noinline +(x...) end ; @ REPL[7]:2 within `#11` define void @julia__11_16515(ptr noalias nocapture noundef nonnull sret([1 x <8 x float>]) align 16 dereferenceable(32) %sret_return, ptr nocapture noundef nonnull readonly align 16 dereferenceable(32) %"x[1]::Vec", ptr nocapture noundef nonnull readonly align 16 dereferenceable(32) %"x[2]::Vec") local_unnamed_addr { top: %"new::Tuple" = alloca [2 x [1 x <8 x float>]], align 16 %sret_box = alloca [2 x i128], align 16 call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %"new::Tuple", ptr noundef nonnull align 16 dereferenceable(32) %"x[1]::Vec", i64 32, i1 false) %0 = getelementptr inbounds i8, ptr %"new::Tuple", i64 32 call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %0, ptr noundef nonnull align 16 dereferenceable(32) %"x[2]::Vec", i64 32, i1 false) call fastcc void @julia___16519(ptr noalias nocapture noundef sret([1 x <8 x float>]) %sret_box, ptr nocapture readonly %"new::Tuple", ptr nocapture readonly %0) call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 16 dereferenceable(32) %sret_return, ptr noundef nonnull align 16 dereferenceable(32) %sret_box, i64 32, i1 false) ret void } ``` A similar workaround might be needed for Metal, but I don't have a Mac to test

github-actions · 2025-11-04T12:07:43Z

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Click here to view the suggested changes.

diff --git a/src/spirv.jl b/src/spirv.jl
index ea552e7..acf525a 100644
--- a/src/spirv.jl
+++ b/src/spirv.jl
@@ -293,50 +293,50 @@ function convert_i128_allocas!(mod::LLVM.Module)
     changed = false
     @tracepoint "convert i128 allocas" begin
 
-    for f in functions(mod), bb in blocks(f)
-        for inst in instructions(bb)
-            if inst isa LLVM.AllocaInst
-                alloca_type = LLVMType(LLVM.API.LLVMGetAllocatedType(inst))
-
-                # Check if this is an i128 or an array of i128
-                if alloca_type isa LLVM.ArrayType
-                    T = eltype(alloca_type)
-                else
-                    T = alloca_type
-                end
-                if T isa LLVM.IntegerType && width(T) == 128
-                    # replace i128 with <2 x i64>
-                    vec_type = LLVM.VectorType(LLVM.Int64Type(), 2)
+        for f in functions(mod), bb in blocks(f)
+            for inst in instructions(bb)
+                if inst isa LLVM.AllocaInst
+                    alloca_type = LLVMType(LLVM.API.LLVMGetAllocatedType(inst))
 
+                    # Check if this is an i128 or an array of i128
                     if alloca_type isa LLVM.ArrayType
-                        array_size = length(alloca_type)
-                        new_alloca_type = LLVM.ArrayType(vec_type, array_size)
+                        T = eltype(alloca_type)
                     else
-                        new_alloca_type = vec_type
+                        T = alloca_type
                     end
-                    align_val = alignment(inst)
-
-                    # Create new alloca with vector type
-                    @dispose builder=IRBuilder() begin
-                        position!(builder, inst)
-                        new_alloca = alloca!(builder, new_alloca_type)
-                        alignment!(new_alloca, align_val)
-
-                        # Bitcast the new alloca back to the original pointer type
-                        # XXX: The issue only seems to manifest itself on LLVM >= 18
-                        #      where we use opaque pointers anyways, so not sure this
-                        #      is needed
-                        old_ptr_type = LLVMType(LLVM.API.LLVMTypeOf(inst.ref))
-                        bitcast_ptr = bitcast!(builder, new_alloca, old_ptr_type)
-
-                        replace_uses!(inst, bitcast_ptr)
-                        erase!(inst)
-                        changed = true
+                    if T isa LLVM.IntegerType && width(T) == 128
+                        # replace i128 with <2 x i64>
+                        vec_type = LLVM.VectorType(LLVM.Int64Type(), 2)
+
+                        if alloca_type isa LLVM.ArrayType
+                            array_size = length(alloca_type)
+                            new_alloca_type = LLVM.ArrayType(vec_type, array_size)
+                        else
+                            new_alloca_type = vec_type
+                        end
+                        align_val = alignment(inst)
+
+                        # Create new alloca with vector type
+                        @dispose builder = IRBuilder() begin
+                            position!(builder, inst)
+                            new_alloca = alloca!(builder, new_alloca_type)
+                            alignment!(new_alloca, align_val)
+
+                            # Bitcast the new alloca back to the original pointer type
+                            # XXX: The issue only seems to manifest itself on LLVM >= 18
+                            #      where we use opaque pointers anyways, so not sure this
+                            #      is needed
+                            old_ptr_type = LLVMType(LLVM.API.LLVMTypeOf(inst.ref))
+                            bitcast_ptr = bitcast!(builder, new_alloca, old_ptr_type)
+
+                            replace_uses!(inst, bitcast_ptr)
+                            erase!(inst)
+                            changed = true
+                        end
                     end
                 end
             end
         end
-    end
 
     end
     return changed
diff --git a/test/spirv.jl b/test/spirv.jl
index e890314..de41a85 100644
--- a/test/spirv.jl
+++ b/test/spirv.jl
@@ -112,37 +112,41 @@ end
 
 end
 
-@testset "replace i128 allocas" begin
-    mod = @eval module $(gensym())
+    @testset "replace i128 allocas" begin
+        mod = @eval module $(gensym())
         # reimplement some of SIMD.jl
         struct Vec{N, T}
             data::NTuple{N, Core.VecElement{T}}
         end
         @generated function fadd(x::Vec{N, Float32}, y::Vec{N, Float32}) where {N}
             quote
-                Vec(Base.llvmcall($"""
-                    %ret = fadd <$N x float> %0, %1
-                    ret <$N x float> %ret
-                """, NTuple{N, Core.VecElement{Float32}}, NTuple{2, NTuple{N, Core.VecElement{Float32}}}, x.data, y.data))
+                Vec(
+                    Base.llvmcall(
+                        $"""
+                            %ret = fadd <$N x float> %0, %1
+                            ret <$N x float> %ret
+                        """, NTuple{N, Core.VecElement{Float32}}, NTuple{2, NTuple{N, Core.VecElement{Float32}}}, x.data, y.data
+                    )
+                )
             end
         end
         kernel(x...) = @noinline fadd(x...)
-    end
+        end
 
-    @test @filecheck begin
-        # TODO: should structs of `NTuple{VecElement{T}}` be passed by value instead of sret?
-        check"CHECK-NOT: i128"
-        check"CHECK-LABEL: define void @{{(julia|j)_kernel_[0-9]+}}"
-        @static VERSION >= v"1.12" && check"CHECK: alloca <2 x i64>, align 16"
-        SPIRV.code_llvm(mod.kernel, NTuple{2, mod.Vec{4, Float32}}; backend, dump_module=true)
-    end
+        @test @filecheck begin
+            # TODO: should structs of `NTuple{VecElement{T}}` be passed by value instead of sret?
+            check"CHECK-NOT: i128"
+            check"CHECK-LABEL: define void @{{(julia|j)_kernel_[0-9]+}}"
+            @static VERSION >= v"1.12" && check"CHECK: alloca <2 x i64>, align 16"
+            SPIRV.code_llvm(mod.kernel, NTuple{2, mod.Vec{4, Float32}}; backend, dump_module = true)
+        end
 
-    @test @filecheck begin
-        check"CHECK-NOT: i128"
-        check"CHECK-LABEL: define void @{{(julia|j)_kernel_[0-9]+}}"
-        @static VERSION >= v"1.12" && check"CHECK: alloca [2 x <2 x i64>], align 16"
-        SPIRV.code_llvm(mod.kernel, NTuple{2, mod.Vec{8, Float32}}; backend, dump_module=true)
+        @test @filecheck begin
+            check"CHECK-NOT: i128"
+            check"CHECK-LABEL: define void @{{(julia|j)_kernel_[0-9]+}}"
+            @static VERSION >= v"1.12" && check"CHECK: alloca [2 x <2 x i64>], align 16"
+            SPIRV.code_llvm(mod.kernel, NTuple{2, mod.Vec{8, Float32}}; backend, dump_module = true)
+        end
     end
-end
 
 end

codecov · 2025-11-04T13:26:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.23%. Comparing base (523cbfc) to head (006906f).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #734      +/-   ##
==========================================
- Coverage   75.59%   75.23%   -0.37%     
==========================================
  Files          24       24              
  Lines        3639     3670      +31     
==========================================
+ Hits         2751     2761      +10     
- Misses        888      909      +21

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

maleadt · 2025-11-07T10:38:13Z

Seems reasonable. Can you add a test here?

simeonschaub requested a review from maleadt November 5, 2025 12:40

simeonschaub added 3 commits November 7, 2025 13:08

add tests

8ce64db

fix deprecation warning

b6065f7

only look for <2 x i64> allocas on 1.12+

006906f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPIRV] convert i128 allocas to <2 x i64> #734

[SPIRV] convert i128 allocas to <2 x i64> #734

Uh oh!

simeonschaub commented Nov 4, 2025

Uh oh!

github-actions bot commented Nov 4, 2025 •

edited

Loading

Uh oh!

codecov bot commented Nov 4, 2025 •

edited

Loading

Uh oh!

maleadt commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[SPIRV] convert i128 allocas to <2 x i64> #734

Are you sure you want to change the base?

[SPIRV] convert i128 allocas to <2 x i64> #734

Uh oh!

Conversation

simeonschaub commented Nov 4, 2025

Uh oh!

github-actions bot commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

maleadt commented Nov 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Nov 4, 2025 •

edited

Loading

codecov bot commented Nov 4, 2025 •

edited

Loading