[CIR][Lowering] Fix inconditional sign extension on vec.cmp op #1747

RiverDave · 2025-07-18T02:28:13Z

(Copied from my question on Discord)

I’ve been working on the vector to bit-mask related intrinsics for X86. I’ve been stuck specifically on X86::BI__builtin_ia32_cvtb2mask128(_mm256_movepi16_mask) and its variations with different vector/mask sizes.

In this case, we perform a vector comparison of vector<16xi16> and bitcast the resulting vector<16xi1> directly into a scalar integer mask (i16).

I’m successfully able to lower to cir:

    ...
    %5 = cir.vec.cmp(lt, %3, %4) : !cir.vector<!s16i x 16>, !cir.vector<!cir.int<u, 1> x 16>
    %6 = cir.cast(bitcast, %5 : !cir.vector<!cir.int<u, 1> x 16>), !u16i
    ...

There's an issue arises when lowering this to LLVM, the error message I'm getting is:

error: integer width of the output type is smaller or equal to the integer width of the input type

By looking at the test cases on the llvm dialect, this is related to the sext / zext instruction.

This is the cir → llvm dialect lowered for the latter:

        ...
    %14 = "llvm.icmp"(%12, %13) <{predicate = 2 : i64}> : (vector<16xi16>, vector<16xi16>) -> vector<16xi1>
    %15 = "llvm.sext"(%14) : (vector<16xi1>) -> vector<16xi1>
    %16 = "llvm.bitcast"(%15) : (vector<16xi1>) -> i16
        ...

This is seems to be the cause:

 %15 = "llvm.sext"(%14) : (vector<16xi1>) -> vector<16xi1>

The fix: Added a type check: if the result type does not differ from the expected type, we won't insert a sextOp

This patch introduces support for pointer TBAA, which can be enabled using the `-fpointer-tbaa` flag. By default, this feature is now enabled. To ensure test compatibility, the tests (`tbaa-enum.cpp`, `tbaa-enum.c`, and `tbaa-struct.cpp`) have been updated to include the `-fno-pointer-tbaa` flag. Related Pull Requests of OG: - llvm/llvm-project#76612 - llvm/llvm-project#116991

Similar PR: llvm#1434

Based on https://github.com/llvm/clangir/blob/7f66a204c4ba1f674cfe0e16e2c9c6b65ca70bc8/clang/lib/Basic/Targets/NVPTX.h#L27, the current address space values are incorrect. This PR fixes these values.

Lower neon vcages_f32

This implements the missing feature `cir::setTargetAttributes`. Although other targets might also need attributes, this PR focuses on the CUDA-specific ones. For CUDA kernels (on device side, not stubs), they must have a calling convention of `ptx_kernel`. It is added here. CUDA kernels, as well as global variables, also involves lots of NVVM metadata, which is intended to be dealt with at the same place. It's marked with a new missing feature here.

Lower neon vmaxv_f32

This PR implements \_\_constant\_\_ variables. llvm#1438 only implements \_\_device\_\_ and \_\_shared\_\_ variables, ~~This PR depends on llvm#1445~~

This is part 2 of CUDA lowering. Still more to come! This PR generates `__cuda_register_globals` for functions only, without touching variables. It also fixes two discrepancies mentioned in Part 1, namely: - Now CIR will not generate registration code if there's nothing to register; - `__cuda_fatbin_wrapper` now becomes a constant.

This PR deals with several issues currently present in CUDA CodeGen. Each of them requires only a few lines to fix, so they're combined in a single PR. **Bug 1.** Suppose we write ```cpp __global__ void kernel(int a, int b); ``` Then when we call this kernel with `cudaLaunchKernel`, the 4th argument to that function is something of the form `void *kernel_args[2] = {&a, &b}`. OG allocates the space of it with `alloca ptr, i32 2`, but that doesn't seem to be feasible in CIR, so we allocated `alloca [2 x ptr], i32 1`. This means there must be an extra GEP as compared to OG. In CIR, it means we must add an `array_to_ptrdecay` cast before trying to accessing the array elements. I missed that out in llvm#1332 . **Bug 2.** We missed a load instruction for 6th argument to `cudaLaunchKernel`. It's added back in this PR. **Bug 3.** When we launch a kernel, we first retrieve the return value of `__cudaPopCallConfiguration`. If it's zero, then the call succeeds and we should proceed to call the device stub. In llvm#1348 we did exactly the opposite, calling the device stub only if it's not zero. It's fixed here. **Issue 4.** CallConvLowering is required to make `cudaLaunchKernel` correct. The codepath is unblocked by adding a `getIndirectResult` at the same place as OG does -- the function is already implemented so we can just call it. After this (and other pending PRs), CIR is now able to compile real CUDA programs. There are still missing features, which will be followed up later.

Lower neon vaddlv_s32

This is Part 3 of registration function generation. This generates `__cuda_module_dtor`. It cannot be placed in global dtors list, as treating it as a normal destructor will result in double-free in recent CUDA versions (see comments in OG). Rather, the function is passed as callback of `atexit`, which is called at the end of `__cuda_module_ctor`.

@Lancern

Traditional clang implementation: https://github.com/llvm/clangir/blob/a1ab6bf6cd3b83d0982c16f29e8c98958f69c024/clang/lib/CodeGen/CGBuiltin.cpp#L3618-L3632 The problem here is that `__builtin_clz` allows undefined result, while `__lzcnt` doesn't. As a result, I have to create a new CIR for `__lzcnt`. Since the return type of those two builtin differs, I decided to change return type of current `CIR_BitOp` to allow new `CIR_LzcntOp` to inherit from it. I would like to hear your suggestions. C.c. @Lancern

This PR adds support for compiling builtin variables like `threadIdx` down to the appropriate intrinsic. --------- Co-authored-by: Aidan Wong <anominosgamer@gmail.com> Co-authored-by: anominos <46242743+anominos@users.noreply.github.com>

I have now fixed the test. Earlier I made some commits with other changes because we were testing something on my fork. This should be resolved now

CIR is currently ignoring the `signext` and `zeroext` for function arguments and return types produced by CallConvLowering. This PR lowers them to LLVM IR.

I realized I committed a new file with CRLF before. Really sorry about that >_< Related: llvm#1404

The choice of adding a separate file imitates that of OG.

This PR removes a useless argument `convertToInt` and removes hardcoded `Sint32Type`. I realized I committed a new file with CRLF before. Really sorry about that >_<

There are some subtleties here. This is the code in OG: ```cpp // note: this is different from default ABI if (!RetTy->isScalarType()) return ABIArgInfo::getDirect(); ``` which says we should return structs directly. It's correct, has have the same behaviour as `nvcc`, and it obeys the PTX ABI as well. The comment dates back to 2013 (see [this commit](llvm/llvm-project@f9329ff) -- it didn't provide any explanation either), so I believe it's outdated. I didn't include this comment in the PR.

…lvm#1486) The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces surrounding `{{.*}}`. This patch removes the first space to fix the failure.

…1487) This PR resolves an assertion failure in `CIRGenTypes::isFuncParamTypeConvertible`, which is involved when trying to emit a vtable entry to a virtual function whose type includes a pointer-to-member-function.

Lower neon vabsd_s64

…lvm#1431) Implements `::verify` for operations cir.atomic.xchg and cir.atomic.cmp_xchg I believe the existing regression tests don't get to the CIR level type check failure and I was not able to implement a case that does. Most attempts of reproducing cir.atomic.xchg type check failure were along the lines of: ``` int a; long long b,c; __atomic_exchange(&a, &b, &c, memory_order_seq_cst); ``` And they seem to never trigger the failure on `::verify` because they fail earlier in function parameter checking: ``` exmp.cpp:7:27: error: cannot initialize a parameter of type 'int *' with an rvalue of type 'long long *' 7 | __atomic_exchange(&a, &b, &c, memory_order_seq_cst); | ^~ ``` Closes llvm#1378 .

Lower neon vcaled_f64

This PR adds a new boolean flag to the `cir.load` and the `cir.store` operation that distinguishes nontemporal loads and stores. Besides, this PR also adds support for the `__builtin_nontemporal_load` and the `__builtin_nontemporal_store` intrinsic function.

Resolves llvm#1477 .

…1717) Big question mark here: When lowering target specific vector types: (`__m256i`, `__m128i`, `__m64`), I was hitting an unreachable statement which I removed and were preventing these types from being lowered. Not too familiar with it but it's related to the attribute `"min-legal-vector-width"="N"` which is not implemented for `cir::VectorType` as compared to OG. Is that a blocker for these intrinsics as of now? or is that something we wanna target before we merge x86 vector specific intrinsics?.

Backporting the VecCmpOp folder

… to deprecation

… of the same size (llvm#1728) The `cir::CastOp::verify` method was overly conservative, and would fail on any `bitcast` from vector to scalar or scalar to vector. Change List: - Extends the `cir::CastOp::verify` method to check if the source and result types are the same size using the `mlir::DataLayout` of the current scope, and succeeds if the sizes match. - Extends the CodeGen vectype tests with vector to scalar, scalar to vector and vector to vector conversions. - Extends the IR invalid tests with vector to scalar and scalar to vector conversions with different source and result sizes.

…back due to deprecation" This reverts commit 1bbf343.

Remove code after return statement

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) - `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) - `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) - `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency - `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…lvm#1722) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) - `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) - `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) - `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency - `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Couple of things I have questions about: 1. I duplicated function `getIntValueFromConstOp` from `CIRGenBuiltinAArch64.cpp`. I was wondering if that's correct or if there's a place where we can avoid that duplication. 2. For the tests related to `mm_prefetch` im not sure if it'd be correct to define them in a file eg: `sse-builtins.c` like it's currently done in the codegen lib. 3. I'm also aware we can emit a call for a `PreFetchOp` would that be required in this case? related: llvm#1414, llvm#1404 (A PR was previously opened but It was not resolved)

…1731)

…re (llvm#1732) just a few improvements to mirror og test cases in x86 for better reference.

…tribute (llvm#1733) - Remove redundant custom printer and parser for AddressSpace, relying instead on MLIR's default EnumAttr handling. - Leverage AddressSpace::Default to omit the attribute from the assembly form when not needed. Therefore, an empty attribute is no longer needed to represent the default address space. - Update PointerType to use the AddressSpace enum directly, instead of a boxed attribute.

Not too much to add. Added a method to call a masked store intrinsic from the builder. Haven't touched that class to much, so let me know if that's the right call. Also: Unfortunately there were a lot of test cases for the intrinsics in this PR, hope that's not a big hustle for review 😊

Backport ArraySubscript for ComplexType

Backport functional cast to ComplexType

…nsics (llvm#1738)

This patch bumps the windows CI container to windows server 2022 from windows server 2019. This is necessary as Github has sunsetted support for sever 2019, so we cannot build the container through GHA without updating. Using more recent versions is just good practice anyways. This will not roll out immediately and we'll have to make some TF changes to get deployed, but some additional validation first will be good anyways. Reviewers: lnihlen, tstellar, cmtice Reviewed By: cmtice Pull Request: llvm/llvm-project#148318 (cherry picked from commit 3e43915)

…style (llvm#1741) - This adds common `CIR_` prefix to all operation disambiguating them when used with other dialects. - Unifies traits style in operation definitions

- This fixes default value to be expected 65535 - Introduces DefaultGlobalCtorDtorPriority constant - Makes function to use I32Attr for priority instead of unnecessary attribute with reference to function

Seems like this is the wrong approach. This reverts commit bc91ef4.

This updates the lowering of CIR function aliases in such a way that they now actually become aliases in the final LLVM IR.

…lvm#1740) This PR has two parts: 1. Mimicking the OG [special case](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/lib/CodeGen/CGException.cpp#L690) for a single catch-all when getting dispatch blocks. The huge testcase I added, gotten by using [creduce](https://github.com/csmith-project/creduce) on a c++ file, crashed at this point [in our version](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/lib/CIR/CodeGen/CIRGenException.cpp#L789). 2. Fixing multiple destructor calls for the same object. For example, there were tests like [#1](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/test/CIR/CodeGen/try-catch-dtors.cpp#L370C1-L372C80) and [#2](https://github.com/llvm/clangir/blob/d030c9bff74f4f9504a61abe9b2c04a8777028a5/clang/test/CIR/CodeGen/conditional-cleanup.cpp#L217C1-L224C25), having a second destructor call to an already destroyed object. This PR fixes these and I have updated the tests. Also, I added `"CIR-NEXT"` at some points, to confirm the destructors are indeed called once. As usual, please let me know if you have any concerns.

This patch backports changes made to the bit operations in the upstream PR llvm/llvm-project#148378. Namely, this patch includes the following changes: - This patch removes the `bit.` prefix in the op mnemonic. The operation names now directly correspond to the builtin function names except for bswap which is represented by `cir.byte_swap` for more clarity. - Since all bit operations are `SameOperandsAndResultType`, this patch updates their assembly format and avoids spelling out the operand type twice.

PikachuHyA and others added 30 commits April 9, 2025 14:57

[CIR][CIRGen][Neon] Make vrnda emit RoundOp directly (llvm#1453)

7e7a821

Similar PR: llvm#1434

[CIR][CUDA] Support for built-in CUDA surface type (llvm#1455)

01c27e9

[CIR][CUDA] Fix address space values for NVPTX (llvm#1445)

01a1748

Based on https://github.com/llvm/clangir/blob/7f66a204c4ba1f674cfe0e16e2c9c6b65ca70bc8/clang/lib/Basic/Targets/NVPTX.h#L27, the current address space values are incorrect. This PR fixes these values.

[CIR][CIRGen][Builtin][Neon] Lower neon vcages_f32 (llvm#1449)

f26fff3

Lower neon vcages_f32

[CIR][CIRGen][Builtin][Neon] Lower neon vmaxv_f32 (llvm#1460)

5f03b07

Lower neon vmaxv_f32

[CIR][CUDA] implement cuda constant variables (llvm#1444)

cadb738

This PR implements \_\_constant\_\_ variables. llvm#1438 only implements \_\_device\_\_ and \_\_shared\_\_ variables, ~~This PR depends on llvm#1445~~

[CIR][CIRGen][Builtin][Neon] Lower neon vaddlv_s32 (llvm#1464)

49383ee

Lower neon vaddlv_s32

[CIR][CUDA] Support for inbuilt texture types (llvm#1469)

bf52b48

I have now fixed the test. Earlier I made some commits with other changes because we were testing something on my fork. This should be resolved now

[CIR] Lower signext and zeroext attributes (llvm#1473)

9b0b837

CIR is currently ignoring the `signext` and `zeroext` for function arguments and return types produced by CallConvLowering. This PR lowers them to LLVM IR.

[CIR][CIRGen][TBAA] Add support for vtable pointer (llvm#1463)

f896192

[CIR][CIRGen][builtin][X86] handle _mm_lfence (llvm#1474)

6e31c6e

I realized I committed a new file with CRLF before. Really sorry about that >_< Related: llvm#1404

[CIR][CUDA] Support device-side printf (llvm#1475)

0aef9ed

The choice of adding a separate file imitates that of OG.

[CIR][CIRGen][builtin] handle __popcnt (llvm#1479)

867d736

This PR removes a useless argument `convertToInt` and removes hardcoded `Sint32Type`. I realized I committed a new file with CRLF before. Really sorry about that >_<

[CIR][NFC] Fix test failures caused by double spaces in check line (l…

9e8806a

…lvm#1486) The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces surrounding `{{.*}}`. This patch removes the first space to fix the failure.

[CIR][CIRGen][Builtin][Neon] Lower neon vabsd_s64 (llvm#1489)

bce7507

Lower neon vabsd_s64

[CIR][CIRGen][Builtin][Neon] Lower neon vcaled_f64 (llvm#1495)

e74e226

Lower neon vcaled_f64

[CIR][CIRGen][builtin] handle _mm_pause (llvm#1493)

dd37e38

[CIR] Emit nsw flag for unary integer operations (llvm#1485)

6a3e881

Resolves llvm#1477 .

RiverDave and others added 23 commits July 2, 2025 10:00

[CIR] Backport VecCmpOp folder (llvm#1709)

da1e2cd

Backporting the VecCmpOp folder

[Infra] Attempt to move to windows-2022 and get windows bots back due…

1bbf343

… to deprecation

Revert "[Infra] Attempt to move to windows-2022 and get windows bots …

0f6459e

…back due to deprecation" This reverts commit 1bbf343.

[CIR][NFC] Remove code after return statement (llvm#1730)

f17639c

Remove code after return statement

[CIR][CIRGen][Builtin][X86] Lower vec_set related intrinsics (llvm#…

33c739f

…1731)

[CIR][CIRGen][Builtin][NFC] Mirror X86 OG intrinsic test file structu…

f9d47e9

…re (llvm#1732) just a few improvements to mirror og test cases in x86 for better reference.

[CIR] Backport ArraySubscript for ComplexType (llvm#1736)

2bc0571

Backport ArraySubscript for ComplexType

[CIR] Backport functional cast to ComplexType (llvm#1737)

5d27d15

Backport functional cast to ComplexType

[CIR][CIRGen][Builtin][X86] Lower AVX mask-to-vector conversion intri…

577e995

…nsics (llvm#1738)

[CIR] Reformat Ops to use common CIR_ prefix and definition traits …

17aa78d

…style (llvm#1741) - This adds common `CIR_` prefix to all operation disambiguating them when used with other dialects. - Unifies traits style in operation definitions

[CIR] Fix Global Ctor/Dtor priority attributes (llvm#1743)

ee785a0

- This fixes default value to be expected 65535 - Introduces DefaultGlobalCtorDtorPriority constant - Makes function to use I32Attr for priority instead of unnecessary attribute with reference to function

Revert "[CI][Github] Bump Windows Container to Server 2022"

b0901d3

Seems like this is the wrong approach. This reverts commit bc91ef4.

[CIR] Implement function alias lowering (llvm#1739)

504714a

This updates the lowering of CIR function aliases in such a way that they now actually become aliases in the final LLVM IR.

RiverDave requested review from lanza and bcardosolopes as code owners July 18, 2025 02:28

[CIR][Lowering] Fix inconditional sign extension on vec.cmp

59ccba6

RiverDave force-pushed the fix-vecmp-sext branch from 15fcc6e to 59ccba6 Compare July 19, 2025 00:27

This was referenced Jul 19, 2025

[CIR][Lowering] Fix Vector Comparison Lowering with -fno-signed-char/unsigned operand RiverDave/clangir#1

Open

[CIR][CIRGen][Builtin][X86] Lower cvt*2mask intrinsics RiverDave/clangir#2

Open

lanza force-pushed the main branch from 8f89224 to e436c72 Compare July 22, 2025 17:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CIR][Lowering] Fix inconditional sign extension on vec.cmp op #1747

[CIR][Lowering] Fix inconditional sign extension on vec.cmp op #1747

RiverDave commented Jul 18, 2025

Uh oh!

Uh oh!

[CIR][Lowering] Fix inconditional sign extension on vec.cmp op #1747

Are you sure you want to change the base?

[CIR][Lowering] Fix inconditional sign extension on vec.cmp op #1747

Conversation

RiverDave commented Jul 18, 2025

Uh oh!

Uh oh!