Skip to content

Conversation

@dvermd
Copy link
Contributor

@dvermd dvermd commented Feb 21, 2025

Fixes #604

This is the GIMPLE code generated

      if (param0 == 0) goto then; else goto else;
      then:
      zeros = 128;
      goto after;
      else:
      _2 = param0 >> 64;
      _3 = (size_t) _2;
      if (_3 != 0) goto ctlz_then; else goto ctlz_else;
      after:
      stack_var_1.15_4 = &stack_var_1;
      MEM[(unsigned int *)stack_var_1.15_4] = zeros;
      stack_var_1.16_5 = &stack_var_1;
      loadedValue2 = MEM[(unsigned int *)stack_var_1.16_5];
      D.4444 = loadedValue2;
      return D.4444;
      ctlz_then:
      _6 = param0 >> 64;
      _7 = (size_t) _6;
      _8 = __builtin_clzll (_7);
      zeros = (unsigned int) _8;
      goto ctlz_after;
      ctlz_else:
      _9 = (size_t) param0;
      _10 = __builtin_clzll (_9);
      _11 = (unsigned int) _10;
      zeros = _11 + 64;
      goto ctlz_after;
      ctlz_after:
      zeros = zeros;
      goto after;

@dvermd
Copy link
Contributor Author

dvermd commented Feb 21, 2025

It looks like the some CI jobs are failing on

thread 'rustc' panicked at src/int.rs:176:13:
assertion failed: a_type.dyncast_array().is_some()

which is some code I didn't touch.

Can my changes have some side effects ?
I see that #631 also have these jobs failing. Perhaps it's not my changes which bring these failures.

Any hints ?

@antoyo
Copy link
Contributor

antoyo commented Feb 21, 2025

Not sure it's related. I opened a new dummy PR to check if the tests pass, though.

@antoyo
Copy link
Contributor

antoyo commented Feb 21, 2025

The CI passed, so the problem is in here.

I suspect this is because you changed the return type: it used to return an array type and your change makes it return a u32.

@dvermd
Copy link
Contributor Author

dvermd commented Feb 23, 2025

The return type of the if width == 128 was already a u32 because the block ended with

                let res = self.context.new_array_access(self.location, result, index);

                return self.gcc_int_cast(res.to_rvalue(), result_type);

which will get an element in the array and cast it as result_type which is u32.

Edit: formatting

@dvermd
Copy link
Contributor Author

dvermd commented Feb 23, 2025

Ok, finally found the line pointing in my change let leading_zeroes = self.add(low_leading_zeroes, sixty_four); where low_leading_zeroesand sixty_four are of different types.

The problem I see is that the if branch should have been taken.

Copy link
Contributor

@antoyo antoyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay: I've been very busy.

let result = self.current_func()
.new_local(None, array_type, "count_loading_zeroes_results");

// Algorithm from: https://stackoverflow.com/a/28433850/389119 updated to check for high 64bits being 0
Copy link
Contributor

@antoyo antoyo Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems your new algorithm significantly diverges from the one linked here.
If so, could you please replace this link with another pointing to the algorithm you used? If you can't, could you please add comments below to make this easier to understand?

@dvermd
Copy link
Contributor Author

dvermd commented Jul 29, 2025

Hi, sorry for the delay, I missed your reply notification and got caught up by IRL stuff.

I'm going to add a comment on the new algorithm which is:

// if arg is 0 return 128
// else if the 64 high bits of arg are not 0, return clzll(64 high bits of arg)
//      else return 64 + clzll(64 low bits of arg)

This adds an if to the generated code but we need to check for 0 before calling ctzll

I think there is also a problem of UB in intrinsic/simd.rs

                let index = bx.context.new_rvalue_from_long(bx.i32_type, i as i64);
                let value = bx.extract_element(vector, index).to_rvalue();
                let value_type = value.get_type();
                let element = if name == sym::simd_ctlz {
                    bx.count_leading_zeroes(value_type.get_size() as u64 * 8, value)
                } else {
                    bx.count_trailing_zeroes(value_type.get_size() as u64 * 8, value)
                };

where 0 is not special cased like in intrinsic/mod.rs.

So I'll move the 0 special case handling in count_leading_zeroes and remove it from codegen_intrinsic_call

As found by @FractalFir, the same problems apply to trailing_zeros / cttz

@dvermd dvermd force-pushed the fix_128b_leading_zeros_ub branch from 1f5f68e to 68a69d8 Compare July 31, 2025 04:11
@dvermd
Copy link
Contributor Author

dvermd commented Jul 31, 2025

There is still a diff in the handling of clz and ctz intrinsics in the if to select the first builtin function

    fn count_leading_zeroes(&mut self, width: u64, arg: RValue<'gcc>) -> RValue<'gcc> {
...
            if arg_type.is_uint(self.cx) {
                "__builtin_clz"
            }

vs

    fn count_trailing_zeroes(&mut self, _width: u64, arg: RValue<'gcc>) -> RValue<'gcc> {
...
            if arg_type.is_uchar(self.cx) || arg_type.is_ushort(self.cx) || arg_type.is_uint(self.cx) {
                // NOTE: we don't need to & 0xFF for uchar because the result is undefined on zero.
                ("__builtin_ctz", self.cx.uint_type)
            }

They both have the same comment

            // TODO(antoyo): write a new function Type::is_compatible_with(&Type) and use it here
            // instead of using is_uint().

Should I add the uchar and ushort types to the leading zeros test ?

@dvermd
Copy link
Contributor Author

dvermd commented Jul 31, 2025

@antoyo Do you have any hints to reproduce the --without-int128 environment locally ? I can't figure out by myself what makes the CI fail.

@antoyo
Copy link
Contributor

antoyo commented Jul 31, 2025

@antoyo Do you have any hints to reproduce the --without-int128 environment locally ? I can't figure out by myself what makes the CI fail.

If you compile gcc yourself, you can apply this patch.

Otherwise, you can try setting this variable to false, but I'm not sure if this will have the same behavior.

@dvermd dvermd force-pushed the fix_128b_leading_zeros_ub branch 5 times, most recently from 2f49779 to 97aaccd Compare August 7, 2025 08:24
@dvermd
Copy link
Contributor Author

dvermd commented Aug 8, 2025

I finally made through it once I realized lshr ends the current block just like gcc_icmp. I moved it out of the then and else blocks, thus it is computed even when not used.

Copy link
Contributor

@antoyo antoyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good!
Again, I'm very sorry for the long delay it took me to review this.
Thanks a lot for your contribution.

@dvermd dvermd force-pushed the fix_128b_leading_zeros_ub branch 5 times, most recently from f9ec048 to 5e93107 Compare November 17, 2025 08:25
@dvermd dvermd force-pushed the fix_128b_leading_zeros_ub branch 3 times, most recently from e7f960a to 0a30199 Compare November 23, 2025 16:05
@dvermd
Copy link
Contributor Author

dvermd commented Nov 24, 2025

There is a compilation error on m68k because the __ctzdi2 symbol is not found. From my research it looks like adding -lgcc would fix the error. Is this something we want for the tests ?

I also tried to add -C llvm-args=sanitize-undefined and -C link-args=-lubsan to compiler args to make the test fail if the UB is re-introduced but m68k doesn't find libubsan.so so I remove it for now.

@antoyo
Copy link
Contributor

antoyo commented Nov 24, 2025

There is a compilation error on m68k because the __ctzdi2 symbol is not found. From my research it looks like adding -lgcc would fix the error. Is this something we want for the tests ?

Correct me if I'm wrong, but I was under the impression that a previous version of this PR passed this test. Is this the case?
If so, do you know what change caused this failure?

@dvermd dvermd force-pushed the fix_128b_leading_zeros_ub branch from 0a30199 to 9beb9bb Compare November 25, 2025 04:46
@dvermd
Copy link
Contributor Author

dvermd commented Nov 25, 2025

There is a compilation error on m68k because the __ctzdi2 symbol is not found. From my research it looks like adding -lgcc would fix the error. Is this something we want for the tests ?

Correct me if I'm wrong, but I was under the impression that a previous version of this PR passed this test. Is this the case? If so, do you know what change caused this failure?

I think the last green on m68k was on the commit 97aaccd in this workflow.

Then I added some tests like you suggested and the error occurred while compiling it.

@antoyo
Copy link
Contributor

antoyo commented Nov 25, 2025

This intrinsic seems to be in compiler-builtins, so I believe if you change the test to use core (and possibly std), this should work without any flag needed.

@dvermd dvermd force-pushed the fix_128b_leading_zeros_ub branch from 9beb9bb to a9b15b6 Compare November 25, 2025 18:36
Copy link
Contributor

@antoyo antoyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Everything looks good now.
There's only this question left regarding the new test:

} else {
compiler.args([
"-C",
"llvm-args=sanitize-undefined",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sending the correct flags here?
I ran the new test on the master branch and it passes while I assumed it should fail on the master branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, these flags are the one from #604 and were triggering the UB detection before. Now, I can't seem to reproduce the failure on the master branch with my local test either.

Running
CG_RUSTFLAGS="-Cllvm-args=-fsanitize=undefined -Clink-args=-lubsan" CG_GCCJIT_VERBOSE=1 ../../y.sh cargo rustc --bin u128ub
shows
COLLECT_GCC_OPTIONS='-m64' '-c' '-o' '/tmp/libgccjit-fN4JFD/fake.o' '-fno-use-linker-plugin' '-fexceptions' '-fPIC' '-v' '-mtune=generic' '-march=x86-64' '-dumpdir' '/tmp/libgccjit-fN4JFD/fake.'

which doesn't contain the options passed by llvm-args or link-args.

Some bisecting will need to happen, but for now, I can't rebuild rustc_codegen_gcc on commit fcac229 which was the master when I started this dev.

$ ./y.sh clean all
[BUILD] build system
    Finished `release` profile [optimized] target(s) in 0.00s
Successfully ran `clean all`

$ rm -rf build build_system/target/

$ ./y.sh prepare
[BUILD] build system
   Compiling boml v0.3.1
   Compiling y v0.1.0 (/home/.../rustc_codegen_gcc/build_system)
    Finished `release` profile [optimized] target(s) in 1.70s
[GIT] init (cwd): `build/build_sysroot/sysroot_src`
[GIT] add (cwd): `build/build_sysroot/sysroot_src`
[GIT] commit (cwd): `build/build_sysroot/sysroot_src`
[GIT] apply `patches/0001-Add-stdarch-Cargo.toml-for-testing.patch`
[master ceba044] Patch ../../../patches/0001-Add-stdarch-Cargo.toml-for-testing.patch
 1 file changed, 21 insertions(+)
 create mode 100644 library/stdarch/Cargo.toml
[GIT] apply `patches/0022-core-Disable-not-compiling-tests.patch`
[master 9d5f8ca] Patch ../../../patches/0022-core-Disable-not-compiling-tests.patch
 2 files changed, 15 insertions(+)
 create mode 100644 library/core/tests/Cargo.toml
[GIT] apply `patches/0028-core-Disable-long-running-tests.patch`
[master 1c31bc1] Patch ../../../patches/0028-core-Disable-long-running-tests.patch
 1 file changed, 2 insertions(+)
Successfully prepared libcore for building
Cloning into 'build/rand'...
remote: Enumerating objects: 20865, done.
remote: Counting objects: 100% (334/334), done.
remote: Compressing objects: 100% (194/194), done.
remote: Total 20865 (delta 186), reused 173 (delta 140), pack-reused 20531 (from 4)
Receiving objects: 100% (20865/20865), 5.20 MiB | 6.03 MiB/s, done.
Resolving deltas: 100% (13343/13343), done.
Cloning into 'build/regex'...
remote: Enumerating objects: 10296, done.
remote: Total 10296 (delta 0), reused 0 (delta 0), pack-reused 10296 (from 1)
Receiving objects: 100% (10296/10296), 8.32 MiB | 8.43 MiB/s, done.
Resolving deltas: 100% (6781/6781), done.
Cloning into 'build/simple-raytracer'...
remote: Enumerating objects: 332, done.
remote: Counting objects: 100% (21/21), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 332 (delta 6), reused 15 (delta 5), pack-reused 311 (from 1)
Receiving objects: 100% (332/332), 1.31 MiB | 5.25 MiB/s, done.
Resolving deltas: 100% (198/198), done.
[GIT] apply `patches/crates/0001-Remove-deny-warnings.patch`
[detached HEAD ce9d2996] Patch ../../patches/crates/0001-Remove-deny-warnings.patch
 1 file changed, 1 deletion(-)
Successfully ran `prepare`

$ ./y.sh build --sysroot
[BUILD] build system
    Finished `release` profile [optimized] target(s) in 0.00s
Downloading `https://github.com/rust-lang/gcc/releases/download/master-d6f5a708104a98199ac0f01a3b6b279a0f7c66d3/libgccjit.so`...
###################################################################################################################################################################################### 100.0%
Downloaded libgccjit.so version d6f5a708104a98199ac0f01a3b6b279a0f7c66d3 successfully!
Using `/home/.../rustc_codegen_gcc/build/libgccjit/d6f5a708104a98199ac0f01a3b6b279a0f7c66d3` as path for libgccjit
   Compiling libc v0.2.168
   Compiling gccjit_sys v0.6.0
   Compiling gccjit v2.5.0
   Compiling rustc_codegen_gcc v0.1.0 (/home/.../rustc_codegen_gcc)
    Finished `dev` profile [optimized + debuginfo] target(s) in 13.16s
[BUILD] sysroot
warning: no edition set: defaulting to the 2015 edition while the latest is 2024
    Updating crates.io index
     Locking 10 packages to latest compatible versions
    Updating addr2line v0.22.0 -> v0.24.2
      Adding adler2 v2.0.1
    Updating compiler_builtins v0.1.118 -> v0.1.140 (available: v0.1.160)
      Adding gimli v0.31.1
      Adding gimli v0.32.3
    Updating hashbrown v0.14.5 -> v0.15.5
    Updating libc v0.2.155 -> v0.2.177
    Updating miniz_oxide v0.7.4 -> v0.8.9
    Updating unwinding v0.2.2 -> v0.2.8 (requires Rust 1.88)
      Adding windows-targets v0.0.0 (/home/.../rustc_codegen_gcc/build/build_sysroot/sysroot_src/library/windows_targets)
   Compiling core v0.0.0 (/home/.../rustc_codegen_gcc/build/build_sysroot/sysroot_src/library/core)
   Compiling compiler_builtins v0.1.140
   Compiling libc v0.2.177
   Compiling std v0.0.0 (/home/.../rustc_codegen_gcc/build/build_sysroot/sysroot_src/library/std)
   Compiling rustc-std-workspace-core v1.99.0 (/home/.../rustc_codegen_gcc/build/build_sysroot/sysroot_src/library/rustc-std-workspace-core)
   Compiling adler2 v2.0.1
error[E0463]: can't find crate for `compiler_builtins`

For more information about this error, try `rustc --explain E0463`.
error: could not compile `adler2` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
Command `["cargo", "build", "--target", "x86_64-unknown-linux-gnu"]` failed
Command failed to run: Command `cargo build --target x86_64-unknown-linux-gnu` (running in folder `build/build_sysroot`) exited with status Some(101)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if they triggered the UB detection before, I'm OK with merging this PR as is.
Is this OK with you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've triggered the UB detection on a previous push. I think this PR can be merged. Thanks for your patience.

@antoyo antoyo merged commit c3c8a9a into rust-lang:master Nov 28, 2025
38 checks passed
@antoyo
Copy link
Contributor

antoyo commented Nov 28, 2025

Thanks a lot for your contribution!
And sorry again for taking some time to review this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UB in the implementation of the 128 bit ctlz intrinsic

2 participants