Skip to content

Conversation

@krasznaa
Copy link
Member

This all started out with me trying to see if CoPilot would be able to do all of this automatically / by itself. Unfortunately that all worked very poorly, so I rather did the whole migration by hand in the end. 😦

But still, I wanted to start making a traccc::hip library finally.

To start things off, I converted traccc::cuda::clusterization_algorithm into a traccc::hip::clusterization_algorithm. Code-wise it was pretty much a copy-paste, with a cuda => hip replacement applied. The only tricky part was to convince the build system to behave more or less reliably.

Which for me specifically it by now does. But I'll need some additional eyes as well. @StewMH, please have a look. Specifically at what ( did to the setup of rocThrust. As it was giving me a world of hurt. 😦

For now I didn't do anything specific to the CI setup. But (for me a bit) surprisingly TRACCC_BUILD_HIP is already exercised by the "ALPAKA HIP SYCL" build. 😕 @StewMH, should we take things a bit more apart? The Alpaka build should really only test builds through Alpaka. I'd rather test the build of "native HIP code" in a separate CI job. Or rather, ideally we'd add 2 new CI jobs. One building the HIP code with its AMD backend, and another one using the NVIDIA backend. As one can very easily break either one of these without breaking the other. 😦

@krasznaa krasznaa added the hip Changes related to AMD HIP label Nov 13, 2025
@krasznaa krasznaa force-pushed the HIPBackend-main-20251113 branch from 08d35fa to dcedeab Compare November 13, 2025 16:00
@StewMH
Copy link
Contributor

StewMH commented Nov 17, 2025

Hi Attila,

Great to see this. TRACCC_BUILD_HIP is enabled for Alpaka-HIP so that the rocThrust tests are also built. Didn't seem worth having its own build just for this, but now that we have all this code it definitely makes sense to test it properly.

PUBLIC traccc::core vecmem::core
PRIVATE HIP::hiprt traccc::device_common )

# # Ensure relaxed constexpr is enabled for device compilation (nvcc/hipcc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you decide against this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code was left in by mistake in that place. It was Copilot doing its own stuff there.

I rather put this sort of a thing into cmake/traccc-compiler-options-hip.cmake. We do need to make this setting, but in a more complicated way than how Copilot tried it. (As these flags are only meaningful when hipcc uses nvcc in the background. Not when it's building for an AMD backend.)

I'll do some cleanup on such leftovers today. As long as you're happy with the direction with all of this, I'll push forward.

It's mostly what I'm doing to rocThrust in this PR that you should have a look at. As that's a rather fragile part, which may affect Alpaka as well. (Not in the unit tests, but I didn't run anything more thorough.)

@stephenswat
Copy link
Member

We need to have a serious conversation about the number of backends that we have now. If this PR goes in, we have four different methods of targetting AMD GPUs (HIP, Kokkos, Alpaka, SYCL), or which three (HIP, Kokkos, Alpaka) deliver exactly the same code to the ROCm compiler and can therefore, almost by definition, not lead to any meaningful performance benefits.

As with all the backends for traccc, this puts the maintenance burden on the people actually developing the algorithms. We've gone from having to duplicate all our changes twice to three times to four times and now to five times. It's not a sustainable development model.

@StewMH
Copy link
Contributor

StewMH commented Nov 17, 2025

Alpaka and hip tests work fine for me, without getting into bigger issues.

@krasznaa krasznaa force-pushed the HIPBackend-main-20251113 branch from dcedeab to 9abb85d Compare November 17, 2025 15:16
@krasznaa krasznaa marked this pull request as ready for review November 17, 2025 15:16
Added mostly infrastructure code to it at first, along with a
clusterization algorithm. All absolutely based on the CUDA code,
with very minimal changes.
@krasznaa krasznaa force-pushed the HIPBackend-main-20251113 branch from 9abb85d to e410533 Compare November 20, 2025 08:43
@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hip Changes related to AMD HIP

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants