Hi there,
I'm looking for some insight into how GPU resets are handled when a workload runs into out-of-memory (OOM) errors.
My system is running:
- Kernel: 6.15.9.arch1-1
- Intel OneAPI Base Toolkit: 2025.2
- Intel Compute Runtime: 25.27.34303.5
- Driver: xe
Hardware:
- AMD Ryzen 9 9900X
- Intel Arc B580
- 48GB DDR5 RAM @ 6200 MT/s
When I run AI workloads such as image generation, the GPU occasionally runs out of memory. When that happens, the entire desktop freezes and becomes completely unresponsive, and only a hard reboot brings the system back. I'm particularly wary of hard resets because I have a couple of mechanical drives configured in a RAID array, and I'd really prefer to avoid any risk of data corruption or filesystem damage.