Burn is a next-generation deep learning framework that doesn't compromise on flexibility, efficiency, or portability.
Updated Jul 22, 2025 - Rust
An efficient concurrent graph processing system
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
LAMB go brrr
Compile-time kernel fusion and expression trees as an Alpaka boost.odeint backend. This is my team project, developed in collaboration with and under the supervision of HZDR.
High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, warp-level primitives, and mixed precision support. Drop-in replacement for nn.LayerNorm with 25% memory reduction.