Reproducer System (Complete): Full-featured standalone kernel script generation with template support, tensor reconstruction, and multiple import modes. Extract any traced kernel into a self-contained Python script for debugging, testing, and sharing.
TensorBlobManager: Production-ready content-addressed tensor storage with automatic compression, deduplication, quota management, and efficient disk usage. Enables high-fidelity kernel reproduction with actual tensor data.
SASS Disassembly Support: Optional NVIDIA SASS disassembly during compilation tracing for low-level debugging and performance analysis. Toggle via enable_sass_dump parameter or TRITONPARSE_DUMP_SASS environment variable.
Enhanced Context Manager: Configurable TritonParseManager context manager with support for trace launch control, inductor compilation splitting, and flexible parsing parameters.
CLI Modernization: Refactored to subcommand structure (tritonparseoss parse, tritonparseoss reproduce) with unified entry point and improved argument handling.
Auto-enable Inductor Launch Tracing: Automatic detection and tracing of PyTorch Inductor-compiled kernels without manual configuration.
Website Improvements: Light mode color scheme, improved stack display in Launch Analysis, and better file diff navigation.
Removed deprecated test for triton_kernels Tensor functionality
Updated test suite for current codebase
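The content-addressed storage idea behind the TensorBlobManager highlight can be illustrated with a small standalone sketch. This is not TensorBlobManager's implementation: the BlobStore class, its method names, the SHA-256 key scheme, the gzip compression, and the flat directory layout are all assumptions made for illustration. It only shows how hashing the payload gives deduplication for free and where a quota check would sit.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


class BlobStore:
    """Hypothetical toy store: identical payloads map to one compressed file."""

    def __init__(self, root: Path, quota_bytes: int) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.quota_bytes = quota_bytes

    def _used_bytes(self) -> int:
        # Total size of all compressed blobs currently on disk.
        return sum(p.stat().st_size for p in self.root.glob("*.gz"))

    def put(self, payload: bytes) -> str:
        """Store payload and return its SHA-256 key; duplicates are not rewritten."""
        key = hashlib.sha256(payload).hexdigest()
        path = self.root / f"{key}.gz"
        if path.exists():
            return key  # deduplicated: same content, same key, same file
        compressed = gzip.compress(payload)
        if self._used_bytes() + len(compressed) > self.quota_bytes:
            raise RuntimeError("blob quota exceeded")
        path.write_bytes(compressed)
        return key

    def get(self, key: str) -> bytes:
        """Load and decompress the payload stored under key."""
        return gzip.decompress((self.root / f"{key}.gz").read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    store = BlobStore(Path(tmp), quota_bytes=1 << 20)
    first = store.put(b"\x00" * 4096)
    second = store.put(b"\x00" * 4096)  # same bytes, so no second file
    print(first == second, len(list(Path(tmp).glob("*.gz"))))
```

Keying blobs by a hash of their content means repeated launches with identical tensor data cost one file on disk, which is why deduplication and quota management fit together naturally in this kind of design.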
Compatibility notes
Breaking Change: CLI now uses subcommand structure. Old usage python run.py <source> must be updated to tritonparseoss parse <source> or python run.py parse <source>.
New Dependencies: SASS disassembly requires NVIDIA CUDA Binary Utilities (nvdisasm). This is optional and only needed if enable_sass_dump=True.
Storage: TensorBlobManager introduces a new blob storage directory structure. The default quota is 100GB; override it when initializing TensorBlobManager if needed.
Context Manager API: Enhanced with new parameters. Fully backward compatible with sensible defaults.
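To make the CLI breaking change concrete, here is a minimal argparse sketch of a subcommand layout of this shape. It is an illustration only, not the project's actual parser: the build_parser function, the positional argument names, and the option types are assumptions, and the real CLI likely accepts more options. The subcommand names and the --line and --out-dir flags are taken from these notes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the parse/reproduce subcommand split."""
    parser = argparse.ArgumentParser(prog="tritonparseoss")
    subparsers = parser.add_subparsers(dest="command", required=True)

    parse_cmd = subparsers.add_parser("parse", help="parse raw trace logs")
    parse_cmd.add_argument("source")

    repro_cmd = subparsers.add_parser(
        "reproduce", help="generate a standalone kernel script"
    )
    repro_cmd.add_argument("trace")
    repro_cmd.add_argument("--line", type=int)  # assumed to be an integer index
    repro_cmd.add_argument("--out-dir")
    return parser


args = build_parser().parse_args(["parse", "./logs"])
print(args.command, args.source)
```

With a required subcommand, an old-style invocation that passes only a source path is rejected at argument parsing time, which is the behavior the breaking-change note describes.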
Upgrade guidance
Update CLI commands: Change python run.py <source> to python run.py parse <source>, or use tritonparseoss parse <source> if installed via pip.
Reproducer usage: Use tritonparseoss reproduce ./parsed_output/trace.ndjson.gz --line <N> --out-dir <output> to generate standalone kernel scripts.
SASS disassembly: Opt-in by setting TRITONPARSE_DUMP_SASS=1 or passing enable_sass_dump=True to structured_logging.init(). Requires nvdisasm in PATH.
Tensor storage: Enable high-fidelity reproduction by using TensorBlobManager (enabled by default when enable_trace_launch=True).
Context manager: Use enhanced TritonParseManager for more control over tracing and parsing behavior.
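The reproducer and SASS steps above can be scripted. The sketch below uses only the Python standard library: sass_env sets TRITONPARSE_DUMP_SASS=1 only when nvdisasm is found on PATH, and reproduce_argv assembles the reproduce command line shown above. Both helper names are hypothetical and are not part of tritonparse; the example values passed to them are placeholders.

```python
import os
import shutil
from collections.abc import Mapping


def sass_env(base_env: Mapping[str, str]) -> dict:
    """Copy base_env and enable SASS dumping only if nvdisasm is on its PATH."""
    env = dict(base_env)
    if shutil.which("nvdisasm", path=env.get("PATH")) is not None:
        env["TRITONPARSE_DUMP_SASS"] = "1"
    return env


def reproduce_argv(trace: str, line: int, out_dir: str) -> list:
    """Build the argument list for the reproduce subcommand."""
    return [
        "tritonparseoss", "reproduce", str(trace),
        "--line", str(line),
        "--out-dir", str(out_dir),
    ]


env = sass_env(os.environ)  # environment for the process that runs the traced workload
argv = reproduce_argv("./parsed_output/trace.ndjson.gz", 0, "./repro")  # placeholder values
print(" ".join(argv))
```

The environment from sass_env is meant for the process being traced, since SASS is dumped during compilation tracing; the argument list can be handed to subprocess.run(argv, check=True) once a parsed trace exists.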