[DEPRECATED] Moved to ROCm/rocm-systems repo
Updated Mar 31, 2026 - Python
Online CUDA Occupancy Calculator
(Spring 2017) Assignment 2: GPU Executor
Runs a single CUDA/OpenCL kernel, taking its source from a file and its arguments from the command line
GPU Drano: static analysis for GPU programs.
Prototype for a SPIR-V assembler and disassembler. It provides a composable Java interface for generating SPIR-V code at runtime.
Open source skill library for AI coding agents to write, optimize, and debug high performance compute kernels across CUDA, Triton, and quantized workloads.
A self-hosted low-level functional-style programming language 🌀
Noeris — autonomous kernel fusion discovery + drop-in LLM training accelerator. Cross-op fusion, autotuning, beats cuDNN on sliding-window. pip install noeris.
High-performance GPU-accelerated C# scripting for Rhino Grasshopper, powered by ILGPU
Medical AI diagnostics system implementing real compiled Mojo GPU kernels with MAX Graph integration
🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy
Runtime correctness checker for custom CUDA kernels. Attach a single decorator to periodically verify outputs against a reference implementation, with outlier-biased sampling and zero training graph impact.
16-step CUDA optimization of FlashAttention-2 achieving 99.2% of official performance on A100 — Ampere architecture
High-performance Triton kernels for NVIDIA H100. Implements fused FP8 LayerNorm, tiled FlashAttention, and SRAM-optimized memory primitives for Hopper architecture.
LLM primitives rebuilt in Triton — FlashAttention 2.52×, fused AdamW 3.45×, Bias+GELU 14.65× faster than PyTorch
Benchmarking hand-written CUDA C, Numba, and Triton self-attention kernels against PyTorch's SDPA - how fast can you go depending on the tool?