I am an AI grad student at Northeastern University.
Pinned
-
Sigmoid-TopK-Fusion (Public)
Fused Sigmoid+TopK Triton kernel for MoE routing, 3.1x faster than the PyTorch baseline. Inspired by Sarvam AI's sovereign model inference stack.
Jupyter Notebook 1
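For readers unfamiliar with the operation being fused, here is a minimal unfused reference in plain Python. It is only a sketch of what sigmoid followed by top-k routing computes, not the repository's Triton kernel or its PyTorch baseline. The function name `sigmoid_topk_route`, the list-of-lists input layout, and the tie-breaking rule are assumptions made for illustration.

```python
import math


def sigmoid_topk_route(logits, k):
    """Unfused reference: sigmoid over each token's expert logits, then top-k.

    `logits` is a list of rows, one row of expert logits per token
    (hypothetical layout). Returns, per token, a list of
    (expert_index, score) pairs sorted by descending score.
    """
    routed = []
    for row in logits:
        # Element-wise sigmoid; fine for moderate logits, but math.exp can
        # overflow for extremely negative inputs in this naive form.
        scores = [1.0 / (1.0 + math.exp(-x)) for x in row]
        # Pick the k highest-scoring experts; ties go to the lower index
        # (an assumption, real kernels may break ties differently).
        top = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
        routed.append([(i, scores[i]) for i in top])
    return routed


if __name__ == "__main__":
    example_logits = [[2.0, -1.0, 0.5, 0.0], [-3.0, 1.5, 1.0, 4.0]]
    for token, experts in enumerate(sigmoid_topk_route(example_logits, k=2)):
        print(token, experts)
```

The unfused version materializes every sigmoid score before selecting. A fused kernel would typically do both steps in a single pass over the logits, which avoids writing and re-reading the intermediate scores.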
-
High-Performance-Reduction-Kernels (Public)
CUDA C reduction kernels benchmarked against Triton, PyTorch, and CUB primitives.
