Mechanical Sympathy is All You Need
Built and scaled systems that can handle 5k->250k RPS w/o breaking a sweat.
Got into model serving and inference, enjoyed solving cold start, intelligent routing and optimizing GPU cluster utilization. Did a bit of RAG & Agents infra. Currently ML Infra - training, inference, comms collectives, storage, compiler backends, custom kernels optimizations & researching novel techniques.
High Agency individual deep in agentic-engineering mode. AI tools have enabled me to touch end-to-end infra from user facing APIs & Infra to tensors to metal. Always looking to maximize my learning curve 📈
- 🐘 YALI - Ultra-low-latency GPU comms collective. Outperforms NVIDIA NCCL P2P by 1.2 - 2.4x.
- ⏲️ Metered Compute - 5 reference architectures for reliably metering sync and async compute.
- 🔍 Inference Assayer - Compiler driven models <> HWs inference perf analyzing deterministic fast simulator lab.
- WIP / TBA
| Home | |
| Languages | |
| Inference | |
| Infra | |
| Accelerators | |
| Storage | |
| Middleware | |
| Cloud | |
| Build |
Inspired by
