I design and build at the intersection of hardware and AI — from bare-metal firmware and real-time Linux kernels to ML compilers and on-device inference.
My background is rooted in embedded systems and hardware design: writing firmware for MCUs and SoCs, customizing Linux kernels for real-time constraints, designing carrier boards, and bringing up silicon. I've spent years working close to the metal — device trees, DMA engines, RTOS schedulers, and boot chains.
Now I'm pushing upward through the AI stack — building deep, hands-on expertise in MLIR and compiler infrastructure, inference optimization (TensorRT, tinygrad, kernel engineering), and AI application deployment on edge and accelerator platforms. Every project I take on connects the software to the silicon.
- Firmware & Real-Time Linux — FreeRTOS, embedded Linux BSP (Yocto, L4T), kernel drivers, device tree, secure boot, OTA
- Hardware Design — Custom carrier boards, schematic/PCB, FPGA prototyping (Xilinx Zynq, Vivado, HLS)
- AI Compiler & Inference — MLIR dialects, LLVM, TVM, tinygrad compiler internals, graph optimization, custom backends
- Inference Optimization — TensorRT, CUDA runtime, DLA scheduling, quantization (INT8/FP8), Triton/CUTLASS kernels
- Edge AI Deployment — Jetson Orin, Snapdragon, on-device perception pipelines, latency/power-constrained systems
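As a small, hedged illustration of the quantization work listed above, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. The function names are my own for illustration, not a TensorRT API:

```python
def quantize_int8(xs):
    """Symmetric per-tensor INT8 quantization: x ~= scale * q, q in [-127, 127]."""
    scale = max(abs(v) for v in xs) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in xs]
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original values from the int codes."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.003, 1.1]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
# round-trip error per element is bounded by half a quantization step (scale / 2)
```

Production INT8 pipelines add calibration (choosing the clipping range from activation statistics) and per-channel scales; this sketch keeps only the core scale/round/clip step.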
I'm working toward designing a custom AI inference chip — moving through the full 8-layer vertical stack, from application frameworks down to RTL and physical implementation. Currently deepening skills in MLIR-based compilation pipelines, systolic array architecture, and hardware-software co-design.
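The systolic-array dataflow mentioned above can be sketched functionally. This is a toy output-stationary model in plain Python: it ignores the diagonal operand skewing a real array uses to keep pipelines full, but shows where each multiply-accumulate lands:

```python
def systolic_matmul(A, B):
    """Toy model of an output-stationary systolic array computing C = A @ B.

    Each PE (i, j) holds one accumulator; on "cycle" k it consumes the
    operand pair (A[i][k], B[k][j]) that would stream past it in hardware.
    Real arrays skew inputs diagonally so each pair arrives on time; this
    sketch drops that timing detail and keeps only the dataflow.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    acc = [[0] * N for _ in range(M)]  # one accumulator per PE
    for k in range(K):                 # one wavefront of operands per step
        for i in range(M):
            for j in range(N):
                acc[i][j] += A[i][k] * B[k][j]
    return acc

C = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Output-stationary is one of several dataflow choices (weight-stationary and row-stationary trade off reuse differently); the accumulator-per-PE layout here is what maps onto the L5 architecture work in the table below.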
I'm building a team to design a custom AI inference chip. If any of this resonates, reach out.
AI Hardware Engineer Roadmap — the open-source curriculum I built and maintain. 5 phases, 8 layers of the AI chip stack, from digital design and CUDA kernels through ML compilers and FPGA prototyping to chip architecture. This is the training ground. The roadmap is my resume.
I'm looking for engineers who want to go deep in one layer and understand all eight:
| Layer | Role I need | You bring |
|---|---|---|
| L1 Application | ML Inference Optimization Engineer | PyTorch/tinygrad, quantization, model optimization |
| L2 Compiler | AI Compiler Engineer | MLIR, TVM, LLVM, graph optimization, custom backends |
| L3 Runtime | Runtime / Driver Engineer | CUDA runtime, Linux kernel drivers, DMA, memory management |
| L4 Firmware | Firmware Engineer | FreeRTOS, bare-metal C/Rust, bootloaders, command processors |
| L5 Architecture | AI Accelerator Architect | Systolic arrays, dataflow design, NoC, memory hierarchy |
| L6 RTL | RTL / FPGA Design Engineer | SystemVerilog, UVM, HLS, timing closure, verification |
| L7 Physical | Physical Design Engineer | Place & route, STA, power integrity, DRC/LVS, EDA tools |
| L8 Fab & Package | Packaging / Process Engineer | Advanced packaging (CoWoS, chiplets), post-silicon validation, DFT |
If you see yourself in one of these layers — or across several — let's talk.
Understand all layers. Master one. Build the chip.



