ai-hpc/README.md

AI Hardware Engineer

Firmware · Real-Time Linux · Hardware Design · AI Inference Optimization


I design and build at the intersection of hardware and AI — from bare-metal firmware and real-time Linux kernels to ML compilers and on-device inference.

My background is rooted in embedded systems and hardware design: writing firmware for MCUs and SoCs, customizing Linux kernels for real-time constraints, designing carrier boards, and bringing up silicon. I've spent years working close to the metal — device trees, DMA engines, RTOS schedulers, and boot chains.

Now I'm pushing upward through the AI stack — building deep, hands-on expertise in MLIR and compiler infrastructure, inference optimization (TensorRT, tinygrad, kernel engineering), and AI application deployment on edge and accelerator platforms. Every project I take on connects the software to the silicon.

What I work on

  • Firmware & Real-Time Linux — FreeRTOS, embedded Linux BSP (Yocto, L4T), kernel drivers, device tree, secure boot, OTA
  • Hardware Design — Custom carrier boards, schematic/PCB, FPGA prototyping (Xilinx Zynq, Vivado, HLS)
  • AI Compiler & Inference — MLIR dialects, LLVM, TVM, tinygrad compiler internals, graph optimization, custom backends
  • Inference Optimization — TensorRT, CUDA runtime, DLA scheduling, quantization (INT8/FP8), Triton/CUTLASS kernels
  • Edge AI Deployment — Jetson Orin, Snapdragon, on-device perception pipelines, latency/power-constrained systems
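To make the quantization item above concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy. This is a generic, hypothetical example for intuition only, not TensorRT's actual calibration flow; the function names are my own.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: one scale for the whole tensor."""
    scale = np.abs(x).max() / 127.0          # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.4, -0.01], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# With round-to-nearest and no clipping, the error per element
# is bounded by half a quantization step (scale / 2).
assert np.max(np.abs(x - x_hat)) <= s / 2 + 1e-6
```

Per-channel scales and asymmetric zero-points follow the same pattern; the trade-off is extra bookkeeping in exchange for tighter error bounds on skewed weight distributions.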

Current focus

Building toward designing a custom AI inference chip — working through the full 8-layer vertical stack from application frameworks down to RTL and physical implementation. Currently deepening skills in MLIR-based compilation pipelines, systolic array architecture, and hardware-software co-design.
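The systolic-array piece can be sketched in a few lines of Python: a toy, cycle-scheduled simulation of an output-stationary array computing C = A·B. This is a hypothetical model for building intuition about operand skewing, not RTL or a real accelerator's schedule.

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Simulate an output-stationary systolic array computing C = A @ B.

    PE (i, j) owns C[i, j]. Operands are skewed in time so that
    a[i, k] (flowing right) and b[k, j] (flowing down) meet at
    PE (i, j) on cycle t = i + j + k.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    last_cycle = (M - 1) + (N - 1) + (K - 1)   # final operand pair arrives here
    for t in range(last_cycle + 1):
        for i in range(M):
            for j in range(N):
                k = t - i - j                  # which operand pair arrives now
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C
```

Each (i, j, k) multiply happens on exactly one cycle, so the result matches a plain matrix multiply; the interesting part is the `t = i + j + k` schedule, which is exactly the skew that input buffers have to impose in hardware.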

If you want to work with me

I'm building a team to design a custom AI inference chip. If any of this resonates, reach out.

AI Hardware Engineer Roadmap — the open-source curriculum I built and maintain. 5 phases, 8 layers of the AI chip stack, from digital design and CUDA kernels through ML compilers and FPGA prototyping to chip architecture. This is the training ground. The roadmap is my resume.

I'm looking for engineers who want to go deep in one layer and understand all eight:

| Layer | Role I need | You bring |
|-------|-------------|-----------|
| L1 Application | ML Inference Optimization Engineer | PyTorch/tinygrad, quantization, model optimization |
| L2 Compiler | AI Compiler Engineer | MLIR, TVM, LLVM, graph optimization, custom backends |
| L3 Runtime | Runtime / Driver Engineer | CUDA runtime, Linux kernel drivers, DMA, memory management |
| L4 Firmware | Firmware Engineer | FreeRTOS, bare-metal C/Rust, bootloaders, command processors |
| L5 Architecture | AI Accelerator Architect | Systolic arrays, dataflow design, NoC, memory hierarchy |
| L6 RTL | RTL / FPGA Design Engineer | SystemVerilog, UVM, HLS, timing closure, verification |
| L7 Physical | Physical Design Engineer | Place & route, STA, power integrity, DRC/LVS, EDA tools |
| L8 Fab & Package | Packaging / Process Engineer | Advanced packaging (CoWoS, chiplets), post-silicon validation, DFT |

If you see yourself in one of these layers — or across several — let's talk.


Understand all layers. Master one. Build the chip.

Pinned

  1. ai-hardware-engineer-roadmap (Public)

     Design a custom AI inference chip. That is the goal.

     HTML · 48 stars · 12 forks

  2. commaai/openpilot (Public)

     openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.

     Python · 60.5k stars · 10.8k forks

  3. neptuneprivacy/triton-vm-prover (Public)

     High-performance C++/CUDA GPU-accelerated STARK prover for Triton VM

     Rust · 2 stars · 1 fork

  4. neptuneprivacy/xnt-gpu-miner (Public)

     High-performance C++/CUDA GPU-accelerated XNT miner

     C++ · 2 stars

  5. xilinx-pynq-z2-inkjet-3d-printer (Public)

     Inkjet 3D printer built on the Xilinx PYNQ-Z2

     VHDL · 9 stars

  6. Vitis-AI (Public, forked from Xilinx/Vitis-AI)

     Vitis AI is Xilinx's development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.

     Python · 8 stars