ai-hpc/README.md

AI Hardware Engineer

Firmware · Real-Time Linux · Hardware Design · AI Inference Optimization


I design and build at the intersection of hardware and AI — from bare-metal firmware and real-time Linux kernels to ML compilers and on-device inference.

My background is rooted in embedded systems and hardware design: writing firmware for MCUs and SoCs, customizing Linux kernels for real-time constraints, designing carrier boards, and bringing up silicon. I've spent years working close to the metal — device trees, DMA engines, RTOS schedulers, and boot chains.

Now I'm pushing upward through the AI stack — building deep, hands-on expertise in MLIR and compiler infrastructure, inference optimization (TensorRT, tinygrad, kernel engineering), and AI application deployment on edge and accelerator platforms. Every project I take on connects the software to the silicon.

What I work on

  • Firmware & Real-Time Linux — FreeRTOS, embedded Linux BSP (Yocto, L4T), kernel drivers, device tree, secure boot, OTA
  • Hardware Design — Custom carrier boards, schematic/PCB, FPGA prototyping (Xilinx Zynq, Vivado, HLS)
  • AI Compiler & Inference — MLIR dialects, LLVM, TVM, tinygrad compiler internals, graph optimization, custom backends
  • Inference Optimization — TensorRT, CUDA runtime, DLA scheduling, quantization (INT8/FP8), Triton/CUTLASS kernels
  • Edge AI Deployment — Jetson Orin, Snapdragon, on-device perception pipelines, latency/power-constrained systems
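To make the quantization item above concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy. This is a generic, hypothetical example for intuition only, not TensorRT's actual calibration flow; the function names are my own.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: one scale for the whole tensor."""
    scale = np.abs(x).max() / 127.0          # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.4, -0.01], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# With round-to-nearest and no clipping, the error per element
# is bounded by half a quantization step (scale / 2).
assert np.max(np.abs(x - x_hat)) <= s / 2 + 1e-6
```

Per-channel scales and asymmetric zero-points follow the same pattern; the trade-off is extra bookkeeping in exchange for tighter error bounds on skewed weight distributions.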

Current focus

Building toward designing a custom AI inference chip — working through the full 8-layer vertical stack from application frameworks down to RTL and physical implementation. Currently deepening skills in MLIR-based compilation pipelines, systolic array architecture, and hardware-software co-design.
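The systolic-array piece can be sketched in a few lines of Python: a toy, cycle-scheduled simulation of an output-stationary array computing C = A·B. This is a hypothetical model for building intuition about operand skewing, not RTL or a real accelerator's schedule.

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Simulate an output-stationary systolic array computing C = A @ B.

    PE (i, j) owns C[i, j]. Operands are skewed in time so that
    a[i, k] (flowing right) and b[k, j] (flowing down) meet at
    PE (i, j) on cycle t = i + j + k.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    last_cycle = (M - 1) + (N - 1) + (K - 1)   # final operand pair arrives here
    for t in range(last_cycle + 1):
        for i in range(M):
            for j in range(N):
                k = t - i - j                  # which operand pair arrives now
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C
```

Each (i, j, k) multiply happens on exactly one cycle, so the result matches a plain matrix multiply; the interesting part is the `t = i + j + k` schedule, which is exactly the skew that input buffers have to impose in hardware.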

If you want to work with me

I'm building a team to design a custom AI inference chip. If any of this resonates, reach out.

AI Hardware Engineer Roadmap — the open-source curriculum I built and maintain. 5 phases, 8 layers of the AI chip stack, from digital design and CUDA kernels through ML compilers and FPGA prototyping to chip architecture. This is the training ground. The roadmap is my resume.

I'm looking for engineers who want to go deep in one layer and understand all eight:

| Layer | Role I need | You bring |
|-------|-------------|-----------|
| L1 Application | ML Inference Optimization Engineer | PyTorch/tinygrad, quantization, model optimization |
| L2 Compiler | AI Compiler Engineer | MLIR, TVM, LLVM, graph optimization, custom backends |
| L3 Runtime | Runtime / Driver Engineer | CUDA runtime, Linux kernel drivers, DMA, memory management |
| L4 Firmware | Firmware Engineer | FreeRTOS, bare-metal C/Rust, bootloaders, command processors |
| L5 Architecture | AI Accelerator Architect | Systolic arrays, dataflow design, NoC, memory hierarchy |
| L6 RTL | RTL / FPGA Design Engineer | SystemVerilog, UVM, HLS, timing closure, verification |
| L7 Physical | Physical Design Engineer | Place & route, STA, power integrity, DRC/LVS, EDA tools |
| L8 Fab & Package | Packaging / Process Engineer | Advanced packaging (CoWoS, chiplets), post-silicon validation, DFT |

If you see yourself in one of these layers — or across several — let's talk.


Understand all layers. Master one. Build the chip.

Pinned

  1. ai-hardware-engineer-roadmap (Public)

     Design a custom AI inference chip. That is the goal.

     HTML · 48 stars · 12 forks

  2. commaai/openpilot (Public)

     openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.

     Python · 60.5k stars · 10.8k forks

  3. neptuneprivacy/triton-vm-prover (Public)

     High-performance C++/CUDA GPU-accelerated STARK prover for Triton VM

     Rust · 2 stars · 1 fork

  4. neptuneprivacy/xnt-gpu-miner (Public)

     High-performance C++/CUDA GPU-accelerated XNT miner

     C++ · 2 stars

  5. xilinx-pynq-z2-inkjet-3d-printer (Public)

     Inkjet 3D printer built on the Xilinx PYNQ-Z2

     VHDL · 9 stars

  6. Vitis-AI (Public, forked from Xilinx/Vitis-AI)

     Vitis AI is Xilinx's development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.

     Python · 8 stars