Michael Goin @mgoin

systems engineer making inference fast

About

I've been working in ML inference since 2019. These days I'm a core maintainer of vLLM, focused on making SOTA open-source LLMs run fast on a variety of hardware accelerators.

I like working across the software stack wherever the bottleneck is - CPU, GPU, NPU, [compute, memory, io]-bound, etc. - using Python, C++, PyTorch, Triton, CUDA, and CUTLASS. Most of my time goes into profiling, benchmarking, and figuring out why things are slow. I enjoy learning about the latest hardware and working hard to utilize it fully.

Before that, my background was in HPC, where I worked on robotics, materials science, energy simulations, and neuromorphic computing at ORNL and UTK.

I'm currently working at Red Hat on vLLM to power the open-source AI ecosystem with fast and easy inference. Before its acquisition by Red Hat, I was at Neural Magic, where I worked on vLLM and originally built a sparsity-aware inference compiler that optimized CNNs, Transformers, and other models for CPUs.

If you want to reach me, the best way is to ping me @mgoin on vLLM Slack. I'm always happy to collaborate on projects or ideas related to inference performance!

Work

Changelog

Things I've shipped or helped ship.

Talks