I build CPU-first systems for running large language models - reducing inference cost, memory usage, and latency.
→ 8.6× faster than PyTorch CPU (INT8 + AVX2)
→ 4× lower memory footprint
→ Pure C (no ML frameworks)
All benchmarks are reproducible.
I focus on making LLM inference deployable everywhere - not just on GPUs.
LLM inference runs at the scale of millions to billions of requests.
Even small per-request efficiency gains translate directly into millions of dollars saved.
Example:
- GPU inference: $0.002 / request
- Optimized CPU inference: $0.0003 / request
At 10M requests/day:
→ ~$17,000 saved per day
→ ~$6M saved per year
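A quick back-of-the-envelope check of those numbers (the per-request costs above are illustrative, not measurements from a specific deployment):

```c
/* Savings estimate from the example above. */
#include <stdio.h>

int main(void) {
    double gpu_cost_per_req = 0.002;    /* $ per request, GPU baseline       */
    double cpu_cost_per_req = 0.0003;   /* $ per request, optimized CPU path */
    double requests_per_day = 10e6;     /* 10M requests/day                  */

    double saved_per_day  = (gpu_cost_per_req - cpu_cost_per_req) * requests_per_day;
    double saved_per_year = saved_per_day * 365.0;

    printf("saved per day:  $%.0f\n", saved_per_day);   /* ~$17,000 */
    printf("saved per year: $%.0f\n", saved_per_year);  /* ~$6.2M   */
    return 0;
}
```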
I work on making this shift possible.
- Transformer inference engine in C (forward + backward pass)
- Cache-aware attention kernels (tiled, memory-optimized)
- INT8/low-bit quantization pipelines
- AVX2 SIMD optimized matmul & kernels
- Arena-based memory allocator (zero fragmentation; see the sketch after this list)
- KV-cache optimized for long sequence inference
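To give a flavor of the memory-management piece, here is a minimal arena-allocator sketch; the names and the 64-byte alignment are illustrative assumptions, not the engine's actual API:

```c
/* Minimal bump/arena allocator sketch (illustrative; a real implementation
 * also handles growth, per-layer scratch arenas, and overflow reporting). */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *base;   /* start of the arena's backing buffer */
    size_t   size;   /* total capacity in bytes             */
    size_t   used;   /* bytes handed out so far             */
} Arena;

/* Bump-allocate n bytes, 64-byte aligned for SIMD-friendly tensor rows. */
static void *arena_alloc(Arena *a, size_t n) {
    size_t aligned = (a->used + 63) & ~(size_t)63;
    if (aligned + n > a->size) return NULL;   /* out of arena memory */
    a->used = aligned + n;
    return a->base + aligned;
}

/* Free everything at once: one pointer reset, no per-tensor free() calls. */
static void arena_reset(Arena *a) { a->used = 0; }
```

Because every scratch tensor comes out of one arena and is released with a single reset, there is no per-allocation bookkeeping and nothing left behind to fragment.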
Contributor to ggml-org/llama.cpp:
- Fixed integer type inconsistencies in split helpers (PR: ggml-org/llama.cpp#18894)

Built a CPU-first LLM engine:
- Explicit memory layout control
- Quantized inference (see the INT8 sketch after this list)
- Benchmarked against baseline implementations
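For the quantized-inference bullet, here is a minimal sketch of per-row symmetric INT8 quantization, one common scheme; the function name and details are illustrative, not the engine's exact code:

```c
/* Illustrative per-row symmetric INT8 quantization (simplified; real
 * pipelines also pick block sizes and quantize activations at runtime). */
#include <math.h>
#include <stdint.h>

/* Quantize one row of FP32 weights to INT8 with a single scale factor. */
static float quantize_row_int8(const float *src, int8_t *dst, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; i++) {
        float v = fabsf(src[i]);
        if (v > amax) amax = v;
    }
    float scale = amax / 127.0f;                 /* map [-amax, amax] -> [-127, 127] */
    float inv   = scale > 0.0f ? 1.0f / scale : 0.0f;
    for (int i = 0; i < n; i++) {
        dst[i] = (int8_t)lrintf(src[i] * inv);
    }
    return scale;                                 /* kept for dequantization          */
}
```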
Areas I go deep on:
- Attention performance on CPUs
- Memory bandwidth vs compute bottlenecks
- Cache locality (L1/L2/L3 behavior)
- SIMD utilization efficiency
- Operator fusion & kernel optimization
- Auto-vectorization vs hand-written intrinsics (see the sketch below)
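On that last point, this is the kind of side-by-side I mean: a scalar dot product left to the compiler to auto-vectorize versus a hand-written AVX2/FMA version. An illustrative sketch that assumes n is a multiple of 8:

```c
#include <immintrin.h>

/* Scalar reference: relies on the compiler to auto-vectorize (-O3 -mavx2 -mfma). */
static float dot_scalar(const float *a, const float *b, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += a[i] * b[i];
    return sum;
}

/* Hand-written AVX2: 8 floats per iteration, explicit FMA accumulation. */
static float dot_avx2(const float *a, const float *b, int n) {
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_fmadd_ps(va, vb, acc);
    }
    /* Horizontal sum of the 8 accumulator lanes. */
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```

Comparing the two (generated assembly and measured throughput) is how I decide when intrinsics are worth the maintenance cost.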
Making LLM systems:
- Cheaper
- Faster
- Deployable on commodity hardware
If you work on ML systems, inference engines, or compilers - let’s connect.
