Engineering Blog

Building systems
that survive
production.

Deep dives into distributed systems, AI evaluation, and the infrastructure that powers real-world software — how things actually work under the hood, explained the way I wish someone had explained them to me.

Building systems
that survive
production.

Recent posts

Your model migration passed. Here's what the aggregate didn't show.

When agent trace metrics lie: the span tree double-counting problem

Aggregate metrics are a blind spot in agent evaluation

Exactly-once semantics on at-least-once infrastructure

How to detect benchmark contamination in LLMs

Building systemsthat surviveproduction.

Recent posts

Your model migration passed. Here's what the aggregate didn't show.

When agent trace metrics lie: the span tree double-counting problem

Aggregate metrics are a blind spot in agent evaluation

Exactly-once semantics on at-least-once infrastructure

How to detect benchmark contamination in LLMs

Building systems
that survive
production.