See through the mist.

Eval-driven infrastructure for AI systems.

go get github.com/greynewell/mist-go
Tools

matchspec

Eval framework for AI systems. Define datasets, write graders, run suites, gate deployments on passing thresholds.

Docs → Tutorials →
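Gating a deployment on a pass threshold reduces to: run every case through its grader, compute the pass rate, compare against the bar. A minimal Go sketch of that flow — every type and name here is illustrative, not matchspec's actual API:

```go
package main

import "fmt"

// Case is one dataset example: an input plus a grader that
// scores the model's output (illustrative, not matchspec's API).
type Case struct {
	Input string
	Grade func(output string) bool
}

// passRate runs every case through the model and returns the
// fraction of cases whose grader accepted the output.
func passRate(model func(string) string, suite []Case) float64 {
	passed := 0
	for _, c := range suite {
		if c.Grade(model(c.Input)) {
			passed++
		}
	}
	return float64(passed) / float64(len(suite))
}

// gate returns true when the suite's pass rate meets the
// deployment threshold.
func gate(model func(string) string, suite []Case, threshold float64) bool {
	return passRate(model, suite) >= threshold
}

func main() {
	echo := func(s string) string { return s } // stand-in "model"
	suite := []Case{
		{"hello", func(o string) bool { return o == "hello" }},
		{"world", func(o string) bool { return o == "WORLD" }},
	}
	fmt.Printf("pass rate %.2f, deploy: %v\n",
		passRate(echo, suite), gate(echo, suite, 0.9))
}
```

The point of the shape: the gate is a pure function of the suite, so CI can block a rollout on its boolean result.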

infermux

Inference router for AI systems. Load balance across providers, fail over automatically, cost-optimize model calls.

Docs → Tutorials →

schemaflux

Structured data compiler. Parse markdown and frontmatter, run a 12-pass IR pipeline, emit typed output.

Docs → Tutorials →

tokentrace

Observability for AI inference. Track cost, latency, and quality across your stack. Alert on regressions.

Docs → Tutorials →
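Regression alerting boils down to comparing a recent window of call records against a baseline. A minimal sketch using a mean-latency ratio check — `Span` and its field names are illustrative, not tokentrace's schema:

```go
package main

import "fmt"

// Span records one inference call (illustrative, not tokentrace's schema).
type Span struct {
	Tokens    int
	CostUSD   float64
	LatencyMS float64
}

// meanLatency averages latency over a batch of spans.
func meanLatency(spans []Span) float64 {
	if len(spans) == 0 {
		return 0
	}
	var sum float64
	for _, s := range spans {
		sum += s.LatencyMS
	}
	return sum / float64(len(spans))
}

// regressed flags a window whose mean latency exceeds the
// baseline by more than the given ratio (e.g. 1.5 = 50% worse).
func regressed(baseline, window []Span, ratio float64) bool {
	return meanLatency(window) > meanLatency(baseline)*ratio
}

func main() {
	baseline := []Span{
		{Tokens: 120, CostUSD: 0.002, LatencyMS: 200},
		{Tokens: 90, CostUSD: 0.001, LatencyMS: 240},
	}
	window := []Span{{Tokens: 110, CostUSD: 0.002, LatencyMS: 420}}
	fmt.Println("alert:", regressed(baseline, window, 1.5))
}
```

The same comparison applies unchanged to cost per call or a quality score; only the field being averaged differs.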

mist-go

Shared Go library implementing the MIST protocol: transport, metrics, circuit breaking, checkpointing. Zero external dependencies.

Docs →

Use Cases

AI Agents

Cascading errors, context rot, and agents that ignore instructions. What the benchmarks don't show about production agents.

Model Harnesses

Bad data, silent failures, and catastrophic forgetting. The real problems behind fine-tuning that tutorials skip.

RL Environments

Reward hacking, reproducibility crises, and the debugging abyss. Why RL training fails and what's missing from the toolchain.

Built with MIST

swe-bench-fast

SWE-bench eval harness with native ARM64 containers. 6.3x test runner speedup on Apple Silicon and AWS Graviton.

Read the writeup →