🚀 Inspiration

Most programming languages have mature linting ecosystems: tools that enforce best practices, prevent bugs, and improve production readiness. But agent frameworks like LangGraph lacked an equivalent.
We asked ourselves:

Why don't agent graphs have linters that catch mistakes early and promote safe, reliable design?

InspectorAI was built to fill that gap.


🧠 What InspectorAI Does

  • Detects problematic functions using an SLM-classifier trained to identify fallible, side-effecting, or risky behaviors.
  • Runs static analysis to pinpoint exact lines, nodes, or graph edges responsible for the issue.
  • Automatically fixes errors using token-optimized LLM strategies.
  • Re-lints and verifies the updated code to ensure no regressions.

βš™οΈ How It Works

1. Detection Layer (SLM + heuristics)

An SLM scans function docstrings and code to classify them into categories like:

  • fallible
  • has side effects
  • potentially dangerous

This determines which parts of the code need deeper inspection.
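A rough sketch of what this detection interface looks like. The real detector is an SLM plus heuristics; the keyword lists below are purely illustrative stand-ins that mimic the same labels and call shape.

```python
# Keyword heuristics standing in for the SLM classifier. The category labels
# match the list above; the hint tuples are invented for this sketch.
RISK_HINTS = {
    "fallible": ("raise", "http", "parse", "may fail"),
    "has side effects": ("write", "delete", "send", "global "),
    "potentially dangerous": ("exec(", "eval(", "subprocess", "rm -rf"),
}

def classify(docstring: str, source: str) -> list[str]:
    """Return the risk categories a function's docstring and body suggest."""
    text = f"{docstring}\n{source}".lower()
    return [label for label, hints in RISK_HINTS.items()
            if any(hint in text for hint in hints)]
```

Functions that come back with an empty label list are skipped; everything else is queued for the static analysis pass.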

2. Static Analysis Engine

InspectorAI analyzes:

  • graph wiring issues
  • dangerous or fallible functions
  • unsafe side effects

It generates a compact representation of the problem for downstream agents.

3. Multi-Agent Repair System

  • A coordinator agent decides which repair strategy to use.
  • A fixer agent rewrite, patch, or refactor the code.
  • A verification agent re-checks all issues to ensure correctness and stability.

4. Token-Optimized Fixing Strategies

Instead of blindly passing entire codebases into an LLM, InspectorAI selects one of three targeted modes:

  1. Snippet Fixes: for small, local errors
  2. Graph-Context Fixes: for wiring issues
  3. Full-Context Fixes: only when necessary for multi-error, complex scenarios

This keeps costs low and accuracy high.
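A rule-based selector for these three modes might look like the sketch below. The thresholds and issue shape (`{"kind": ...}` dicts) are invented for illustration; the actual coordinator logic may differ.

```python
def choose_mode(issues: list[dict]) -> str:
    """Pick the cheapest context that can plausibly cover all issues."""
    kinds = {issue["kind"] for issue in issues}
    if len(issues) > 3 or len(kinds) > 1:
        return "full_context"   # multi-error, complex scenario
    if "wiring" in kinds:
        return "graph_context"  # edge/node problems need the graph structure
    return "snippet"            # small, local error
```

The ordering matters: the selector only escalates to wider (more expensive) context when the narrower modes clearly cannot cover the issue set.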


📚 What We Learned

  • The right context beats more context. Fix quality improved dramatically when we optimized the scope of what we send to the LLM.
  • Hybrid systems are more robust. Combining static analysis with SLMs and LLMs outperformed any single approach.
  • Verification is essential. Automatic fixes require a re-lint to avoid regressions.
  • Agent orchestration matters. Even a small rule-based controller improved stability and cost-efficiency.

🧩 Challenges We Faced

  • Token efficiency: Preventing LLM calls from exploding in cost while keeping context relevant.
  • Graph serialization: Representing LangGraph nodes/edges in a compact and meaningful way.
  • False positives: Requiring confidence thresholds and cross-checks with the static layer.
  • Maintaining correctness: Ensuring fixes don’t break intended behavior or introduce new errors.
  • Workflow complexity: Coordinating multiple agents in a predictable way.
