🧠 NodeBench AI — Multi-Agent Document Intelligence System

Overview
NodeBench AI is a Notion-style editor backed by specialized AI agents. The Fast Agent Panel stitches together document editing, research, spreadsheet templating, media search, and SEC analysis without forcing users to bounce between apps.

🚀 Inspiration

We kept watching knowledge workers hop between docs, spreadsheets, search tabs, and chatbots. The bet behind NodeBench AI is that those workflows should live in one AI-native workspace where agents can understand context and take action immediately.

🧩 How We Built It

Frontend

React 19 + Vite + TypeScript, styled with Tailwind CSS utilities and tailwind-merge.
BlockNote/TipTap + EditorJS JSON for the rich-text editor, react-spreadsheet for tables, and React Flow for agent timelines and thinking graphs.
Storybook + Vitest + Testing Library to keep the UI stable.

Backend & Orchestration (Convex)

Convex hosts the real-time data model, document APIs, and the full agent runtime.
The Fast Agent Panel streams through convex/fastAgentPanelStreaming.ts using @convex-dev/agent and @convex-dev/persistent-text-streaming for low-latency updates.
convex/agents/specializedAgents.ts defines a Coordinator Agent that fans out to Document, Media, SEC, Web, and EntityResearch agents. Each tool call is Zod-validated and runs inside Convex actions/mutations.
Tooling lives under convex/tools (document operations, Linkup web search, SEC filings, media analyzers, spreadsheet parsers).
Entity research is cached in convex/entityContexts.ts so follow-up questions hit warm data instead of re-calling Linkup every time.
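
The delegation pattern above can be sketched roughly as follows. This is a minimal illustration, assuming the coordinator starts one sub-task per specialist and collects every result even when some fail. The agent names mirror the ones listed above; `SpecialistFn`, `fanOut`, and the stub agents are hypothetical and are not taken from convex/agents/specializedAgents.ts or the @convex-dev/agent API.

```typescript
// Hypothetical sketch: a coordinator fans a query out to specialist agents in parallel.
// None of these names come from the real codebase; they are illustrative only.
type SpecialistName = "document" | "media" | "sec" | "web" | "entityResearch";
type SpecialistFn = (query: string) => Promise<string>;

interface SpecialistResult {
  agent: SpecialistName;
  ok: boolean;
  output: string; // result text on success, error message on failure
}

async function fanOut(
  query: string,
  specialists: Partial<Record<SpecialistName, SpecialistFn>>,
): Promise<SpecialistResult[]> {
  const names = Object.keys(specialists) as SpecialistName[];
  // Start every specialist at once; one failure must not reject the whole batch.
  const pending = names.map(async (agent): Promise<SpecialistResult> => {
    try {
      const output = await specialists[agent]!(query);
      return { agent, ok: true, output };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { agent, ok: false, output: message };
    }
  });
  return Promise.all(pending);
}

// Stub specialists standing in for real Convex actions (assumption).
const stubSpecialists: Partial<Record<SpecialistName, SpecialistFn>> = {
  document: async (q) => `doc hits for "${q}"`,
  sec: async () => {
    throw new Error("SEC lookup unavailable");
  },
};
```

Catching errors per specialist keeps a single failing agent from discarding the results the other agents already produced, which is one plausible way to keep a parallel fan-out usable.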

AI Stack

OpenAI GPT-5 (nano/mini variants) for coordination, planning, and reasoning.
Google Gemini 2.0 Flash via convex/genai.ts for structured data extraction, vision analysis, and document rewriting, the tasks we route away from the OpenAI models.
Vercel AI SDK wrappers plus the Model Context Protocol (MCP) bridge (convex/mcp.ts, convex/aiAgents.ts) to call third-party tools like Tavily search.
RAG indexing through @convex-dev/rag so document answers stay grounded.
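
As a rough illustration of the split described above, the routing could be expressed as a lookup from task kind to model family. This is only a sketch under assumptions: the task categories are paraphrased from the list above, `TaskKind` and `pickModelFamily` are hypothetical names, and the strings are human-readable labels, not real API model identifiers.

```typescript
// Hypothetical sketch of task-based model routing.
// The labels below are descriptive only and are NOT real API model IDs.
type TaskKind =
  | "coordination"
  | "planning"
  | "reasoning"
  | "structuredExtraction"
  | "vision"
  | "docRewrite";

type ModelFamily = "openai-gpt-5" | "gemini-2.0-flash";

function pickModelFamily(task: TaskKind): ModelFamily {
  switch (task) {
    // Extraction, vision, and rewriting go to Gemini in the writeup above.
    case "structuredExtraction":
    case "vision":
    case "docRewrite":
      return "gemini-2.0-flash";
    // Coordination, planning, and reasoning stay on the GPT-5 variants.
    case "coordination":
    case "planning":
    case "reasoning":
      return "openai-gpt-5";
  }
}
```

Spelling out every case lets the TypeScript compiler flag any new task kind that has not been assigned a model family.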

Evaluation & Reliability

convex/tools/evaluation provides scripted scenarios, LLM-as-a-judge scoring, and quick regression runs (npm run eval:quick).
Streaming transcripts are logged as agent runs (agentRuns + agentRunEvents) for replay and debugging.
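
A scripted-scenario run with a judge might look roughly like the sketch below. It assumes each scenario carries a prompt and a rubric, and that the judge returns a score between 0 and 1. Every name here (`EvalScenario`, `runEval`, `keywordJudge`) is hypothetical, and the keyword judge is a deterministic stand-in for a real LLM call, not the logic in convex/tools/evaluation.

```typescript
// Hypothetical sketch of a scenario-based evaluation loop with a pluggable judge.
interface EvalScenario {
  name: string;
  prompt: string;
  rubric: string; // what a good answer must cover
}

interface JudgeVerdict {
  score: number; // assumed to be in [0, 1]
  reason: string;
}

type AgentUnderTest = (prompt: string) => Promise<string>;
type JudgeFn = (scenario: EvalScenario, answer: string) => Promise<JudgeVerdict>;

interface EvalReport {
  passed: string[];
  failed: string[];
}

async function runEval(
  scenarios: EvalScenario[],
  agent: AgentUnderTest,
  judge: JudgeFn,
  threshold = 0.7,
): Promise<EvalReport> {
  const report: EvalReport = { passed: [], failed: [] };
  for (const scenario of scenarios) {
    const answer = await agent(scenario.prompt);
    const verdict = await judge(scenario, answer);
    // A scenario passes when the judge's score meets the threshold.
    (verdict.score >= threshold ? report.passed : report.failed).push(scenario.name);
  }
  return report;
}

// Stand-in judge: checks whether the answer mentions the rubric keyword.
// A real judge would send the rubric and answer to a model instead.
const keywordJudge: JudgeFn = async (scenario, answer) => {
  const hit = answer.toLowerCase().indexOf(scenario.rubric.toLowerCase()) >= 0;
  return { score: hit ? 1 : 0, reason: hit ? "rubric keyword found" : "rubric keyword missing" };
};
```

Keeping the judge behind a function type means the same loop can run against a cheap deterministic check in quick runs and a model-backed judge in fuller ones.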

🧠 What We Learned

Structured tools tame hallucinations. Zod schemas and typed tool registries made argument drift obvious and recoverable.
Evaluation is a first-class feature. The LLM judge caught behavior regressions that unit tests and manual QA missed.
Explainability matters. Streaming thinking steps and React Flow visualizations help users (and us) trust multi-agent decisions.
Caching pays off. Entity research caching turned repeated knowledge requests from 10-second waits into instant answers.
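
The caching idea can be sketched as a small get-or-fetch store. This is an assumption-heavy illustration: an in-memory `Map` stands in for the Convex table behind convex/entityContexts.ts, the time-to-live expiry is our own addition, since the writeup does not say how entries go stale, and `EntityContextStore` and `getOrFetch` are hypothetical names.

```typescript
// Hypothetical sketch of entity-research caching with a time-to-live.
// An in-memory Map stands in for persistent storage; the TTL policy is an assumption.
interface CachedEntity<T> {
  value: T;
  fetchedAt: number; // milliseconds, from the injected clock
}

class EntityContextStore<T> {
  private entries = new Map<string, CachedEntity<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async getOrFetch(entity: string, fetcher: (entity: string) => Promise<T>): Promise<T> {
    // Normalize the key so "Acme Corp" and " acme corp " share one entry.
    const key = entity.trim().toLowerCase();
    const hit = this.entries.get(key);
    if (hit !== undefined && this.now() - hit.fetchedAt < this.ttlMs) {
      return hit.value; // warm data: skip the expensive research call
    }
    const value = await fetcher(entity);
    this.entries.set(key, { value, fetchedAt: this.now() });
    return value;
  }
}
```

Injecting the clock keeps the expiry logic testable without real waiting.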

⚙️ Challenges

Coordinating delegation. Getting the coordinator to parallelize document, media, SEC, and research agents required tight tool contracts and aggressive “no-clarification” policies.
Keeping streams consistent. We had to reconcile optimistic UI updates with server responses; PersistentTextStreaming plus run-event logging solved race conditions.
Tool reliability. MCP integrations (Tavily, Linkup) can be flaky, so we built retries, fallback messaging, and evaluation coverage around them.
Cost/latency balance. The Gemini + GPT dual path means every call needs routing heuristics and usage tracking (insertApiUsage) to stay inside budget.
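
The retry-plus-fallback approach mentioned under tool reliability could look something like this sketch. It assumes exponential backoff between attempts and a fixed fallback message once attempts run out. Both policies are illustrative guesses, and `callWithRetry` and `RetryOptions` are hypothetical names, not functions from the project.

```typescript
// Hypothetical sketch: retry a flaky tool call with exponential backoff,
// then degrade to a fallback message instead of throwing.
interface RetryOptions {
  attempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the second try; doubles afterwards
  fallback: string; // message returned when every try fails
}

interface RetryOutcome {
  ok: boolean;
  text: string;
  attemptsUsed: number;
}

async function callWithRetry(
  call: () => Promise<string>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<RetryOutcome> {
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      const text = await call();
      return { ok: true, text, attemptsUsed: attempt };
    } catch (err) {
      // Wait only if another attempt remains: base, 2x base, 4x base, ...
      if (attempt < opts.attempts) {
        await sleep(opts.baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  return { ok: false, text: opts.fallback, attemptsUsed: opts.attempts };
}
```

Returning a structured outcome instead of throwing lets the caller show the fallback text to the user while still recording that the tool call failed.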