Multi-agent NL-to-SQL copilot with governed analytics, audit trails, and multi-warehouse support
Nexus-Hive turns natural-language business questions into audited SQL, executes them safely against a warehouse, and returns chart-ready answers with a full agent trace. Every step -- translation, policy check, execution, visualization -- is independently inspectable, testable, and governed.
Why this matters: Most NL-to-SQL tools generate SQL and run it. Nexus-Hive inserts a policy engine, audit trail, and retry loop between generation and execution -- making it safe for environments where data access requires governance.
```mermaid
flowchart TB
    User["Business User / Analyst"]
    User -->|"Natural language question"| API["FastAPI Runtime\n/api/ask"]

    subgraph AgentPipeline["LangGraph Multi-Agent Pipeline"]
        direction TB
        T["Agent 1: Translator\nNL --> SQL via Ollama/LLM\n+ heuristic fallback"]
        E["Agent 2: Executor\nPolicy engine + read-only SQL execution"]
        V["Agent 3: Visualizer\nChart.js config generation"]
        T --> E
        E -->|"error + retry budget > 0"| T
        E -->|"success"| V
    end

    API --> AgentPipeline

    subgraph Governance["Governance Layer"]
        direction LR
        PE["Policy Engine\nDeny / Review / Allow"]
        AT["Audit Trail\nJSONL + query tags"]
        GE["Gold Eval Suite\nSQL quality scoring"]
    end

    E --- PE
    E --- AT

    subgraph Warehouses["Warehouse Adapters"]
        direction LR
        SQLite["SQLite\n(demo, active by default)"]
        SF["Snowflake\n(live when configured)"]
        DB["Databricks\n(live when configured)"]
    end

    E --> Warehouses
    V --> UI["Frontend\nChart.js + agent trace viewer"]
```
The pipeline is built with LangGraph as a compiled state graph. Each agent is a node with typed state (AgentState), connected by edges with conditional routing. When the Executor rejects SQL (policy denial or execution error), the graph routes back to the Translator for correction -- up to 3 retries -- without re-running the Visualizer.
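The conditional edge after the Executor can be sketched as a plain routing function. This is an illustrative reduction, not the project's actual node code; the state keys (`error`, `retries`) and node names are assumptions based on the description above.

```python
MAX_RETRIES = 3  # retry budget described in the text


def route_after_executor(state: dict) -> str:
    """Pick the next LangGraph node after the Executor runs.

    On failure with budget remaining, loop back to the Translator for
    SQL correction; on exhausted budget, stop; on success, proceed to
    the Visualizer. (Hypothetical key names.)
    """
    if state.get("error") and state.get("retries", 0) < MAX_RETRIES:
        return "translator"  # re-generate SQL
    if state.get("error"):
        return "end"  # retry budget exhausted
    return "visualizer"  # success path; Visualizer runs exactly once
```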
```bash
git clone https://github.com/KIM3310/Nexus-Hive.git && cd Nexus-Hive
make install        # creates venv, installs deps
python3 seed_db.py  # seeds 10k enterprise sales records into SQLite
make run            # starts uvicorn on http://localhost:8000
```

Open http://localhost:8000 for the frontend, or http://localhost:8000/docs for Swagger UI.
```bash
docker compose up app

# With live LLM inference via Ollama:
docker compose --profile with-ollama up
docker exec nexus-hive-ollama ollama pull phi3
```

```bash
make verify  # runs lint + pytest + smoke test against a live server
```

| Layer | Technology | Role |
|---|---|---|
| API Framework | FastAPI 0.115+ | Async HTTP, OpenAPI docs, middleware |
| Agent Orchestration | LangGraph + LangChain Core | State graph, conditional edges, typed agent state |
| LLM Runtime | Ollama (phi3 default) | Local inference for SQL generation + chart config |
| Policy Engine | Custom Python | Deny/review/allow decisions, sensitive column gating |
| Warehouse (Demo) | SQLite + Pandas | Zero-config local execution with 10k seeded records |
| Warehouse (Prod) | Snowflake, Databricks | Live adapters via snowflake-connector-python / databricks-sdk |
| Visualization | Chart.js | Bar, line, pie, doughnut charts from query results |
| Security | HMAC-signed cookies, RBAC | Operator sessions, token auth, role-based column gating |
| Resilience | Circuit Breaker | Ollama failure isolation with auto-recovery |
| Infrastructure | Docker, Kubernetes, Terraform | Container, K8s manifests, GCP Cloud Run via Terraform |
| CI/CD | GitHub Actions | Lint (ruff), test (pytest + coverage), Docker build |
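The Resilience row above mentions a circuit breaker isolating Ollama failures with auto-recovery. A minimal sketch of the pattern follows; it is a generic illustration, not the project's `circuit_breaker.py`, and all names and thresholds are assumptions.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker: open after N consecutive failures,
    allow a probe request again after a cooldown (half-open)."""

    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if time.monotonic() - self.opened_at >= self.recovery_timeout:
            return True  # half-open: let one probe through
        return False  # open: fail fast instead of calling Ollama

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # auto-recovery on a successful probe

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip the breaker
```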
| Adapter | Status | Gated By | Execution Mode |
|---|---|---|---|
| SQLite (demo) | Active by default | -- | `local-sqlite` |
| Snowflake | Live when configured | `SNOWFLAKE_ACCOUNT` env var | `snowflake-live` |
| Databricks | Live when configured | `DATABRICKS_HOST` + auth | `databricks-live` |
All adapters implement the same 5-method interface (get_schema, run_scalar_query, fetch_date_window, build_table_profiles, execute_sql_preview) and return results in an identical format. The Executor node and Visualizer node never branch on adapter type. See ADR-002 for the design rationale.
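The shared contract can be sketched as an abstract base class. The five method names come from the text above; the signatures and return hints are assumptions, not the repository's actual `warehouse_adapter.py`.

```python
from abc import ABC, abstractmethod


class WarehouseAdapter(ABC):
    """Sketch of the common adapter interface; signatures are illustrative."""

    @abstractmethod
    def get_schema(self) -> dict:
        """Return table/column metadata in the shared format."""

    @abstractmethod
    def run_scalar_query(self, sql: str):
        """Execute a query expected to return a single value."""

    @abstractmethod
    def fetch_date_window(self, table: str, column: str) -> tuple:
        """Return (min, max) of a date column for range hints."""

    @abstractmethod
    def build_table_profiles(self) -> list:
        """Return per-table row counts / summary profiles."""

    @abstractmethod
    def execute_sql_preview(self, sql: str, limit: int = 100) -> list:
        """Run a read-only query and return at most `limit` rows."""
```

Because every adapter returns the identical shapes, the Executor and Visualizer can stay adapter-agnostic, as the text notes.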
| Feature | Description | Endpoint |
|---|---|---|
| Policy Engine | Deny write ops, block `SELECT *`, gate sensitive columns by role, flag non-aggregated queries for review | `POST /api/policy/check` |
| Audit Trails | Every query logged with request ID, role, policy verdict, adapter, execution time | `GET /api/query-audit/recent` |
| Query Tags | Structured tags mapping onto Snowflake `QUERY_TAG` and Databricks warehouse tag conventions | `GET /api/schema/query-tag` |
| Session Boards | Operator surfaces for pending reviews, approval histories, query throughput | `GET /api/query-session-board` |
| Gold Evals | Built-in evaluation suite scoring generated SQL against expected feature patterns | `GET /api/evals/nl2sql-gold/run` |
| Approval Workflows | Review-required queries produce actionable approval bundles | `GET /api/query-approval-board` |
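The deny/review/allow decision described for the Policy Engine can be illustrated with a toy classifier. This is a simplified stand-in for the real engine; the rule order, sensitive-column list, and function name are all hypothetical.

```python
import re

SENSITIVE_COLUMNS = frozenset({"ssn", "salary"})  # illustrative list


def check_policy(sql: str, role: str = "analyst") -> str:
    """Return "deny", "review", or "allow" for a SQL string (sketch only)."""
    s = sql.strip().lower()
    # Deny write/DDL operations outright.
    if re.match(r"^(insert|update|delete|drop|alter|create|truncate)\b", s):
        return "deny"
    # Block unbounded column selection.
    if "select *" in s:
        return "deny"
    # Gate sensitive columns by role.
    if role != "admin" and any(col in s for col in SENSITIVE_COLUMNS):
        return "review"
    # Flag non-aggregated queries for human review.
    if not re.search(r"\b(sum|count|avg|min|max)\s*\(", s):
        return "review"
    return "allow"
```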
```bash
# Ask a question (returns stream URL for agent trace)
curl -X POST http://localhost:8000/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "Show total net revenue by region"}'

# Stream agent trace (SSE)
curl -N "http://localhost:8000/api/stream?q=Show+total+net+revenue+by+region&rid=REQUEST_ID"

# Policy check -- preview SQL before execution
curl -X POST http://localhost:8000/api/policy/check \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT region_name, SUM(net_revenue) FROM sales GROUP BY region_name", "role": "analyst"}'

# Runtime metadata
curl http://localhost:8000/api/meta

# Run gold eval suite
curl http://localhost:8000/api/evals/nl2sql-gold/run
```

Interactive docs: http://localhost:8000/docs (Swagger) | http://localhost:8000/redoc (ReDoc)
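A client consuming the `/api/stream` agent trace needs to parse the SSE wire format (`event:` / `data:` lines separated by blank lines). A minimal parser sketch, independent of this project's actual payload schema:

```python
def parse_sse(stream_text: str) -> list[tuple[str, str]]:
    """Parse Server-Sent Events text into (event, data) tuples.

    Follows the basic SSE framing: a blank line terminates each event,
    and consecutive data: lines are joined with newlines.
    """
    events: list[tuple[str, str]] = []
    event, data_lines = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            events.append((event, "\n".join(data_lines)))
            event, data_lines = "message", []  # reset for next event
    return events
```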
From `pytest` 8.3.5 on Python 3.11, via `make verify` (lint + test + smoke):
| Metric | Value |
|---|---|
| Test files | 8 |
| Total test cases | 80+ |
| Policy engine tests | 38 (deny, review, allow, sensitive columns, query tags) |
| Agent orchestration tests | 12 (translator, executor, visualizer, routing) |
| API endpoint tests | 15 (health, meta, ask, policy, audit, schema) |
| SQL validation tests | 9 (read-only enforcement, injection blocking) |
| Circuit breaker tests | 6 (state transitions, timeout recovery) |
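The SQL validation tests above cover read-only enforcement and injection blocking. One common shape for such a check is sketched below; it is an illustration, not the project's actual validator.

```python
import re

# Keywords that indicate a write or DDL statement (illustrative list).
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|merge)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Reject multi-statement input and any write/DDL keyword (sketch)."""
    # A semicolon anywhere except the trailing position suggests a
    # stacked second statement (classic injection vector).
    if ";" in sql.rstrip().rstrip(";"):
        return False
    return not WRITE_KEYWORDS.search(sql)
```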
| Endpoint | Avg Response |
|---|---|
| `GET /health` | 21 ms |
| `GET /api/meta` | 16 ms |
| `GET /api/runtime/brief` | 42 ms |
| `GET /api/evals/nl2sql-gold/run` | 9 ms |
| `GET /api/schema/*` | < 1 ms |
```bash
# Snowflake
pip install -e ".[snowflake]"
export SNOWFLAKE_ACCOUNT=your_account.us-east-1
export SNOWFLAKE_USER=your_username
export SNOWFLAKE_PASSWORD=your_password
export SNOWFLAKE_DATABASE=ANALYTICS
```

```bash
# Databricks
pip install -e ".[databricks]"
export DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
export DATABRICKS_AUTH_TYPE=databricks-cli
export DATABRICKS_WAREHOUSE_ID=your_sql_warehouse_id
python scripts/bootstrap_databricks_demo.py
```

**Kubernetes**
```bash
kubectl apply -f infra/k8s/namespace.yaml
kubectl apply -f infra/k8s/
```

**GCP Cloud Run** -- Terraform configs in `infra/terraform/`. See `infra/terraform/README.md` for variables and outputs.

**Docker**

```bash
docker compose up app
```

Production-grade overlays for regulated and high-scale environments.
A full Helm chart lives at `helm/nexus-hive/` with three value overlays:

```bash
# Default / dev-staging
helm upgrade --install nexus-hive helm/nexus-hive \
  -n analytics --create-namespace

# Production: HPA, PDB, NetworkPolicy, ServiceMonitor, external secrets, HA spread
helm upgrade --install nexus-hive helm/nexus-hive \
  -n analytics --create-namespace \
  -f helm/nexus-hive/values.yaml \
  -f helm/nexus-hive/values-prod.yaml \
  --set image.tag=0.2.0

# Air-gapped: Ollama sidecar, no external LLM, internal registry only
helm upgrade --install nexus-hive helm/nexus-hive \
  -n analytics --create-namespace \
  -f helm/nexus-hive/values.yaml \
  -f helm/nexus-hive/values-airgap.yaml
```

See `helm/nexus-hive/README.md` for the full values reference.
`infra/k8s/production/` ships a hardened namespace baseline: Pod Security Standards (restricted), ResourceQuota, LimitRange, default-deny NetworkPolicy, and scoped allow policies for ingress and warehouse egress.

`infra/k8s/observability/` ships a ServiceMonitor and PrometheusRule set with SLO-grade alerts (error rate, p99 latency, circuit breaker, audit-write stall).

`infra/terraform/` provisions Cloud Run v2 with Secret Manager, Workload Identity, VPC connector, uptime checks, and log-based alerting. See `infra/terraform/README.md` for the variable and output tables.
Five step-by-step runbooks live at `docs/runbooks/`:

| Runbook | Use when |
|---|---|
| `production-deploy.md` | First production rollout (60 min) |
| `airgap-deploy.md` | Deploying to a restricted / offline cluster (3 hr) |
| `incident-response.md` | On-call response to alerts, SEV levels, AegisOps handoff |
| `schema-evolution.md` | Evolving the governed schema without breaking queries |
| `model-swap.md` | Swapping phi3 to llama3 to Claude without downtime |
Narrative case studies written as SE-style handouts. Composites drawn from common patterns; numbers illustrative.
| Story | Vertical | Deployment shape | Headline outcome |
|---|---|---|---|
| Acme Finance | Financial services | GKE + Snowflake, SSO, Prometheus | Dashboard lead time 3 days -> 6 hours (-94%), 85% of queries self-served |
| Northstar Health | Regulated healthcare | Airgapped AKS + Databricks, HIPAA BAA | Bespoke analytics turnaround 9 days -> 1 day (-89%), zero PHI incidents across 180 days |
Each story walks through the problem, the 90- or 180-day rollout, the policy-tuning decisions, and the specific lessons that fed back into the product. Designed for SE discovery + pre-sales handoff.
`benchmarks/` contains reproducible performance and accuracy tests.

- `benchmarks/load_test.js`: k6 script, 50 VUs x 5 min with governed question mix, thresholds gated on p95/p99 latency and checks pass rate.
- `benchmarks/agent_accuracy_eval.py`: runs the gold eval suite N times and renders a per-case accuracy chart.
- `benchmarks/sample_results.json`: realistic run output.
| Metric | Value |
|---|---|
| VUs / duration | 50 / 5 min |
| Total requests | 12,847 |
| Failed request rate | 0.3% |
| `http_req_duration` p95 | 1,390 ms |
| `http_req_duration` p99 | 1,842 ms |
| Policy allow / review / deny | 82.6% / 13.1% / 4.3% |
| Gold eval score (mean of 5 runs, Claude Sonnet 4) | 0.91 |
| Gold eval score stdev | 0.022 |
| SLO compliance | pass |
See benchmarks/README.md for the full usage and CI integration recipe.
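The gold eval suite is described as scoring generated SQL against expected feature patterns. A toy version of that scoring idea is sketched below; the function name and normalization are hypothetical and not taken from `benchmarks/agent_accuracy_eval.py`.

```python
def score_sql(generated_sql: str, expected_features: list[str]) -> float:
    """Return the fraction of expected feature patterns found in the SQL.

    Case-insensitive substring matching over whitespace-normalized text;
    a real eval would use proper SQL parsing, but this shows the shape.
    """
    normalized = " ".join(generated_sql.lower().split())
    if not expected_features:
        return 0.0
    hits = sum(1 for feat in expected_features if feat.lower() in normalized)
    return hits / len(expected_features)
```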
```text
Nexus-Hive/
  main.py                 # FastAPI entrypoint (thin, delegates to modules)
  config.py               # Centralized config, constants, metric definitions
  graph/
    nodes.py              # LangGraph agent nodes + graph builder
  policy/
    engine.py             # Policy evaluation, query tags, heuristic inference
    audit.py              # Audit trail writer
    governance.py         # Governance surfaces (scorecard, warehouse brief)
  routes/
    ask.py                # /api/ask + /api/stream (SSE agent trace)
    health_meta.py        # /health, /api/meta
    warehouse.py          # Warehouse adapter endpoints
    auth.py               # Operator auth session endpoints
  services/
    build_helpers.py      # Runtime brief, meta builders
    streaming.py          # SSE streaming for agent trace
    openai_helpers.py     # Optional OpenAI integration
  warehouse_adapter.py    # Adapter base class + SQLite + registry
  snowflake_adapter.py    # Live Snowflake adapter
  databricks_adapter.py   # Live Databricks adapter
  circuit_breaker.py      # Ollama circuit breaker
  security.py             # HMAC sessions, token auth, RBAC
  seed_db.py              # 10k-row synthetic dataset generator
  tests/                  # 8 test modules, 80+ test cases
  infra/
    k8s/                  # Kubernetes manifests
    terraform/            # GCP Cloud Run Terraform configs
  frontend/               # Chart.js + agent trace viewer
  docs/
    adr/                  # Architecture Decision Records
    solution-architecture.md
    discovery-guide.md
```
| Project | Relationship |
|---|---|
| lakehouse-contract-lab | Data pipeline that feeds Nexus-Hive's warehouse |
| enterprise-llm-adoption-kit | Enterprise LLM governance patterns |
| agent-orchestration-benchmark | Comparative benchmarks for multi-agent orchestration frameworks (LangGraph, CrewAI, custom) |
| llm-onprem-deployment-kit | Reference deployment kit for airgapped / on-prem LLM serving, reused by Nexus-Hive's airgap overlay |
MIT