ThreadDao/gen-sql-agent

Zong Agent: Natural Language to ClickHouse SQL

Zong Agent is an intelligent database assistant that translates natural-language questions into executable ClickHouse SQL queries. Its Human-in-the-loop (HITL) architecture lets it ask clarifying questions when a request is ambiguous, and lets you review the generated SQL before it is executed.

🏗️ Architecture

The system uses a LangGraph-based workflow combined with Model Context Protocol (MCP) to safely interact with your ClickHouse database.

```mermaid
graph TD
    User([User]) --> |"Natural Language Query"| FastAPI

    subgraph Backend [FastAPI Server]
        FastAPI --> SessionMgr[Session Manager]
        SessionMgr <--> SQLite[(SQLite History)]

        subgraph AI [LangGraph Agent]
            Start((START)) --> fetch_schema
            fetch_schema --> converse

            converse --> |Need clarification| clarify
            clarify --> |Wait for User| Wait1((Interrupt))
            Wait1 --> |User answers| converse

            converse --> |SQL generated| review_sql
            review_sql --> |Wait for Review| Wait2((Interrupt))
            Wait2 --> |User approves/edits| execute_sql

            execute_sql --> |Success| End((END))

            fetch_schema -.-> |Error| handle_error
            converse -.-> |Error| handle_error
            execute_sql -.-> |Error| handle_error
            handle_error -.-> |Retry| fetch_schema
        end
    end

    subgraph DataLayer [MCP & ClickHouse]
        fetch_schema == MCP get_schema ==> CH_MCP[ClickHouse MCP Server]
        execute_sql == MCP execute_query ==> CH_MCP

        CH_MCP <--> |TCP/HTTP| ClickHouse[(ClickHouse DB)]
    end

    SessionMgr --> AI
```

Core Components

  1. FastAPI Backend (api/server.py): Manages agent sessions and provides HTTP endpoints for the frontend.
  2. LangGraph Agent (agent/graph.py): The brain of the operation: it orchestrates the flow through schema fetching, context resolution, LLM generation, human-review pauses (interrupts), and execution.
  3. ClickHouse MCP Server (clickhouse_mcp_server.py): An isolated Model Context Protocol server that securely exposes list_tables and execute_query capabilities to the agent.
  4. SQLite Storage (api/db.py): Persists successfully executed queries for history and reuse.
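The interrupt-driven flow above can be sketched as a plain-Python state machine. This is a simplified illustration only: the real agent/graph.py uses LangGraph nodes and interrupts, and the table schema, SQL, and result row below are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    question: str
    schema: str = ""
    sql: str = ""
    waiting_for: str = ""   # "clarification" | "review" | "" (running)
    results: list = field(default_factory=list)

def fetch_schema(state: AgentState) -> AgentState:
    # In the real agent, the schema comes from the MCP list_tables tool.
    state.schema = "trades(symbol String, volume UInt64, day Date)"
    return state

def converse(state: AgentState) -> AgentState:
    # Stand-in for the LLM call: pause if the request is ambiguous,
    # otherwise emit SQL and pause for human review.
    if "volume" not in state.question and "price" not in state.question:
        state.waiting_for = "clarification"
    else:
        state.sql = ("SELECT symbol, sum(volume) AS v FROM trades "
                     "GROUP BY symbol ORDER BY v DESC LIMIT 10")
        state.waiting_for = "review"
    return state

def resume(state: AgentState, user_input: str) -> AgentState:
    # Mirrors resuming a LangGraph interrupt with the user's answer.
    if state.waiting_for == "clarification":
        state.question += " " + user_input
        state.waiting_for = ""
        return converse(state)
    if state.waiting_for == "review" and user_input == "approve":
        state.waiting_for = ""
        state.results = [("AAPL", 123)]  # execute_sql via MCP in reality
    return state

state = converse(fetch_schema(AgentState("Show me the top stocks")))
state = resume(state, "by volume")   # answers the clarifying question
state = resume(state, "approve")     # approves the reviewed SQL
```

The two `resume` calls correspond to the two interrupt points in the diagram: one for clarification, one for SQL review.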

🚀 Getting Started

Prerequisites

  • Python 3.11+
  • uv (recommended for dependency management)
  • A running ClickHouse instance

Environment Variables

You need to provide your LLM API credentials and ClickHouse connection details, either exported in your shell or placed in a .env file:

```bash
# LLM Configuration (default is Google Gemini, but any OpenAI-compatible provider works)
export OPENAI_API_KEY="your-api-key"
export LLM_MODEL="google/gemini-3.1-pro-preview" # Or any model supported by your provider
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/" # Example for Gemini via OpenAI compat

# ClickHouse Configuration
export CLICKHOUSE_HOST="localhost"
export CLICKHOUSE_PORT="8123"
export CLICKHOUSE_USER="default"
export CLICKHOUSE_PASSWORD=""
```

Running the Application

Start the server using uv:

```bash
uv run main.py
# or
uv run uvicorn main:app --reload
```

The application will be available at http://localhost:8000.

💻 Usage Demo

  1. Open your browser to http://localhost:8000.
  2. Enter your natural language request, for example: "Show me the top 10 stocks by trading volume for yesterday."

Scenario A: Ambiguous Request

  • If you ask: "Show me the top stocks", the agent will pause and ask a clarifying question: "Do you mean by volume or price? And for which date?"
  • You reply: "By volume, for today."
  • The agent generates the SQL based on your clarification.

Scenario B: Generating & Reviewing SQL

  • The agent analyzes the database schema and current time context, then generates a SQL query.
  • Review Step: The system pauses and presents the SELECT query along with an explanation of what it does.
  • You can review the SQL, edit it if necessary, and approve it.

Scenario C: Execution

  • Once approved, the query is sent securely via the MCP server to ClickHouse.
  • The results are returned and displayed in the UI.
  • The successful query is saved to your local SQLite history database for future reference.
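A minimal sketch of that history store, using the standard library `sqlite3` module. The table name and columns here are illustrative; the actual layout in api/db.py may differ:

```python
import sqlite3

def init_history(conn: sqlite3.Connection) -> None:
    # Create the history table on first use.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS query_history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               question TEXT NOT NULL,
               sql TEXT NOT NULL,
               executed_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )

def save_query(conn: sqlite3.Connection, question: str, sql: str) -> None:
    # Only called after a query has executed successfully.
    conn.execute(
        "INSERT INTO query_history (question, sql) VALUES (?, ?)",
        (question, sql),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
init_history(conn)
save_query(conn, "top stocks by volume", "SELECT symbol FROM trades LIMIT 10")
rows = conn.execute("SELECT question, sql FROM query_history").fetchall()
```

Storing the original question alongside the SQL is what makes the history reusable: a later session can match a similar question to a previously approved query.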

🔒 Security

  • MCP Isolation: The agent does not have direct DB credentials; it communicates via the MCP protocol.
  • Read-only Enforcement: Interaction with ClickHouse is deliberately constrained; the execute_query tool only returns query results, and the agent prompt is strictly configured to produce read-only SQL.
  • Human-in-the-loop: No query executes against your database without explicit human approval.
  • Credential hygiene: The application loads its configuration directly from a .env file. To avoid accidentally committing that file to GitHub, it is recommended to keep it outside the project directory.
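As an illustration of the read-only constraint, a guard like the following could reject mutating statements before they ever reach execute_query. This is a hypothetical, deliberately simple check: the repository enforces read-only behavior through the agent prompt and the MCP tool surface, and a naive keyword filter like this is not a complete defense on its own.

```python
import re

# Keywords that indicate a mutating statement (not exhaustive).
FORBIDDEN = re.compile(
    r"\b(INSERT|ALTER|DROP|TRUNCATE|DELETE|UPDATE|CREATE|RENAME|GRANT)\b",
    re.IGNORECASE,
)

def is_read_only(query: str) -> bool:
    # Accept only statements that start like a read, and contain
    # no mutating keyword anywhere (catches stacked queries too).
    stripped = query.strip().rstrip(";").strip()
    starts_ok = stripped.upper().startswith(
        ("SELECT", "WITH", "SHOW", "DESCRIBE")
    )
    return starts_ok and not FORBIDDEN.search(stripped)

is_read_only("SELECT symbol FROM trades LIMIT 10")   # read: allowed
is_read_only("DROP TABLE trades")                    # write: rejected
```

Note the keyword scan covers the whole string, not just the first token, so stacked statements such as `SELECT 1; DROP TABLE trades` are rejected as well.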
