Schemata

A fork of GitDiagram that visualizes any GitHub repository as an interactive, explorable diagram. Unlike the original — which uses 3 LLM calls per diagram and generates flat, non-expandable Mermaid charts — Schemata uses a hybrid approach that minimizes LLM usage and recursively pre-generates every layer of the directory tree in the background.

Watch on YouTube

How Schemata Differs from GitDiagram

	GitDiagram	Schemata
LLM calls per diagram	3 (explanation + mapping + Mermaid generation)	1 (edge generation only — nodes are deterministic)
Diagram format	Mermaid.js (text-based, requires syntax validation + fix loop of up to 3 extra LLM calls)	React Flow (JSON nodes/edges, no syntax issues)
Drill-down	None — flat single-level diagram	Recursive — click any directory to explore deeper, all the way down
Background prefetch	None	As soon as the current diagram loads, all child directories are pre-generated in the background
Node generation	LLM generates everything	Deterministic file tree analysis — nodes appear instantly, LLM only adds edges

Features

Instant Nodes, Streaming Edges: Nodes appear immediately from deterministic file tree analysis. A single LLM call then streams in the relationships between them — no waiting for 3 sequential LLM stages.
Recursive Background Prefetch: The moment the current diagram renders, a background request (/api/generate/prefetch) pre-generates sub-diagrams for every child directory one level down. When you click a directory node, its diagram is already cached and loads instantly. This continues recursively — each sub-diagram triggers prefetch for its children — until the entire directory tree has been explored down to leaf directories.
Clickable Nodes: Click on components to navigate directly to source files and directories on GitHub
Drill-Down: Click directory nodes to generate deeper sub-diagrams of that module's internal structure
Private Repos: Provide a GitHub PAT to diagram private repositories
Caching: Generated diagrams are cached in Supabase PostgreSQL so repeat visits load instantly
Multi-Provider AI: Swap between OpenAI, Google Gemini, and Anthropic with a single env var

Tech Stack

Frontend: Next.js 16, TypeScript, Tailwind CSS, ShadCN UI
Diagram Rendering: React Flow (@xyflow/react) with Dagre layout
AI: Vercel AI SDK (ai, @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google) for unified multi-provider LLM streaming
AI Gateway: Optional Vercel AI Gateway support — a single API key routes to any provider via https://ai-gateway.vercel.sh/v1 (enable with USE_AI_GATEWAY=true)
Database: Supabase PostgreSQL with Drizzle ORM
Backend: FastAPI (Python) on Railway, with Next.js Route Handlers as fallback
GitHub API: File tree + README fetching with PAT authentication

Local Development

Clone the repository

git clone [email protected]:LilySu/Schemata.git
cd Schemata

Install dependencies

pnpm install

Set up environment variables

cp .env.example .env

Edit .env with your credentials (see .env.example for the full list):

# Required — Supabase
NEXT_PUBLIC_SUPABASE_URL="https://your-project-ref.supabase.co"
NEXT_PUBLIC_SUPABASE_PUBLISHABLE_DEFAULT_KEY="your-anon-key"
DATABASE_URL="postgresql://postgres:[password]@db.[project-ref].supabase.co:5432/postgres"

# Required — AI provider (at least one)
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# GOOGLE_GENERATIVE_AI_API_KEY=...

# Model selection (format: provider/model)
AI_MODEL=google/gemini-2.5-flash-lite
# AI_FALLBACK_MODELS=google/gemini-2.5-flash

# Optional — Vercel AI Gateway (replaces individual provider keys)
# USE_AI_GATEWAY=true
# AI_GATEWAY_API_KEY=

# Optional — GitHub PAT (increases rate limit from 60/hr to 5,000/hr)
GITHUB_PAT=ghp_...

# Optional — GitHub App auth (alternative to PAT)
# GITHUB_CLIENT_ID=
# GITHUB_PRIVATE_KEY=
# GITHUB_INSTALLATION_ID=

# Optional — route frontend to external FastAPI backend
NEXT_PUBLIC_USE_LEGACY_BACKEND=false
NEXT_PUBLIC_API_DEV_URL=http://localhost:8000

# Optional — backend config
# API_ANALYTICS_KEY=
# CORS_ORIGINS=http://localhost:3000
# ENVIRONMENT=development
# PORT=8000

Initialize the database

pnpm db:push

You can browse the database with pnpm db:studio.

Run the app

pnpm dev

Open http://localhost:3000.

Optional: Run FastAPI backend

For production parity, run the FastAPI backend via Docker:

docker-compose up --build -d
docker-compose logs -f api

Then set in .env:

NEXT_PUBLIC_USE_LEGACY_BACKEND=true
NEXT_PUBLIC_API_DEV_URL=http://localhost:8000

Validation

pnpm check        # TypeScript + ESLint
pnpm test          # Frontend tests (Vitest)
pnpm build         # Production build
cd backend && uv run pytest -q  # Backend tests

Architecture

Hybrid Generation Pipeline

Deterministic Analysis (instant, zero LLM calls) — src/server/generate/analyzer.ts parses the file tree to extract nodes, groups, and tech stack. Nodes appear on the diagram immediately.
LLM Edge Generation (single call) — One LLM call streams relationships between the pre-computed nodes and identifies up to 5 missing components. Edges appear progressively via the Vercel AI SDK's streamText() API.

This is fundamentally different from GitDiagram's 3-stage pipeline (explanation → mapping → Mermaid generation), which requires 3 sequential LLM calls minimum and up to 6 if the Mermaid syntax validation fails. Schemata needs exactly 1 LLM call per diagram level.

Recursive Background Prefetch

The key innovation is aggressive background pre-generation of the entire directory tree:

User visits /:username/:repo → top-level diagram generates (1 LLM call)
The moment the diagram renders, the frontend fires a background request to /api/generate/prefetch
The prefetch endpoint identifies all directory nodes in the current diagram and generates a sub-diagram for each one (1 LLM call per directory, all in parallel)
Each of those sub-diagrams, once cached, triggers prefetch for its child directories
This continues recursively until every directory in the repository has a cached diagram — down to leaf directories with no subdirectories

The result: the first diagram takes a few seconds. Every subsequent drill-down click is instant because the diagram was already generated and cached in the background. For a typical repo, the entire tree is fully pre-cached within 30-60 seconds of the initial page load.

All diagrams are cached in Supabase — gitdiagram_diagram_cache for the root level, gitdiagram_sub_diagram_cache for every subdirectory. Once cached, the LLM is never called again for that repo.

Supabase Database Schema

The app uses Supabase as its PostgreSQL host. Drizzle ORM manages the schema with a gitdiagram_ table prefix. Connection is established via DATABASE_URL or derived from NEXT_PUBLIC_SUPABASE_URL (see src/server/db/index.ts).

`gitdiagram_diagram_cache`

Caches generated diagrams at the repository level. Primary key: (username, repo).

Column	Type	Description
`username`	varchar(256)	GitHub username/org
`repo`	varchar(256)	Repository name
`diagram`	text	Generated diagram data (JSON)
`explanation`	text	LLM explanation of the architecture
`created_at`	timestamptz	First generation time
`updated_at`	timestamptz	Last regeneration time
`used_own_key`	boolean	Whether the user provided their own API key

`gitdiagram_sub_diagram_cache`

Caches drill-down sub-diagrams for specific directories or files. Primary key: (username, repo, scope_path).

Column	Type	Description
`username`	varchar(256)	GitHub username/org
`repo`	varchar(256)	Repository name
`scope_path`	varchar(1024)	Directory or file path being visualized
`diagram`	text	Generated sub-diagram data (JSON)
`explanation`	text	LLM explanation of the module
`is_leaf`	boolean	Whether this scope has no deeper drill-down
`created_at`	timestamptz	First generation time
`updated_at`	timestamptz	Last regeneration time

Schema definition: src/server/db/schema.ts

Vercel AI SDK & AI Gateway

The AI layer (src/server/generate/provider.ts) supports two modes:

Direct Provider Mode (default) — Set AI_MODEL in provider/model format:

Provider prefix	SDK package	API key env var
`openai/`	`@ai-sdk/openai`	`OPENAI_API_KEY`
`anthropic/`	`@ai-sdk/anthropic`	`ANTHROPIC_API_KEY`
`google/`	`@ai-sdk/google`	`GOOGLE_GENERATIVE_AI_API_KEY`

AI Gateway Mode (optional) — Set USE_AI_GATEWAY=true and AI_GATEWAY_API_KEY to route all LLM calls through Vercel AI Gateway (https://ai-gateway.vercel.sh/v1). One API key for all providers, with built-in rate limiting and observability.

Model selection is configured in src/server/generate/model-config.ts:

AI_MODEL — primary model (default: google/gemini-2.5-flash-lite)
AI_FALLBACK_MODELS — comma-separated fallback models tried in order if the primary fails

Key Files

Concern	Frontend	Backend
Deterministic analyzer	`src/server/generate/analyzer.ts`	—
AI provider resolver	`src/server/generate/provider.ts`	`backend/app/services/openai_service.py`
Model config	`src/server/generate/model-config.ts`	`backend/app/services/model_config.py`
Prompts	`src/server/generate/prompts.ts`	`backend/app/prompts.py`
GitHub client	`src/server/generate/github.ts`	`backend/app/services/github_service.py`
Stream endpoint	`src/app/api/generate/stream/route.ts`	`backend/app/routers/generate.py`
Prefetch endpoint	`src/app/api/generate/prefetch/route.ts`	—
DB connection	`src/server/db/index.ts`	—
DB schema	`src/server/db/schema.ts`	—
Diagram component	`src/components/flow-diagram.tsx`	—
Main hook	`src/hooks/useDiagram.ts`	—

Attribution

Forked from GitDiagram by Ahmed Khaleel. Original project inspiration from Gitingest by Romain Courtois.

Name		Name	Last commit message	Last commit date
Latest commit History 176 Commits
.github		.github
backend		backend
docs		docs
public		public
src		src
.env.example		.env.example
.gitignore		.gitignore
.nvmrc		.nvmrc
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
components.json		components.json
docker-compose.yml		docker-compose.yml
drizzle.config.ts		drizzle.config.ts
eslint.config.mjs		eslint.config.mjs
next.config.js		next.config.js
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
postcss.config.js		postcss.config.js
prettier.config.js		prettier.config.js
tailwind.config.ts		tailwind.config.ts
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts
vitest.setup.ts		vitest.setup.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Schemata

How Schemata Differs from GitDiagram

Features

Tech Stack

Local Development

Optional: Run FastAPI backend

Validation

Architecture

Hybrid Generation Pipeline

Recursive Background Prefetch

Supabase Database Schema

`gitdiagram_diagram_cache`

`gitdiagram_sub_diagram_cache`

Vercel AI SDK & AI Gateway

Key Files

Attribution

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Schemata

How Schemata Differs from GitDiagram

Features

Tech Stack

Local Development

Optional: Run FastAPI backend

Validation

Architecture

Hybrid Generation Pipeline

Recursive Background Prefetch

Supabase Database Schema

gitdiagram_diagram_cache

gitdiagram_sub_diagram_cache

Vercel AI SDK & AI Gateway

Key Files

Attribution

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`gitdiagram_diagram_cache`

`gitdiagram_sub_diagram_cache`

Packages