# Heta: Agent-Oriented Knowledge Management Platform

Unified knowledge base, episodic memory, and generative synthesis.
Heta is an all-in-one knowledge infrastructure for AI agents. It gives agents a place to store, retrieve, and accumulate knowledge across three complementary layers:
- HetaDB — ingest documents (PDF, DOCX, PPTX, XLS/XLSX, images, …), extract knowledge graphs, and query with five retrieval strategies from naive vector search to multi-hop reasoning.
- HetaMem — a dual-layer memory system: fast episodic recall (MemoryVG) for conversation facts, and a long-term knowledge graph (MemoryKB) that grows with the agent.
- HetaGen — knowledge-base-driven structured content generation: table synthesis, tag-tree construction, and Text-to-SQL. (early stage)
## HetaDB
- Ingests PDF, DOCX, PPTX, XLS/XLSX, images, HTML, Markdown, ZIP archives
- LLM-powered knowledge graph extraction with deduplication (Union-Find merging)
- Five query strategies: `naive` · `rerank` (BM25 + vector + cross-encoder) · `rewriter` · `multihop` (ReAct) · `direct`
- Inline citations linking answers back to source documents
## HetaMem
- MemoryVG — LLM auto-extracts facts from conversations; instant semantic search; full CRUD + history audit
- MemoryKB — LightRAG knowledge graph that grows as the agent learns; `hybrid` / `local` / `global` retrieval modes
- Scope isolation per `user_id` / `agent_id` / `run_id`
## HetaGen (early stage)
- Generate structured tables by querying the knowledge base
- Tag-tree construction from topics
- Text-to-SQL over generated tables
HetaDB and HetaMem also expose optional MCP servers (ports 8012 / 8011) for direct integration with MCP-compatible clients such as Claude Desktop and Cursor.
## Quick Start (Docker)

Prerequisites: Docker ≥ 24.0 · Docker Compose ≥ 2.20 · DashScope and SiliconFlow API keys
```bash
git clone https://github.com/HetaTeam/Heta.git
cd Heta

# Chinese API providers (DashScope + SiliconFlow)
cp config.example.zh.yaml config.yaml

# International API providers (OpenAI + Gemini)
# cp config.example.yaml config.yaml
```

Open `config.yaml` and fill in your API keys:
```yaml
providers:
  dashscope:
    api_key: "YOUR_DASHSCOPE_KEY"    # required
  siliconflow:
    api_key: "YOUR_SILICONFLOW_KEY"  # required
```

Then start the stack:

```bash
docker-compose up -d
```

The first run pulls images and builds the stack (~10–20 min). Verify:
```bash
docker-compose ps            # all services: healthy
curl localhost:8000/health
```

| URL | Description |
|---|---|
| http://localhost | Heta web UI |
| http://localhost:8000/docs | REST API (Swagger) |
| http://localhost:7474 | Neo4j browser |
| http://localhost:9001 | MinIO console |
```bash
docker-compose down      # stop, keep data
docker-compose down -v   # stop and delete all volumes
```

## Running from Source

Prerequisites: Python 3.10 · PostgreSQL · Milvus · Neo4j
```bash
# 1. Install backend
conda create -n heta python=3.10 -y && conda activate heta
pip install -e .

# 2. Build frontend
cd heta-frontend && npm install && npm run build && cd ..

# 3. Run (unified — all modules on one port)
PYTHONPATH=src python src/main.py   # → http://localhost:8000
```

Run each module independently:
```bash
export PYTHONPATH=/path/to/Heta/src

python src/hetadb/api/main.py    # HetaDB  → :8001
python src/hetagen/api/main.py   # HetaGen → :8002
python src/hetamem/api/main.py   # HetaMem → :8003

# MCP servers
HETAMEM_BASE_URL=http://localhost:8000 python src/hetamem/mcp/server.py  # → :8011
HETADB_BASE_URL=http://localhost:8000 python src/hetadb/mcp/server.py    # → :8012
```

Port reference:
| Service | Port |
|---|---|
| Heta unified API | 8000 |
| HetaDB (standalone) | 8001 |
| HetaGen (standalone) | 8002 |
| HetaMem (standalone) | 8003 |
| HetaMem MCP | 8011 |
| HetaDB MCP | 8012 |
| PostgreSQL | 5432 |
| Milvus | 19530 |
| Neo4j Browser / Bolt | 7474 / 7687 |
| MinIO S3 / Console | 9000 / 9001 |
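With the stack running, the ports above can be probed quickly. The following is a minimal sketch using only the Python standard library; the service names and ports are taken from the table, and it only checks that each TCP port accepts connections, not that the service behind it is actually healthy:

```python
import socket

# Port map, a subset of the reference table above
SERVICES = {
    "Heta unified API": 8000,
    "PostgreSQL": 5432,
    "Milvus": 19530,
    "Neo4j Bolt": 7687,
    "MinIO S3": 9000,
}

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_stack(host="localhost"):
    """Probe every service port and report reachability."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

Calling `check_stack()` after `docker-compose up -d` gives a one-glance reachability report; a `False` entry usually means the container is still starting or failed its health check.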
## Agent Integration

Heta exposes two integration layers — use one or both:

MCP gives your agent direct tool-call access to HetaDB and HetaMem. Add the following to your MCP client config (e.g. Claude Desktop's `~/.claude.json`):
```json
{
  "mcpServers": {
    "hetamem": { "type": "http", "url": "http://localhost:8011/mcp/" },
    "hetadb": { "type": "http", "url": "http://localhost:8012/mcp/" }
  }
}
```

The agent can now call HetaDB and HetaMem tools directly without any additional setup.
The bundled skill teaches the agent when and how to use each layer — which system to query first, how to store findings, and the correct three-step retrieval order (MemoryVG → HetaDB → MemoryKB):

`skills/querying-knowledge-and-memory/SKILL.md`

Load it into your agent's skill system.
Datasets and knowledge bases are created and managed through the Heta web UI (http://localhost):
1. Create a dataset and upload your documents (PDF, DOCX, HTML, images, …)
2. Create a knowledge base and link it to your dataset
3. Trigger parsing — Heta extracts a knowledge graph and embeddings (async)
Once indexed, agents query the knowledge base via the chat API:
```bash
# List available knowledge bases
curl http://localhost:8000/api/v1/hetadb/files/knowledge-bases

# Query a knowledge base
curl -X POST http://localhost:8000/api/v1/hetadb/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main contributions?",
    "kb_id": "research-kb",
    "user_id": "agent",
    "query_mode": "rerank"
  }'
# → { "response": "...", "citations": [...] }
```

Available `query_mode` values: `naive` · `rerank` · `rewriter` · `multihop` · `direct`
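For agents written in Python, the chat call above can be wrapped in a small helper. This is a sketch using only the standard library; the endpoint, field names, and the five query modes come from the example above, while `build_chat_payload` and `query_kb` are hypothetical names and error handling is elided:

```python
import json
import urllib.request

HETA_URL = "http://localhost:8000"  # unified API port
QUERY_MODES = {"naive", "rerank", "rewriter", "multihop", "direct"}

def build_chat_payload(query, kb_id, user_id="agent", query_mode="rerank"):
    """Validate the query mode and build the JSON body for the chat endpoint."""
    if query_mode not in QUERY_MODES:
        raise ValueError(f"unknown query_mode: {query_mode!r}")
    return {"query": query, "kb_id": kb_id,
            "user_id": user_id, "query_mode": query_mode}

def query_kb(query, kb_id, **kwargs):
    """POST the payload and return the parsed response (answer plus citations)."""
    body = json.dumps(build_chat_payload(query, kb_id, **kwargs)).encode()
    req = urllib.request.Request(
        f"{HETA_URL}/api/v1/hetadb/chat", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Validating `query_mode` client-side fails fast on typos instead of waiting for a round trip to the server.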
```bash
# Store facts extracted from a conversation
curl -X POST http://localhost:8000/api/v1/hetamem/vg/add \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "I prefer concise Python examples."},
      {"role": "assistant", "content": "Noted."}
    ],
    "agent_id": "agent"
  }'

# Recall relevant facts
curl -X POST http://localhost:8000/api/v1/hetamem/vg/search \
  -H "Content-Type: application/json" \
  -d '{"query": "user coding preferences", "agent_id": "agent"}'
# → { "results": [{"memory": "Prefers concise Python examples", "score": 0.91}] }
```

```bash
# Insert knowledge into the agent's long-term graph (async — ~200s to index)
curl -X POST http://localhost:8000/api/v1/hetamem/kb/insert \
  -F "query=Transformer models use self-attention to process sequences in parallel."

# Query the knowledge graph
curl -X POST http://localhost:8000/api/v1/hetamem/kb/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How do transformers handle long-range dependencies?", "mode": "hybrid"}'
# → { "final_answer": "..." }
```

The bundled querying skill encodes the recommended retrieval workflow — when to use each layer, in what order, and what to store back. Load it into any skill-aware agent system and it will orchestrate HetaDB, MemoryVG, and MemoryKB automatically.
Three-step retrieval order the skill enforces:
1. MemoryVG — check fast personal memory first (~100 ms)
2. HetaDB — deep retrieval from uploaded documents (1–3 s)
3. Store back — cache finding in MemoryVG, or insert into MemoryKB if worth accumulating
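The three-step order can be sketched as a small orchestration function. This is a hypothetical helper, not part of the Heta API: the layer calls are injected as plain callables, so any HTTP wrapper (curl, `urllib`, an SDK) can be plugged in, and the score threshold is an illustrative assumption:

```python
def retrieve(query, vg_search, db_query, vg_add, score_threshold=0.8):
    """Three-step retrieval: check MemoryVG, fall back to HetaDB, cache the result.

    vg_search(query) -> list of {"memory": str, "score": float}  (MemoryVG search)
    db_query(query)  -> {"response": str, ...}                    (HetaDB chat)
    vg_add(fact)     -> stores a fact back into MemoryVG
    """
    # 1. MemoryVG: fast personal memory first (~100 ms)
    hits = vg_search(query)
    if hits and hits[0]["score"] >= score_threshold:
        return hits[0]["memory"]

    # 2. HetaDB: deep retrieval from uploaded documents (1-3 s)
    answer = db_query(query)["response"]

    # 3. Store back: cache the finding so the next lookup hits step 1
    vg_add(answer)
    return answer
```

Facts worth accumulating long-term would additionally go to MemoryKB via its insert endpoint; that branch is left out here to keep the sketch short.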
Layer selection guide:
| Layer | Best for | Typical latency |
|---|---|---|
| MemoryVG | Facts already seen; cross-session cache | ~100 ms |
| HetaDB | Deep retrieval from uploaded documents | 1–3 s |
| MemoryKB | Agent's accumulating knowledge graph | ~200 s to index · ~1 s to query |
## Project Layout

```text
Heta/
├── config.example.yaml        # Config template (international: OpenAI / Gemini)
├── config.example.zh.yaml     # Config template (domestic: DashScope / SiliconFlow)
├── docker-compose.yml         # Full-stack deployment
├── Dockerfile
├── pyproject.toml
├── docs/                      # API reference and design documents
├── heta-frontend/             # Web UI
├── skills/                    # Agent skills
│   └── querying-knowledge-and-memory/
└── src/
    ├── main.py                # Unified entry point (port 8000)
    ├── common/                # Shared utilities: logging, config, LLM client, tasks
    ├── hetadb/                # Knowledge-base ingestion & multi-strategy chat
    ├── hetagen/               # Table and tag-tree generation
    └── hetamem/               # Agent memory: MemoryKB + MemoryVG + MCP server
```
## Acknowledgements

Heta is built on the shoulders of excellent open-source projects:
- MinerU — the document parsing engine powering HetaDB ingestion
- mem0 — the episodic memory engine powering MemoryVG
- LightRAG — the knowledge graph engine powering MemoryKB
We are grateful to their authors for making this work possible.
## License

AGPL-3.0 — see LICENSE for details. Heta incorporates code from the open-source projects listed above.
