LevelInteractive/knowledge-graph-prototype-3

KG-3: llm-graph-builder Evaluation

Evaluation of Neo4j Labs' llm-graph-builder web application for building knowledge graphs from meeting transcripts. The full web app proved unreliable for our use case (its extraction API fails on .txt files), so we pivoted to using its core extraction logic (LLMGraphTransformer) directly.

What's Here

app/                         # Cloned llm-graph-builder repo (neo4j-labs/llm-graph-builder)
  backend/                   # FastAPI backend (Python)
  frontend/                  # React/Vite frontend (TypeScript)
  docker-compose.yml         # Original Docker setup
extract_direct.py            # Direct extraction script using LLMGraphTransformer + Gemini
prepare_uploads.py           # Script to prepare meeting transcript files for upload
upload-files/                # (gitignored) 50 prepared meeting transcript files
EVAL-NOTES.md                # Detailed evaluation notes

Graph Stats (~15 meetings)

  • 1,539 entity nodes + 51 document nodes
  • 3,956 relationships across 23 types
  • Rich entity variety: 506 Topics, 162 ActionItems, 142 Persons, 83 Projects, 80 Metrics, 50 Meetings, 44 Decisions, 40 Clients, 36 Campaigns
  • Used Gemini 2.5 Flash via langchain-google-genai

Prerequisites

  • Python 3.11+
  • Neo4j instance running (tested with Neo4j 5.x)
  • Data export at /workspace/kg_export/ (or update paths in scripts)
  • Gemini API key (for extraction via langchain-google-genai)
  • Node.js 18+ (for frontend, optional)

Setup from Clone

Option A: Direct Extraction (recommended)

This uses the same LLMGraphTransformer that powers the web app, but without the heavy web app dependencies.

cd /workspace/kg-3

# 1. Create lightweight virtual environment
python3 -m venv venv-light
source venv-light/bin/activate

# 2. Install lightweight dependencies
pip install langchain-experimental langchain-google-genai langchain-neo4j neo4j python-dotenv

# 3. Create .env
cat > .env << 'EOF'
NEO4J_URI=bolt://localhost:7690
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
GEMINI_API_KEY=your-gemini-key
OPENAI_API_KEY=your-openai-key  # optional fallback
EOF

# 4. Prepare meeting transcript files
python3 prepare_uploads.py
# Creates 50 files in upload-files/

# 5. Run extraction
python3 extract_direct.py
# Processes files and loads entities/relationships into Neo4j
# ~200 seconds per file with Gemini 2.5 Flash
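At ~200 seconds per file, a full run over all 50 prepared transcripts is a multi-hour job. A quick back-of-envelope estimate:

```python
# Back-of-envelope runtime for a full run (figures from this evaluation).
files = 50
secs_per_file = 200  # observed per-file time with Gemini 2.5 Flash
hours = files * secs_per_file / 3600
print(f"~{hours:.1f} hours for {files} files")  # → ~2.8 hours for 50 files
```

The ~15-meeting graph described above was produced by a partial run, not the full 50 files.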

Option B: Full Web App

cd /workspace/kg-3/app

# Backend
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt -c constraints.txt
# Create backend/.env (see EVAL-NOTES.md for format)
uvicorn score:app --reload --port 8010
# NOTE: Takes ~5 min to start, 700MB+ RAM

# Frontend (separate terminal)
cd ../frontend
npm install
# Create frontend/.env with VITE_BACKEND_API_URL=http://localhost:8010
npm run dev -- --port 3010
# Access at http://localhost:3010

# NOTE: Backend extraction API fails for .txt files due to chunking bug.
# The web UI works for file upload but extraction triggers the same error.
# Use Option A (direct extraction) for reliable results.

How It Was Tested

  1. Web app setup: Cloned repo, installed backend (249 packages) and frontend (565 packages). Backend starts but takes 5 min.
  2. Backend API: File upload via POST /upload works. Extraction via POST /extract fails with '<=' not supported between instances of 'NoneType' and 'int' (chunking bug for .txt files).
  3. Direct extraction: extract_direct.py successfully processed ~15 meeting files using Gemini 2.5 Flash via langchain-google-genai. Entities and relationships correctly extracted and loaded into Neo4j.
  4. Neo4j verification: Verified node/relationship counts and entity quality via Cypher queries.
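The extraction failure in step 2 is the generic TypeError Python raises when an unset value reaches a numeric comparison. A minimal reproduction of the error class (the variable name is a hypothetical stand-in, not the backend's actual field):

```python
# Minimal reproduction of the error class behind the .txt chunking bug:
# somewhere in the backend a chunk size/offset is left as None for .txt
# files, and a later comparison against an int raises this TypeError.
chunk_size = None  # hypothetical stand-in for the unpopulated value
try:
    chunk_size <= 1000
except TypeError as exc:
    msg = str(exc)
print(msg)  # '<=' not supported between instances of 'NoneType' and 'int'
```

This suggests the bug is a missing default (or skipped initialization) on the .txt code path rather than a model or Neo4j problem, which is why Option A sidesteps it entirely.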

How to Test

# After extraction, verify graph:
python3 -c "
from neo4j import GraphDatabase
d = GraphDatabase.driver('bolt://localhost:7690', auth=('neo4j','your-password'))
with d.session() as s:
    r = s.run('MATCH (n) RETURN labels(n)[0] as label, count(*) as cnt ORDER BY cnt DESC')
    for rec in r: print(f'{rec[\"label\"]}: {rec[\"cnt\"]}')
    print()
    r = s.run('MATCH ()-[r]->() RETURN type(r) as type, count(*) as cnt ORDER BY cnt DESC LIMIT 10')
    for rec in r: print(f'{rec[\"type\"]}: {rec[\"cnt\"]}')
d.close()
"

# Sample entity query:
python3 -c "
from neo4j import GraphDatabase
d = GraphDatabase.driver('bolt://localhost:7690', auth=('neo4j','your-password'))
with d.session() as s:
    r = s.run('MATCH (p:Person) RETURN p.id LIMIT 10')
    print('Sample persons:')
    for rec in r: print(f'  {rec[0]}')
d.close()
"

Gitignored Files (need recreation)

  • venv-light/ - Lightweight Python venv (python3 -m venv venv-light && pip install langchain-experimental langchain-google-genai langchain-neo4j neo4j python-dotenv)
  • app/backend/venv/ - Full backend venv (pip install -r app/backend/requirements.txt -c app/backend/constraints.txt)
  • app/frontend/node_modules/ - Frontend deps (cd app/frontend && npm install)
  • upload-files/ - Generated by prepare_uploads.py (run it to recreate)
  • .env - Create manually with Neo4j and API credentials

Key Findings

  • The full web app is over-engineered for programmatic bulk ingestion
  • The core value is LLMGraphTransformer from langchain-experimental, which can be used as a library
  • Gemini requires langchain-google-genai (API key), NOT the built-in VertexAI integration (needs GCP service account)
  • Extraction is slow (~200s per file vs 23s for Setup B's SimpleKGPipeline)
  • Dependencies are heavy (249 packages, 2-3GB for full app)
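To illustrate the library-use finding, here is a hypothetical sketch of the approach extract_direct.py takes, using LLMGraphTransformer directly. The model name, environment variable names, and allowed_nodes list are assumptions drawn from this README, not the script's exact settings:

```python
# Hypothetical sketch of using LLMGraphTransformer as a library
# (the model, env vars, and node labels below are assumptions).

def build_graph(transcript_dir: str = "upload-files") -> None:
    import os

    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_google_genai import ChatGoogleGenerativeAI
    from langchain_neo4j import Neo4jGraph

    llm = ChatGoogleGenerativeAI(
        model="gemini-2.5-flash",
        google_api_key=os.environ["GEMINI_API_KEY"],
    )
    transformer = LLMGraphTransformer(
        llm=llm,
        # Constrain extraction to the entity types seen in this evaluation
        allowed_nodes=["Person", "Project", "Topic", "ActionItem", "Decision"],
    )
    graph = Neo4jGraph(
        url=os.environ["NEO4J_URI"],
        username=os.environ["NEO4J_USERNAME"],
        password=os.environ["NEO4J_PASSWORD"],
    )
    for name in sorted(os.listdir(transcript_dir)):
        with open(os.path.join(transcript_dir, name)) as f:
            doc = Document(page_content=f.read(), metadata={"source": name})
        # One LLM round-trip per transcript (~200 s with Gemini 2.5 Flash)
        graph_docs = transformer.convert_to_graph_documents([doc])
        graph.add_graph_documents(graph_docs, include_source=True)
```

Run inside the venv from Option A with the .env variables exported (the sketch reads os.environ directly rather than calling load_dotenv). The real extract_direct.py may differ in chunking, schema, and error handling.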
