Evaluation of Neo4j Labs' llm-graph-builder web application for building knowledge graphs from meeting transcripts. The full web app had issues, so we pivoted to using its core extraction logic (LLMGraphTransformer) directly.
app/                  # Cloned llm-graph-builder repo (neo4j-labs/llm-graph-builder)
  backend/            # FastAPI backend (Python)
  frontend/           # React/Vite frontend (TypeScript)
  docker-compose.yml  # Original Docker setup
extract_direct.py     # Direct extraction script using LLMGraphTransformer + Gemini
prepare_uploads.py    # Script to prepare meeting transcript files for upload
upload-files/         # (gitignored) 50 prepared meeting transcript files
EVAL-NOTES.md         # Detailed evaluation notes
- 1,539 entity nodes + 51 document nodes
- 3,956 relationships across 23 types
- Rich entity variety: 506 Topics, 162 ActionItems, 142 Persons, 83 Projects, 80 Metrics, 50 Meetings, 44 Decisions, 40 Clients, 36 Campaigns
- Used Gemini 2.5 Flash via `langchain-google-genai`
- Python 3.11+
- Neo4j instance running (tested with Neo4j 5.x)
- Data export at `/workspace/kg_export/` (or update paths in scripts)
- Gemini API key (for extraction via `langchain-google-genai`)
- Node.js 18+ (for frontend, optional)
This uses the same LLMGraphTransformer that powers the web app, but without the heavy web app dependencies.
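The library-level flow condenses to a few calls. A minimal sketch, with the caveat that the helper names (`extract_file`, `load_into_neo4j`) and the `ALLOWED_NODES` constraint are illustrative, not lifted from `extract_direct.py` itself:

```python
# Minimal sketch of the direct-extraction path. ALLOWED_NODES mirrors the entity
# labels seen in the final graph; helper names are illustrative, not taken from
# extract_direct.py. Heavy imports are deferred so the module loads even without
# the langchain stack installed.
import os

ALLOWED_NODES = ["Person", "Meeting", "Project", "Topic", "ActionItem",
                 "Decision", "Client", "Campaign", "Metric"]

def extract_file(path: str):
    """Run LLMGraphTransformer over one transcript, returning graph documents."""
    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer
    from langchain_google_genai import ChatGoogleGenerativeAI

    llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash",
                                 google_api_key=os.environ["GEMINI_API_KEY"])
    transformer = LLMGraphTransformer(llm=llm, allowed_nodes=ALLOWED_NODES)
    with open(path, encoding="utf-8") as f:
        doc = Document(page_content=f.read(), metadata={"source": path})
    return transformer.convert_to_graph_documents([doc])

def load_into_neo4j(graph_docs) -> None:
    """Write extracted nodes/relationships; include_source adds Document nodes."""
    from langchain_neo4j import Neo4jGraph
    graph = Neo4jGraph(url=os.environ["NEO4J_URI"],
                       username=os.environ["NEO4J_USERNAME"],
                       password=os.environ["NEO4J_PASSWORD"])
    graph.add_graph_documents(graph_docs, include_source=True)
```

`include_source=True` is what produces the separate Document nodes alongside the entity nodes in the counts above.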
cd /workspace/kg-3
# 1. Create lightweight virtual environment
python3 -m venv venv-light
source venv-light/bin/activate
# 2. Install lightweight dependencies
pip install langchain-experimental langchain-google-genai langchain-neo4j neo4j python-dotenv
# 3. Create .env
cat > .env << 'EOF'
NEO4J_URI=bolt://localhost:7690
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
GEMINI_API_KEY=your-gemini-key
OPENAI_API_KEY=your-openai-key # optional fallback
EOF
# 4. Prepare meeting transcript files
python3 prepare_uploads.py
# Creates 50 files in upload-files/
# 5. Run extraction
python3 extract_direct.py
# Processes files and loads entities/relationships into Neo4j
# ~200 seconds per file with Gemini 2.5 Flash

# Option B: run the full web app
cd /workspace/kg-3/app
# Backend
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt -c constraints.txt
# Create backend/.env (see EVAL-NOTES.md for format)
uvicorn score:app --reload --port 8010
# NOTE: Takes ~5 min to start, 700MB+ RAM
# Frontend (separate terminal)
cd ../frontend
npm install
# Create frontend/.env with VITE_BACKEND_API_URL=http://localhost:8010
npm run dev -- --port 3010
# Access at http://localhost:3010
# NOTE: Backend extraction API fails for .txt files due to chunking bug.
# The web UI works for file upload but extraction triggers the same error.
# Use Option A (direct extraction) for reliable results.

- Web app setup: Cloned repo, installed backend (249 packages) and frontend (565 packages). Backend starts but takes ~5 min.
- Backend API: File upload via POST /upload works. Extraction via POST /extract fails with `'<=' not supported between instances of 'NoneType' and 'int'` (chunking bug for .txt files).
- Direct extraction: `extract_direct.py` successfully processed ~15 meeting files using Gemini 2.5 Flash via `langchain-google-genai`. Entities and relationships were correctly extracted and loaded into Neo4j.
- Neo4j verification: Verified node/relationship counts and entity quality via Cypher queries.
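For context on the backend error, this is the failure class it belongs to: a chunk-size parameter that ends up `None` for .txt inputs reaches a `<=` comparison. A minimal repro of the error message (NOT the backend's actual code):

```python
# Minimal illustration of the backend's .txt chunking failure class: a size
# parameter that is None reaches a '<=' comparison against an int. This is not
# llm-graph-builder's actual code, just a repro of the same TypeError.
def chunk(text, max_tokens):
    if max_tokens <= len(text):  # TypeError when max_tokens is None
        return [text[i:i + max_tokens] for i in range(0, len(text), max_tokens)]
    return [text]

try:
    chunk("meeting transcript ...", None)  # e.g. a config lookup returned None
except TypeError as e:
    print(e)  # "'<=' not supported between instances of 'NoneType' and 'int'"
```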
# After extraction, verify graph:
python3 -c "
from neo4j import GraphDatabase
d = GraphDatabase.driver('bolt://localhost:7690', auth=('neo4j','your-password'))
with d.session() as s:
r = s.run('MATCH (n) RETURN labels(n)[0] as label, count(*) as cnt ORDER BY cnt DESC')
for rec in r: print(f'{rec[\"label\"]}: {rec[\"cnt\"]}')
print()
r = s.run('MATCH ()-[r]->() RETURN type(r) as type, count(*) as cnt ORDER BY cnt DESC LIMIT 10')
for rec in r: print(f'{rec[\"type\"]}: {rec[\"cnt\"]}')
d.close()
"
# Sample entity query:
python3 -c "
from neo4j import GraphDatabase
d = GraphDatabase.driver('bolt://localhost:7690', auth=('neo4j','your-password'))
with d.session() as s:
r = s.run('MATCH (p:Person) RETURN p.id LIMIT 10')
print('Sample persons:')
for rec in r: print(f' {rec[0]}')
d.close()
"venv-light/- Lightweight Python venv (python3 -m venv venv-light && pip install langchain-experimental langchain-google-genai langchain-neo4j neo4j python-dotenv)app/backend/venv/- Full backend venv (pip install -r app/backend/requirements.txt -c app/backend/constraints.txt)app/frontend/node_modules/- Frontend deps (cd app/frontend && npm install)upload-files/- Generated byprepare_uploads.py(run it to recreate).env- Create manually with Neo4j and API credentials
- The full web app is over-engineered for programmatic bulk ingestion
- The core value is `LLMGraphTransformer` from `langchain-experimental`, which can be used as a library
- Gemini requires `langchain-google-genai` (API key), NOT the built-in VertexAI integration (needs a GCP service account)
- Extraction is slow (~200s per file vs 23s for Setup B's SimpleKGPipeline)
- Dependencies are heavy (249 packages, 2-3GB for full app)