A semantic Bible search system using Retrieval-Augmented Generation (RAG) with Ollama for intelligent, meaning-based verse discovery.
- Hybrid Search - Combines semantic meaning + keyword matching for best accuracy
- Full KJV Bible (31,102 verses) with Strong's numbers
- Popularity Boost - Famous verses (John 3:16, Psalm 23:1) rank higher
- Query Expansion - Handles archaic language ("subtil" → "subtle")
- Strong's Greek Lexicon (5,624 entries) with Unicode text
- Webster's 1913 Dictionary for archaic KJV English
- Smart Caching - Fast startup after first run
- 100% Local - No cloud APIs, runs entirely on your machine
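
As a rough illustration of how query expansion, hybrid scoring, and the popularity boost could fit together, here is a minimal sketch. The lookup tables (`ARCHAIC_SYNONYMS`, `POPULAR_VERSES`), the 0.7 weight, the boost values, and every function name are assumptions made for illustration; they are not taken from src/rag_search.py.

```python
import re

# Hypothetical lookup tables; the real project may store these differently.
ARCHAIC_SYNONYMS = {"subtle": ["subtil"]}  # modern word -> archaic KJV spellings
POPULAR_VERSES = {"John 3:16": 0.10, "Psalm 23:1": 0.10}  # assumed boost values


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def expand_query(query):
    """Append archaic spellings so a modern query can match KJV wording."""
    terms = tokenize(query)
    expanded = list(terms)
    for term in terms:
        expanded.extend(ARCHAIC_SYNONYMS.get(term, []))
    return expanded


def keyword_score(query_terms, verse_text):
    """Fraction of query terms that appear in the verse (0.0 to 1.0)."""
    if not query_terms:
        return 0.0
    verse_terms = set(tokenize(verse_text))
    hits = sum(1 for term in query_terms if term in verse_terms)
    return hits / len(query_terms)


def hybrid_score(semantic, keyword, reference, semantic_weight=0.7):
    """Blend semantic and keyword scores, then add a small popularity boost."""
    blended = semantic_weight * semantic + (1.0 - semantic_weight) * keyword
    return blended + POPULAR_VERSES.get(reference, 0.0)
```

With these assumed tables, `expand_query("subtle serpent")` also yields the archaic spelling "subtil", so a keyword pass can match Genesis 3:1 even though the modern spelling never appears in the KJV text.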
# Clone or navigate to directory
cd biblesearch
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Make sure Ollama is running
ollama serve
ollama pull llama3.2:1b

source venv/bin/activate
# Quick demo (instant)
python demos/quick_demo.py
# Caching demo
python demos/demo_caching.py
# Webster's 1913 dictionary demo
python demos/demo_webster.py

# WARNING: First run generates embeddings (60-90 minutes)
# Subsequent runs take seconds (embeddings are loaded from the cache)
python tests/test_rag.py

New to RAG, embeddings, or vector search? Start here:
# Read the guide (5 min)
cat docs/_START_HERE.md
# Try hands-on tutorials (30 min)
python docs/tutorials/example1_embeddings.py
python docs/tutorials/example2_tiny_rag.py
python docs/tutorials/example3_similarity_explained.py
# Deep dive (60 min)
cat docs/LEARNING_RAG.md
cat docs/RAG_VISUAL_GUIDE.md

biblesearch/
├── README.md                     # You are here
├── requirements.txt              # Python dependencies
│
├── docs/                         # Documentation & tutorials
│   ├── _START_HERE.md            # Begin here!
│   ├── LEARNING_RAG.md           # Deep dive tutorial
│   ├── RAG_VISUAL_GUIDE.md       # Visual diagrams
│   ├── CACHING_EXPLAINED.md      # Caching system
│   ├── WEBSTER_INTEGRATION.md    # Dictionary usage
│   └── tutorials/                # Hands-on RAG learning
│       ├── example1_embeddings.py
│       ├── example2_tiny_rag.py
│       └── example3_similarity_explained.py
│
├── demos/                        # Quick demonstrations
│   ├── quick_demo.py             # Fast demo (2 min)
│   ├── demo_caching.py           # Caching demo
│   └── demo_webster.py           # Webster's dictionary
│
├── tests/                        # Full system tests
│   ├── test_rag.py               # Complete RAG test
│   └── test_search.py            # Simple search test
│
├── src/                          # Core source code
│   ├── kjv_parser.py             # Parse KJV XML
│   ├── strongs_parser.py         # Parse Greek lexicon
│   ├── webster_parser.py         # Parse dictionary
│   ├── rag_search.py             # RAG system
│   ├── ollama_client.py          # LLM integration
│   ├── cache_manager.py          # Caching system
│   └── main.py                   # CLI interface
│
└── data/                         # Bible data files
    ├── kjvfull.xml               # KJV with Strong's (24MB)
    ├── strongsgreek.xml          # Greek lexicon (2.3MB)
    ├── dictionary.txt            # Webster's 1913 (3MB)
    └── cache/                    # Generated caches

- Structured Data (cached: ~1 MB)
  - Parsed KJV verses with Strong's numbers
  - Parsed Greek lexicon entries
  - Parsed Webster's 1913 definitions
- Vector Embeddings (cached: ~500 MB)
  - Each verse converted to a vector of 2048 numbers
  - Captures semantic meaning
  - Generated once, then reused from the cache
- Similarity Scores (computed per search)
  - Cosine similarity between query and verses
  - Ranks results by relevance
  - Fast to compute (about 1 second across all verses)
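
To make the similarity step concrete, here is a small, dependency-free sketch of cosine similarity and ranking. The toy 3-dimensional vectors are made up and stand in for the 2048-dimensional embeddings; the function names are illustrative and not taken from the project's source.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_verses(query_vector, verse_vectors, top_k=3):
    """Return the top_k (reference, score) pairs, highest similarity first."""
    scored = [
        (reference, cosine_similarity(query_vector, vector))
        for reference, vector in verse_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up toy vectors standing in for real verse embeddings.
toy_vectors = {
    "Proverbs 3:5": [0.9, 0.1, 0.0],
    "Psalm 37:5": [0.8, 0.3, 0.1],
    "Genesis 1:1": [0.0, 0.2, 0.9],
}
results = rank_verses([1.0, 0.0, 0.0], toy_vectors, top_k=2)
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why it is a common choice for comparing embeddings.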
Your Query: "trusting God in difficult times"
↓
Convert to embedding (2 seconds)
↓
Compare to 31,102 verse embeddings (1 second)
↓
Return top matches ranked by meaning
↓
Results: Proverbs 3:5-6, Psalm 37:5, Isaiah 40:31...
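
The query flow above could be sketched as follows. This assumes the embedding comes from Ollama's `/api/embeddings` endpoint on the default local port 11434 with the `llama3.2:1b` model pulled during setup; whether src/ollama_client.py uses this exact endpoint is not confirmed here. The `search` function and its `embed` parameter are hypothetical names; the parameter exists so the flow can be tried with a stand-in embedder and no running server.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default local address


def embed_with_ollama(text, model="llama3.2:1b"):
    """Ask a locally running Ollama server for the embedding of `text`."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def search(query, verse_embeddings, embed=embed_with_ollama, top_k=5):
    """Embed the query, compare it to every verse, return the best matches."""
    query_vector = embed(query)  # step 1: convert the query to an embedding
    scored = [
        (reference, _cosine(query_vector, vector))  # step 2: compare
        for reference, vector in verse_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # step 3: rank
    return scored[:top_k]
```

With Ollama running, `search("trusting God in difficult times", verse_embeddings)` would use the real embedder, where `verse_embeddings` is a dict mapping verse references to their cached vectors.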
Semantic search finds verses by MEANING, not just keywords:
| Query | Traditional Search | RAG Semantic Search |
|---|---|---|
| "trusting God" | 12 verses with "trust" | 200+ verses about faith, reliance, confidence |
| "God's love" | 45 verses with "love" | 500+ verses about agape, mercy, compassion |
| "salvation by faith" | 15 exact matches | 150+ verses about justification, grace |
First run:
- Parse XML: 2-3 minutes
- Generate embeddings: 60-90 minutes
- Total: roughly 60-95 minutes

Subsequent runs:
- Load caches: 1 second
- Search: 3-4 seconds
- Total: about 4-5 seconds
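
The gap between the first run and later runs comes from a build-once, load-afterwards pattern. Below is a minimal sketch of that pattern; the JSON format, the `embeddings.json` file name, and the `load_or_build` helper are assumptions for illustration and may differ from what src/cache_manager.py actually does.

```python
import json
import os
import tempfile


def load_or_build(cache_path, build):
    """Return cached data if the file exists; otherwise build it and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as handle:
            return json.load(handle)  # fast path: later runs
    data = build()  # slow path: e.g. embedding every verse on the first run
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return data


# Demonstration with a stand-in for the slow embedding step.
calls = []


def slow_build():
    calls.append(1)  # record how many times the slow step actually runs
    return {"John 3:16": [0.1, 0.2]}


with tempfile.TemporaryDirectory() as cache_dir:
    path = os.path.join(cache_dir, "cache", "embeddings.json")
    first = load_or_build(path, slow_build)   # builds and writes the cache
    second = load_or_build(path, slow_build)  # reads the cache, no rebuild
```

In this demonstration the slow step runs only once; the second call is served entirely from disk.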
- docs/_START_HERE.md - Your entry point
- docs/LEARNING_RAG.md - Complete RAG tutorial
- docs/RAG_VISUAL_GUIDE.md - Visual diagrams
- docs/CACHING_EXPLAINED.md - Caching system details
- docs/WEBSTER_INTEGRATION.md - Dictionary usage
- docs/search-features/HYBRID_SEARCH_GUIDE.md - Semantic + keyword search
- docs/search-features/QUERY_EXPANSION_GUIDE.md - Archaic language handling
- docs/search-features/POPULARITY_BOOST_GUIDE.md - Famous verse ranking
- docs/search-features/TESTING_GUIDE.md - How to test the system
- docs/benchmarks/ - Performance test results
- docs/development/ - Build system and development notes
- Python 3.8+
- Ollama (local LLM)
- 1 GB RAM (minimum)
- 1 GB disk space (for caches)
"Connection refused":
# Make sure Ollama is running
ollama serve

"Module not found":
source venv/bin/activate
pip install -r requirements.txt

"Embeddings taking forever":
- First run takes 60-90 minutes (normal!)
- Test with fewer verses first: edit tests/test_rag.py
- Or try the demos, which don't need embeddings

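
For the "Connection refused" case, a quick way to check from Python whether anything is answering on Ollama's default port (11434) is sketched below. The helper name `ollama_is_running` is hypothetical and is not part of the project.

```python
import http.client
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP errors all end up here.
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama is reachable")
    else:
        print("Ollama is not reachable; start it with: ollama serve")
```

This only confirms that the server responds; it does not check that the `llama3.2:1b` model has been pulled.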
- Ollama - Local LLM server
- Strong's Concordance - Greek/Hebrew reference
- Webster's 1913 - Historical dictionary
- OSIS Bible Format - Bible XML standard
- KJV Bible: Public domain
- Strong's Concordance: Public domain (1890)
- Webster's Dictionary: Public domain (1913)
- Code: Educational use
- Quick look: python demos/quick_demo.py
- Learn concepts: cat docs/_START_HERE.md
- Try tutorials: python docs/tutorials/example1_embeddings.py
- Full system: python tests/test_rag.py (be patient on first run!)
Built with Claude Code for biblical study and RAG education