KJV Bible Search with RAG

A semantic Bible search system using Retrieval-Augmented Generation (RAG) with Ollama for intelligent, meaning-based verse discovery.

Features

🔍 Hybrid Search - Combines semantic meaning + keyword matching for best accuracy
📖 Full KJV Bible (31,102 verses) with Strong's numbers
⭐ Popularity Boost - Famous verses (John 3:16, Psalm 23:1) rank higher
🔄 Query Expansion - Handles archaic language ("subtil" → "subtle")
🇬🇷 Strong's Greek Lexicon (5,624 entries) with Unicode text
📚 Webster's 1913 Dictionary for archaic KJV English
⚡ Smart Caching - Fast startup after first run
🏠 100% Local - No cloud APIs, runs entirely on your machine
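
As a rough illustration of the query expansion idea, here is a minimal sketch. It is not the project's actual implementation: the `ARCHAIC_TO_MODERN` table and the `expand_query` function are hypothetical names, and only the "subtil" → "subtle" pair comes from this README.

```python
import re
from typing import Dict, List

# Hypothetical mapping for illustration. Only "subtil" -> "subtle" is taken
# from the README; the other pairs are assumed examples.
ARCHAIC_TO_MODERN: Dict[str, str] = {
    "subtil": "subtle",
    "thou": "you",
    "hath": "has",
}


def expand_query(query: str) -> List[str]:
    """Return the query terms plus any archaic or modern counterparts."""
    # Build the reverse lookup so a modern word also finds its archaic form.
    modern_to_archaic: Dict[str, List[str]] = {}
    for archaic, modern in ARCHAIC_TO_MODERN.items():
        modern_to_archaic.setdefault(modern, []).append(archaic)

    expanded: List[str] = []
    for term in re.findall(r"[a-z']+", query.lower()):
        candidates = [term]
        if term in ARCHAIC_TO_MODERN:
            candidates.append(ARCHAIC_TO_MODERN[term])
        candidates.extend(modern_to_archaic.get(term, []))
        # Keep first-seen order and skip duplicates.
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("the subtle serpent"))
# ['the', 'subtle', 'subtil', 'serpent']
```

Expanding in both directions means a modern query can still hit verses that use the KJV spelling, and the other way round.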

Quick Start

1. Setup (First Time)

# Clone or navigate to directory
cd biblesearch

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

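# Make sure Ollama is running (it stays in the foreground, so keep this terminal open)
ollama serve

# In a second terminal, download the model
ollama pull llama3.2:1b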

2. Run a Demo

source venv/bin/activate

# Quick demo (instant)
python demos/quick_demo.py

# Caching demo
python demos/demo_caching.py

# Webster's 1913 dictionary demo
python demos/demo_webster.py

3. Full System Test

# WARNING: The first run generates embeddings (60-90 minutes)
# Later runs load the cache and finish in seconds
python tests/test_rag.py

Learning RAG

New to RAG, embeddings, or vector search? Start here:

# Read the guide (5 min)
cat docs/_START_HERE.md

# Try hands-on tutorials (30 min)
python docs/tutorials/example1_embeddings.py
python docs/tutorials/example2_tiny_rag.py
python docs/tutorials/example3_similarity_explained.py

# Deep dive (60 min)
cat docs/LEARNING_RAG.md
cat docs/RAG_VISUAL_GUIDE.md

Project Structure

biblesearch/
├── README.md                 # You are here
├── requirements.txt          # Python dependencies
│
├── docs/                     # Documentation & tutorials
│   ├── _START_HERE.md        # 👈 Begin here!
│   ├── LEARNING_RAG.md       # Deep dive tutorial
│   ├── RAG_VISUAL_GUIDE.md   # Visual diagrams
│   ├── CACHING_EXPLAINED.md  # Caching system
│   ├── WEBSTER_INTEGRATION.md # Dictionary usage
│   └── tutorials/            # Hands-on RAG learning
│       ├── example1_embeddings.py
│       ├── example2_tiny_rag.py
│       └── example3_similarity_explained.py
│
├── demos/                    # Quick demonstrations
│   ├── quick_demo.py         # Fast demo (2 min)
│   ├── demo_caching.py       # Caching demo
│   └── demo_webster.py       # Webster's dictionary
│
├── tests/                    # Full system tests
│   ├── test_rag.py          # Complete RAG test
│   └── test_search.py       # Simple search test
│
├── src/                      # Core source code
│   ├── kjv_parser.py        # Parse KJV XML
│   ├── strongs_parser.py    # Parse Greek lexicon
│   ├── webster_parser.py    # Parse dictionary
│   ├── rag_search.py        # RAG system
│   ├── ollama_client.py     # LLM integration
│   ├── cache_manager.py     # Caching system
│   └── main.py              # CLI interface
│
└── data/                     # Bible data files
    ├── kjvfull.xml          # KJV with Strong's (24MB)
    ├── strongsgreek.xml     # Greek lexicon (2.3MB)
    ├── dictionary.txt       # Webster's 1913 (3MB)
    └── cache/               # Generated caches

How It Works

The Three Key Artifacts

1. Structured Data (cached: ~1 MB)
- Parsed KJV verses with Strong's numbers
- Parsed Greek lexicon entries
- Parsed Webster's 1913 definitions
2. Vector Embeddings (cached: ~500 MB)
- Each verse is converted to a vector of 2,048 numbers
- The vector captures the verse's semantic meaning
- Generated once, then loaded from cache on later runs
3. Similarity Scores (computed per search)
- Cosine similarity between the query and each verse
- Results are ranked by relevance
- Takes about a second across all 31,102 verses
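
To make the similarity score concrete, the sketch below computes cosine similarity in plain Python. The function name `cosine_similarity` is chosen for illustration and may not match what `src/rag_search.py` uses. The two-dimensional vectors are toy stand-ins for the 2048-number verse embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat it as unrelated to everything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal -> 0.0
```

The score depends only on direction, not length, which is why a short query and a long verse can still score as close matches.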

Search Flow

Your Query: "trusting God in difficult times"
     ↓
Convert to embedding (2 seconds)
     ↓
Compare to 31,102 verse embeddings (1 second)
     ↓
Return top matches ranked by meaning
     ↓
Results: Proverbs 3:5-6, Psalm 37:5, Isaiah 40:31...
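
The flow above can be sketched end to end with a stand-in embedding function. Everything named here (`toy_embed`, `build_index`, `search`, `VOCAB`) is hypothetical. In the real system the embedding would come from the Ollama model, whereas `toy_embed` only counts a few word stems so the example runs without a model. The sketch also assumes the cached vectors are normalized once, so that each comparison reduces to a dot product.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]

# Toy vocabulary: an assumption for this demo, not how the model embeds text.
VOCAB = ["trust", "lord", "love", "wait"]


def toy_embed(text: str) -> Vector:
    """Stand-in for a model embedding: count words starting with each stem."""
    words = text.lower().split()
    return [float(sum(w.startswith(stem) for w in words)) for stem in VOCAB]


def normalize(v: Vector) -> Vector:
    """Scale to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)


def build_index(verses: Dict[str, str], embed: Callable[[str], Vector]) -> Dict[str, Vector]:
    """Embed every verse once; this is the part worth caching."""
    return {ref: normalize(embed(text)) for ref, text in verses.items()}


def search(query: str, index: Dict[str, Vector],
           embed: Callable[[str], Vector], top_k: int = 3) -> List[Tuple[str, float]]:
    """Embed the query, score it against every verse, return the best matches."""
    q = normalize(embed(query))
    scored = ((ref, sum(a * b for a, b in zip(q, vec))) for ref, vec in index.items())
    return heapq.nlargest(top_k, scored, key=lambda item: item[1])


verses = {
    "Proverbs 3:5": "Trust in the LORD with all thine heart",
    "Psalm 37:5": "Commit thy way unto the LORD; trust also in him",
    "Isaiah 40:31": "they that wait upon the LORD shall renew their strength",
}
index = build_index(verses, toy_embed)
for ref, score in search("trusting God in difficult times", index, toy_embed, top_k=2):
    print(f"{ref}: {score:.3f}")
```

With a real embedding model only `toy_embed` would change; the indexing and ranking steps stay the same.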

Example Searches

Semantic search finds verses by MEANING, not just keywords:

Query	Traditional Search	RAG Semantic Search
"trusting God"	12 verses with "trust"	200+ verses about faith, reliance, confidence
"God's love"	45 verses with "love"	500+ verses about agape, mercy, compassion
"salvation by faith"	15 exact matches	150+ verses about justification, grace

Performance

First Run (One Time)

Parse XML: 2-3 minutes
Generate embeddings: 60-90 minutes
Total: roughly 60-95 minutes

Every Run After

Load caches: 1 second
Search: 3-4 seconds
Total: ~4-5 seconds ⚡
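
The gap between the first run and later runs comes from a build-once, load-afterwards cache. Here is a minimal sketch of that pattern. The `load_or_build` helper and the JSON file format are assumptions for illustration, and the project's `src/cache_manager.py` may work differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Embeddings = Dict[str, List[float]]


def load_or_build(cache_path: Path, build: Callable[[], Embeddings]) -> Embeddings:
    """Return cached data if the file exists, otherwise build and save it."""
    if cache_path.exists():
        with cache_path.open("r", encoding="utf-8") as f:
            return json.load(f)

    data = build()  # the slow step (embedding generation in the real system)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted run never leaves
    # a half-written cache behind.
    tmp_path = cache_path.with_suffix(".tmp")
    with tmp_path.open("w", encoding="utf-8") as f:
        json.dump(data, f)
    tmp_path.replace(cache_path)
    return data


calls = []


def slow_build() -> Embeddings:
    calls.append(1)  # record how often the expensive step actually runs
    return {"John 3:16": [0.1, 0.2]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cache" / "embeddings.json"
    first = load_or_build(path, slow_build)
    second = load_or_build(path, slow_build)

print(len(calls))  # 1: the second call was served from the cache file
```

Deleting the cache file forces a rebuild, which is the usual way to refresh a cache like this after the source data changes.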

Documentation

Getting Started

docs/_START_HERE.md - Your entry point
docs/LEARNING_RAG.md - Complete RAG tutorial
docs/RAG_VISUAL_GUIDE.md - Visual diagrams
docs/CACHING_EXPLAINED.md - Caching system details
docs/WEBSTER_INTEGRATION.md - Dictionary usage

Search Features

docs/search-features/HYBRID_SEARCH_GUIDE.md - Semantic + keyword search
docs/search-features/QUERY_EXPANSION_GUIDE.md - Archaic language handling
docs/search-features/POPULARITY_BOOST_GUIDE.md - Famous verse ranking
docs/search-features/TESTING_GUIDE.md - How to test the system
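
For a feel of how semantic score, keyword match, and popularity might combine, here is a small sketch. The weights (`semantic_weight=0.7`, `popularity_boost=0.05`), the function names, and the semantic score of 0.8 are illustrative assumptions. The guides listed above describe the project's actual behavior.

```python
import re
from typing import Set

# The two famous verses named in this README's feature list.
POPULAR_VERSES = {"John 3:16", "Psalm 23:1"}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def keyword_score(query: str, verse_text: str) -> float:
    """Fraction of query words that appear in the verse (0.0 to 1.0)."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(verse_text)) / len(query_terms)


def hybrid_score(semantic: float, keyword: float, is_popular: bool,
                 semantic_weight: float = 0.7,
                 popularity_boost: float = 0.05) -> float:
    """Weighted blend of both signals, plus a small bonus for famous verses."""
    blended = semantic_weight * semantic + (1.0 - semantic_weight) * keyword
    return blended + (popularity_boost if is_popular else 0.0)


verse_ref = "John 3:16"
verse_text = "For God so loved the world"
kw = keyword_score("God so loved", verse_text)
# 0.8 is a made-up semantic score standing in for a cosine similarity.
score = hybrid_score(semantic=0.8, keyword=kw, is_popular=verse_ref in POPULAR_VERSES)
print(round(score, 2))
```

Keeping the boost small relative to the blended score lets a famous verse win near-ties without outranking a clearly better match.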

Benchmarks & Development

docs/benchmarks/ - Performance test results
docs/development/ - Build system and development notes

Requirements

Python 3.8+
Ollama (local LLM)
1 GB RAM (minimum)
1 GB disk space (for caches)

Troubleshooting

"Connection refused":

# Make sure Ollama is running
ollama serve
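
To check from Python whether the server is reachable before starting a long run, something like the sketch below may help. It assumes Ollama's default address `http://localhost:11434` and uses its `/api/tags` endpoint, which lists locally installed models. The helper name `ollama_is_running` is hypothetical.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url, else False."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as "not running".
        return False


if ollama_is_running():
    print("Ollama is reachable")
else:
    print("Ollama is not reachable; start it with: ollama serve")
```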

"Module not found":

source venv/bin/activate
pip install -r requirements.txt

"Embeddings taking forever":

The first run takes 60-90 minutes (this is normal!)
To test with fewer verses first, edit tests/test_rag.py
Or try the demos, which don't need embeddings
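
When testing with a subset, a quick estimate of the wait can be useful. The sketch below assumes embedding time scales linearly with verse count, which is a simplification, and uses the 60-90 minute figure for all 31,102 verses. `estimate_minutes` is a hypothetical helper, not part of the project.

```python
from typing import Tuple

TOTAL_VERSES = 31_102          # full KJV, per this README
FULL_RUN_MINUTES = (60, 90)    # first-run embedding time, per this README


def estimate_minutes(verse_count: int) -> Tuple[float, float]:
    """Scale the full-run time range down to a subset of verses."""
    fraction = verse_count / TOTAL_VERSES
    return (FULL_RUN_MINUTES[0] * fraction, FULL_RUN_MINUTES[1] * fraction)


low, high = estimate_minutes(1000)
print(f"{low:.1f}-{high:.1f} minutes")  # 1.9-2.9 minutes
```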

Resources

Ollama - Local LLM server
Strong's Concordance - Greek/Hebrew reference
Webster's 1913 - Historical dictionary
OSIS Bible Format - Bible XML standard

License

KJV Bible: Public domain
Strong's Concordance: Public domain (1890)
Webster's Dictionary: Public domain (1913)
Code: Educational use

Getting Started

Quick look: python demos/quick_demo.py
Learn concepts: cat docs/_START_HERE.md
Try tutorials: python docs/tutorials/example1_embeddings.py
Full system: python tests/test_rag.py (be patient on first run!)

Built with Claude Code for biblical study and RAG education 📖🤖
