chrishayescodes/biblesearch

KJV Bible Search with RAG

A semantic Bible search system using Retrieval-Augmented Generation (RAG) with Ollama for intelligent, meaning-based verse discovery.

Features

  • 🔍 Hybrid Search - Combines semantic meaning + keyword matching for best accuracy
  • 📖 Full KJV Bible (31,102 verses) with Strong's numbers
  • ⭐ Popularity Boost - Famous verses (John 3:16, Psalm 23:1) rank higher
  • 🔄 Query Expansion - Handles archaic language ("subtil" → "subtle")
  • 🇬🇷 Strong's Greek Lexicon (5,624 entries) with Unicode text
  • 📚 Webster's 1913 Dictionary for archaic KJV English
  • ⚡ Smart Caching - Fast startup after first run
  • 🏠 100% Local - No cloud APIs, runs entirely on your machine

Quick Start

1. Setup (First Time)

# Clone or navigate to directory
cd biblesearch

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Make sure Ollama is running (ollama serve stays in the foreground,
# so run the pull command in a second terminal)
ollama serve
ollama pull llama3.2:1b

2. Run a Demo

source venv/bin/activate

# Quick demo (instant)
python demos/quick_demo.py

# Caching demo
python demos/demo_caching.py

# Webster's 1913 dictionary demo
python demos/demo_webster.py

3. Full System Test

# WARNING: First run generates embeddings (60-90 minutes)
# Subsequent runs are instant (uses cache)
python tests/test_rag.py

Learning RAG

New to RAG, embeddings, or vector search? Start here:

# Read the guide (5 min)
cat docs/_START_HERE.md

# Try hands-on tutorials (30 min)
python docs/tutorials/example1_embeddings.py
python docs/tutorials/example2_tiny_rag.py
python docs/tutorials/example3_similarity_explained.py

# Deep dive (60 min)
cat docs/LEARNING_RAG.md
cat docs/RAG_VISUAL_GUIDE.md

Project Structure

biblesearch/
├── README.md                  # You are here
├── requirements.txt           # Python dependencies
│
├── docs/                      # Documentation & tutorials
│   ├── _START_HERE.md         # 👈 Begin here!
│   ├── LEARNING_RAG.md        # Deep dive tutorial
│   ├── RAG_VISUAL_GUIDE.md    # Visual diagrams
│   ├── CACHING_EXPLAINED.md   # Caching system
│   ├── WEBSTER_INTEGRATION.md # Dictionary usage
│   └── tutorials/             # Hands-on RAG learning
│       ├── example1_embeddings.py
│       ├── example2_tiny_rag.py
│       └── example3_similarity_explained.py
│
├── demos/                     # Quick demonstrations
│   ├── quick_demo.py          # Fast demo (2 min)
│   ├── demo_caching.py        # Caching demo
│   └── demo_webster.py        # Webster's dictionary
│
├── tests/                     # Full system tests
│   ├── test_rag.py            # Complete RAG test
│   └── test_search.py         # Simple search test
│
├── src/                       # Core source code
│   ├── kjv_parser.py          # Parse KJV XML
│   ├── strongs_parser.py      # Parse Greek lexicon
│   ├── webster_parser.py      # Parse dictionary
│   ├── rag_search.py          # RAG system
│   ├── ollama_client.py       # LLM integration
│   ├── cache_manager.py       # Caching system
│   └── main.py                # CLI interface
│
└── data/                      # Bible data files
    ├── kjvfull.xml            # KJV with Strong's (24 MB)
    ├── strongsgreek.xml       # Greek lexicon (2.3 MB)
    ├── dictionary.txt         # Webster's 1913 (3 MB)
    └── cache/                 # Generated caches

How It Works

The Three Key Artifacts

  1. Structured Data (cached: ~1 MB)

    • Parsed KJV verses with Strong's numbers
    • Parsed Greek lexicon entries
    • Parsed Webster's 1913 definitions
  2. Vector Embeddings (cached: ~500 MB)

    • Each verse converted to 2048 numbers
    • Captures semantic meaning
    • Generated once, cached forever
  3. Similarity Scores (computed per search)

    • Cosine similarity between query and verses
    • Ranks results by relevance
    • Instant calculation
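
For readers new to the idea, cosine similarity between two embeddings can be sketched in a few lines of plain Python. The function below is illustrative, not the project's implementation. The size estimate at the end assumes each of the 2048 numbers is stored as an 8-byte float, which is a guess that happens to line up with the ~500 MB cache figure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal

# Rough cache size, ASSUMING float64 storage (8 bytes per number).
VERSE_COUNT, DIMS, BYTES_PER_FLOAT = 31_102, 2048, 8
cache_bytes = VERSE_COUNT * DIMS * BYTES_PER_FLOAT
print(f"{cache_bytes / 1e6:.0f} MB")  # 510 MB
```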

Search Flow

Your Query: "trusting God in difficult times"
     ↓
Convert to embedding (2 seconds)
     ↓
Compare to 31,102 verse embeddings (1 second)
     ↓
Return top matches ranked by meaning
     ↓
Results: Proverbs 3:5-6, Psalm 37:5, Isaiah 40:31...
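
The flow above can be sketched end to end with a toy example. Everything here is hypothetical: `toy_embed` is a hashed bag-of-words stand-in for the Ollama embedding call (so it matches on shared words, not on meaning), and the three-verse corpus replaces the full 31,102 verses. Only the shape of the pipeline (embed, compare, rank) mirrors the description.

```python
import heapq
import math
import re
import zlib

def toy_embed(text, dims=256):
    """Stand-in for a real embedding model: hashed bag-of-words counts."""
    vec = [0.0] * dims
    for word in re.findall(r"[a-z']+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query, verse_vectors, top_k=3, embed=toy_embed):
    """Return the top_k (score, reference) pairs, best match first."""
    query_vec = embed(query)
    scored = ((cosine(query_vec, vec), ref) for ref, vec in verse_vectors.items())
    return heapq.nlargest(top_k, scored)

# Tiny illustrative corpus; verse embeddings are computed once up front.
VERSES = {
    "Proverbs 3:5": "Trust in the LORD with all thine heart",
    "Psalm 37:5": "Commit thy way unto the LORD; trust also in him",
    "Genesis 1:1": "In the beginning God created the heaven and the earth",
}
VERSE_VECTORS = {ref: toy_embed(text) for ref, text in VERSES.items()}

for score, ref in search("trust in the LORD", VERSE_VECTORS, top_k=2):
    print(f"{ref}: {score:.3f}")
```

Swapping `toy_embed` for a function that returns real model embeddings would leave `search` unchanged, which is the main point of the sketch.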

Example Searches

Semantic search finds verses by MEANING, not just keywords:

Query                 Traditional Search        RAG Semantic Search
"trusting God"        12 verses with "trust"    200+ verses about faith, reliance, confidence
"God's love"          45 verses with "love"     500+ verses about agape, mercy, compassion
"salvation by faith"  15 exact matches          150+ verses about justification, grace

Performance

First Run (One Time)

  • Parse XML: 2-3 minutes
  • Generate embeddings: 60-90 minutes
  • Total: roughly 62-93 minutes

Every Run After

  • Load caches: 1 second
  • Search: 3-4 seconds
  • Total: ~4-5 seconds ⚡
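
The gap between the first run and later runs comes from a build-once, load-afterwards cache. A minimal sketch of that pattern follows. The function name, the JSON format, and the file name are assumptions made for illustration; the project's `cache_manager.py` may work differently.

```python
import json
import tempfile
from pathlib import Path

def load_or_build(cache_path, build):
    """Return (data, was_cached). Calls build() only when no cache file exists."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8")), True
    data = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data, False

def slow_embeddings():
    # Placeholder for the 60-90 minute embedding pass.
    return {"John 3:16": [0.1, 0.2, 0.3]}

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache" / "embeddings.json"  # hypothetical file name
    first, hit1 = load_or_build(cache_file, slow_embeddings)
    second, hit2 = load_or_build(cache_file, slow_embeddings)
    print(hit1, hit2)  # False True
```

JSON keeps the sketch dependency-free; a binary array format would be a more natural fit for hundreds of megabytes of floats.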

Documentation

Getting Started

Search Features

Benchmarks & Development

Requirements

  • Python 3.8+
  • Ollama (local LLM)
  • 1 GB RAM (minimum)
  • 1 GB disk space (for caches)

Troubleshooting

"Connection refused":

# Make sure Ollama is running
ollama serve
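
To check from Python whether the server is reachable before starting a long run, a small probe like the one below may help. It assumes Ollama is listening on its default address, http://localhost:11434; the function name is made up for this example.

```python
import urllib.request

def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeouts, and HTTP errors all land here
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False

print("Ollama reachable:", ollama_is_running())
```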

"Module not found":

source venv/bin/activate
pip install -r requirements.txt

"Embeddings taking forever":

  • First run takes 60-90 minutes (normal!)
  • Test with fewer verses first: edit tests/test_rag.py
  • Or try demos which don't need embeddings

Resources

License

  • KJV Bible: Public domain
  • Strong's Concordance: Public domain (1890)
  • Webster's Dictionary: Public domain (1913)
  • Code: Educational use

Getting Started

  1. Quick look: python demos/quick_demo.py
  2. Learn concepts: cat docs/_START_HERE.md
  3. Try tutorials: python docs/tutorials/example1_embeddings.py
  4. Full system: python tests/test_rag.py (be patient on first run!)

Built with Claude Code for biblical study and RAG education 📖🤖
