A vectorless RAG (Retrieval-Augmented Generation) proof of concept implementing the PageIndex architecture. Instead of embedding documents into vector space, this system uses hierarchical navigation and LLM reasoning to find relevant information.
Traditional RAG systems chunk documents and store them as vector embeddings. This approach has fundamental limitations:
- Destroys document structure and narrative flow
- Returns "vibes" instead of precise matches
- Fails at multi-hop reasoning
The Reasoning Librarian takes a different approach:
- Preserves document hierarchy (Book → Chapter structure)
- Uses LLM reasoning to navigate like a human researcher
- No vector embeddings - just structured summaries and raw text
The Cartographer builds a hierarchical mental map of the document using a bottom-up summarization approach.
```mermaid
graph TD
    A[Raw Text] --> B[Clean Text: Skip ToC/Header]
    B --> C[Regex Parsing: Books & Chapters]
    C --> D[Haiku 4.5: Chapter Summaries]
    D --> E[Sonnet 4.5: Book Aggregate Summaries]
    E --> F[Sonnet 4.5: Root Aggregate Summary]
    F --> G[(tree_index.json)]
    style D fill:#6366f1,stroke:#4338ca,stroke-width:2px,color:#fff
    style E fill:#6366f1,stroke:#4338ca,stroke-width:2px,color:#fff
    style F fill:#6366f1,stroke:#4338ca,stroke-width:2px,color:#fff
```
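The bottom-up pass can be sketched as follows. This is a minimal illustration, not the actual `models.py`/`indexer.py`: the `Node` shape is an assumption, and `summarize()` is a placeholder standing in for the Haiku/Sonnet API calls.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One entry in the tree index (shape assumed, not the real models.py)."""
    node_id: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)

def summarize(text: str) -> str:
    """Placeholder for the Haiku/Sonnet summarization calls."""
    return text[:60]  # truncation stands in for an LLM summary

def build_index(books: dict[str, dict[str, str]]) -> Node:
    """Bottom-up: summarize chapters first, then aggregate upward."""
    root = Node("ROOT")
    for book_id, chapters in books.items():
        book = Node(book_id)
        for chap_id, text in chapters.items():
            # Leaf level: one summary per chapter (Haiku tier in the PoC)
            book.children.append(Node(chap_id, summary=summarize(text)))
        # Book summary aggregates its chapter summaries (Sonnet tier)
        book.summary = summarize(" ".join(c.summary for c in book.children))
        root.children.append(book)
    # Root summary aggregates the book summaries (Sonnet tier)
    root.summary = summarize(" ".join(b.summary for b in root.children))
    return root

index = build_index({"BOOK_I": {"BOOK_I_CHAPTER_I": "Of the Division of Labour ..."}})
```

The key property is the order of operations: no parent is summarized until all of its children are, so every level of the map is grounded in the level below it.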
The Navigator uses Claude's tool-use capabilities to traverse the hierarchical index like a human researcher.
```mermaid
sequenceDiagram
    participant User
    participant Agent as Agent (Sonnet 4.5)
    participant Index as Tree Index
    User->>Agent: "What are the duties of the sovereign?"
    rect rgba(128, 128, 128, 0.1)
        Note over Agent, Index: Agent Loop (Iterative Navigation)
        Agent->>Index: read_node(root)
        Index-->>Agent: Book summaries
        Agent->>Agent: Reasoning: "Book V covers state revenue/duties"
        Agent->>Index: read_node(BOOK_V)
        Index-->>Agent: Chapter summaries
        Agent->>Agent: Reasoning: "Chapter I discusses sovereign expenses"
        Agent->>Index: read_content(BOOK_V_CHAPTER_I)
        Index-->>Agent: Raw chapter text
    end
    Agent->>User: Synthesized Answer with Citations
```
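The two tools in the diagram can be exposed to Claude's tool-use API as plain JSON schemas plus a dispatcher. The sketch below is an assumption matching the diagram, not the actual `navigator.py`; the toy `INDEX` stands in for `tree_index.json`.

```python
# Tool schemas in the shape the Anthropic Messages API expects.
# Names mirror the diagram; the index layout is an assumption.
TOOLS = [
    {
        "name": "read_node",
        "description": "Return the summaries of a node's children.",
        "input_schema": {
            "type": "object",
            "properties": {"node_id": {"type": "string"}},
            "required": ["node_id"],
        },
    },
    {
        "name": "read_content",
        "description": "Return the raw text stored at a leaf node.",
        "input_schema": {
            "type": "object",
            "properties": {"node_id": {"type": "string"}},
            "required": ["node_id"],
        },
    },
]

INDEX = {  # toy stand-in for tree_index.json
    "root": {"children": {"BOOK_V": "State revenue and the sovereign's duties."}},
    "BOOK_V": {"children": {"BOOK_V_CHAPTER_I": "Expenses of the sovereign."}},
    "BOOK_V_CHAPTER_I": {"content": "OF THE EXPENSES OF THE SOVEREIGN ..."},
}

def dispatch(name: str, node_id: str) -> str:
    """Execute one tool call from the agent loop against the index."""
    node = INDEX[node_id]
    if name == "read_node":
        return "\n".join(f"{k}: {v}" for k, v in node["children"].items())
    return node["content"]
```

In the real loop, each `tool_use` block returned by the model is run through the dispatcher and the result is fed back as a `tool_result` message until the model answers in plain text.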
```bash
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your Anthropic API key
mkdir -p data
curl -o data/wealth_of_nations.txt https://www.gutenberg.org/files/3300/3300-0.txt
python -m src.indexer
```

This downloads "The Wealth of Nations" and builds a hierarchical index with LLM-generated summaries. It takes ~5 minutes and costs only ~$0.33 in API calls using the hybrid indexing strategy.
```bash
python -m src.cli
```

Example queries:
- "What are the three duties of the sovereign?"
- "What is the division of labor?"
- "What causes wages to rise?"
```
page-index-poc/
├── src/
│   ├── models.py      # Pydantic data models
│   ├── indexer.py     # The Cartographer (builds index)
│   ├── navigator.py   # The Navigator (agentic retrieval)
│   └── cli.py         # Interactive CLI
├── data/
│   └── wealth_of_nations.txt  # Source text (download required)
├── output/
│   └── tree_index.json        # Generated index
├── requirements.txt
└── .env.example
```
To achieve maximum performance at minimum cost, this PoC uses a hybrid model architecture:
- Parse Structure (regex): Identify `BOOK I`, `CHAPTER I`, etc. headings with regular expressions; no LLM calls are needed for structure.
- Chapter Summaries (Haiku 4.5): The faster, cheaper Haiku 4.5 generates summaries for the 32 individual chapters. This handles 90% of the indexing volume for just a few cents.
- Hierarchical "Map" (Sonnet 4.5): The higher-level Book and Root summaries are generated using Sonnet 4.5. This ensures the agent has a high-quality "conceptual map" to navigate correctly.
- Cost Efficiency: This hybrid approach allowed us to index the entire 900-page treatise for only ~$0.33.
- Reasoning Agent (Sonnet 4.5): The Navigator always uses Sonnet 4.5 to ensure robust tool-use, multi-hop reasoning, and high-quality synthesis of the final answer.
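The regex pass in the first step might look like the sketch below. The exact heading pattern is an assumption about the Gutenberg text's formatting (`BOOK I.`, `CHAPTER II.` on their own lines), not the pattern from the actual `indexer.py`.

```python
import re

# Gutenberg-style structural headings on their own lines, e.g. "BOOK I."
# or "CHAPTER II." -- the exact format is an assumption about the source.
HEADING = re.compile(r"^(BOOK|CHAPTER)\s+([IVXLC]+)\.?\s*$", re.MULTILINE)

def find_structure(text: str) -> list[tuple[str, str, int]]:
    """Return (kind, Roman numeral, character offset) for each heading."""
    return [(m.group(1), m.group(2), m.start()) for m in HEADING.finditer(text)]

sample = "BOOK I.\n\nCHAPTER I.\nOf the Division of Labour\n\nCHAPTER II.\nOf the Principle ..."
hits = find_structure(sample)
```

The character offsets let the indexer slice the raw text into per-chapter spans: each chapter's text runs from its heading to the next heading's offset.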
- Top-Down Discovery:
- Start at Root: Browse 5 Book summaries.
- Reason About Path: "Market price vs Natural price? → Book I".
- Drill Down: Read Book I's chapter summaries.
- Execute: Read the full raw text of the relevant chapter.
- Cite: Synthesize answer with specific Book/Chapter citations.
| Feature | Vector RAG | Reasoning RAG |
|---|---|---|
| Indexing | Embed chunks | Summarize hierarchy |
| Retrieval | Cosine similarity | LLM navigation |
| Latency | ~50ms | ~3-5 seconds |
| Accuracy | Low on complex queries | High |
| Cost per query | ~$0.001 | ~$0.05 |
| Best for | Simple fact lookup | Complex/structured docs |
This implementation is inspired by PageIndex by VectifyAI, which achieved 98.7% accuracy on FinanceBench using hierarchical document navigation.
MIT
