Skip to content

siddsachar/Thoth-v1

Repository files navigation

𓁟 Thoth — Private Knowledge Agent

Thoth is a local-first, privacy-focused knowledge agent that combines Retrieval-Augmented Generation (RAG) with multi-source information retrieval. It lets you upload your own documents, ask questions in a conversational chat interface, and get cited answers drawn from your documents, Wikipedia, Arxiv, and the web — all powered by a locally-running LLM via Ollama.

Why "Thoth"?

In ancient Egyptian mythology, Thoth (𓁟) was the god of wisdom, writing, and knowledge — the divine scribe who recorded all human understanding. He was credited with inventing hieroglyphs, maintaining the library of the gods, and serving as the impartial judge of truth. Naming a private knowledge agent after Thoth felt fitting: like its namesake, this tool is built to gather, organize, and faithfully retrieve knowledge — while keeping everything under your control, running locally on your own machine.


Features

Chat & Conversation Management

  • Multi-turn conversational Q&A with full message history
  • Persistent conversation threads stored in a local SQLite database
  • Auto-naming — threads are automatically renamed to the first question asked
  • Thread switching — resume any previous conversation seamlessly
  • Thread deletion — remove conversations you no longer need

Model Selection

  • Dynamic model switching — choose any Ollama-supported model from the Settings panel in the sidebar
  • Curated model list — includes popular models (Llama, Qwen, Gemma, Mistral, DeepSeek, Phi, etc.) alongside any models you've already downloaded
  • Automatic download — selecting a model you haven't downloaded yet triggers an in-app download with a live progress indicator
  • First-run setup — if the default model isn't available, the app automatically downloads it on startup
  • Local indicators — models are marked with ✅ (downloaded) or ⬇️ (needs download) in the selector

API Key Management

  • In-app configuration — add and edit API keys directly from the ⚙️ Settings panel (no need to edit source files)
  • Persistent storage — keys are saved to api_keys.json in the user data directory and loaded automatically on startup
  • Password-masked inputs — keys are hidden by default in the UI for security
  • Extensible — add new keys by editing the API_KEY_DEFINITIONS dict in api_keys.py

Intelligent Context Retrieval

  • Smart context assessment — an embedding similarity check first determines if existing context already covers the question; only falls back to an LLM judgment call for ambiguous cases
  • Contextual compression retrieval — each retriever is wrapped with a ContextualCompressionRetriever + LLMChainExtractor that filters and extracts only query-relevant content per document before it enters the context
  • Query rewriting — follow-up questions with pronouns or references (e.g., "how are they related?") are automatically rewritten into standalone search queries using conversation history, so retrievers receive semantically complete queries
  • Parallel retrieval — all enabled retrieval sources are queried simultaneously via ThreadPoolExecutor, reducing total retrieval time from the sum of all sources to the duration of the slowest one
  • Context deduplication — embedding-based cosine similarity deduplication operates at two levels:
    • Within-retrieval: removes near-duplicate documents returned by different sources in the same query
    • Cross-turn: prevents adding context that is too similar to already-accumulated context from previous turns
  • Character-based context & message trimming — context entries and message history are trimmed to fit within a character budget (1 token ≈ 4.5 characters), keeping the most recent entries and preventing context window overflow in long conversations
  • Accumulated context — context from multiple queries within a thread builds up rather than being replaced
  • Configurable retrieval sources — toggle each retrieval backend on/off from the Settings panel:
    Source Description
    📄 Documents FAISS vector similarity search over your indexed files
    🌐 Wikipedia Real-time Wikipedia article retrieval
    📚 Arxiv Academic paper search via the Arxiv API
    🔍 Web Search Live web search via the Tavily Search API

Document Management

  • Upload & index PDF, DOCX, DOC, and TXT files
  • Automatic chunking with RecursiveCharacterTextSplitter (4000-char chunks, 200-char overlap)
  • FAISS vector store with persistent local storage
  • Embedding model: Qwen/Qwen3-Embedding-0.6B via HuggingFace
  • Duplicate detection — already-processed files are skipped
  • Clear all — one-click reset of the entire vector store and processed files list

Source Citation

Every piece of information in an answer is cited:

  • (Source: document.pdf) for uploaded documents
  • (Source: https://en.wikipedia.org/...) for Wikipedia
  • (Source: https://arxiv.org/abs/...) for Arxiv papers
  • (Source: https://...) for web search results
  • (Source: Internal Knowledge) when the LLM uses its own training data

Architecture

┌────────────────────────────────────────────────────────────┐
│                    Streamlit Frontend (app.py)              │
│  ┌──────────┐   ┌─────────────────────┐   ┌────────────┐  │
│  │ Sidebar  │   │    Chat Interface   │   │  Document  │  │
│  │ Threads  │   │   (Q&A Messages)    │   │  Manager   │  │
│  └──────────┘   └─────────────────────┘   └────────────┘  │
└────────────────────────┬───────────────────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────────────────┐
│                 LangGraph RAG Pipeline (rag.py)            │
│                                                            │
│   START ──▶ needs_context ──┬──▶ get_context ──▶ generate  │
│                             │                    _answer   │
│                             └──────────────────▶ generate  │
│                                                  _answer   │
│                                                    │       │
│                                                    ▼       │
│                                                   END      │
└────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
  ┌────────────┐    ┌──────────────┐     ┌──────────────┐
  │  Ollama    │    │   Retrievers │     │   SQLite     │
  │  LLM      │    │  (FAISS,     │     │  Checkpointer│
  │(qwen3-vl) │    │   Wiki,      │     │  (threads.db)│
  └────────────┘    │   Arxiv,Web) │     └──────────────┘
                    └──────────────┘

LangGraph State Machine

The RAG pipeline is implemented as a LangGraph StateGraph with three nodes:

  1. needs_context — First checks if existing accumulated context is already relevant to the current question using embedding cosine similarity (fast, no LLM call). If no existing context is relevant, falls back to an LLM judgment call. Returns Yes/No.
  2. get_context — Rewrites the user's question into a standalone search query (resolving pronouns/references from conversation history), then queries all enabled retrieval backends in parallel via ThreadPoolExecutor. Each retriever is wrapped with a ContextualCompressionRetriever that extracts only query-relevant content per document. Results are deduplicated within the retrieval batch and against existing accumulated context using embedding cosine similarity.
  3. generate_answer — Trims accumulated context and message history to fit within the model's character budget, then formats the system prompt, context, and question into a final prompt and generates the answer with citations.

A conditional edge routes from needs_context to either get_context or directly to generate_answer.


Project Structure

Thoth/                          # Source / installation directory
├── app.py                      # Streamlit frontend — UI, chat, document upload
├── rag.py                      # LangGraph RAG pipeline — nodes, edges, state
├── documents.py                # Document loading, chunking, FAISS vector store
├── models.py                   # LLM configuration (Ollama)
├── threads.py                  # Thread/conversation management (SQLite)
├── api_keys.py                 # API key management (load/save/apply from JSON)
└── README.md

~/.thoth/                       # User data directory (auto-created at runtime)
├── api_keys.json               # Stored API keys
├── processed_files.json        # Tracks which files have been indexed
├── threads.db                  # SQLite database for thread metadata
└── vector_store/               # FAISS index files
    ├── index.faiss
    └── index.pkl

Data directory: All user data is stored in ~/.thoth/ (%USERPROFILE%\.thoth\ on Windows). This keeps data separate from the app installation and avoids write-permission issues in protected directories like C:\Program Files\. Override the location by setting the THOTH_DATA_DIR environment variable.

Module Descriptions

File Purpose
app.py Streamlit application with three-panel layout: sidebar (threads + settings), center (chat), right (documents). Handles UI state, file uploads, model selection, retrieval source toggles, and invokes the RAG graph.
rag.py Defines the LangGraph state machine with SessionState, retriever initialization, context compression, and answer generation. Also supports a CLI mode via __main__.
documents.py Manages document ingestion: loading (PDF/DOCX/TXT), text splitting, embedding with Qwen/Qwen3-Embedding-0.6B, FAISS storage, and processed file tracking.
models.py LLM model management — listing, downloading, and switching Ollama models at runtime.
threads.py SQLite-backed thread metadata (create, list, rename, delete) and LangGraph SqliteSaver checkpointer for persisting conversation state. Data stored in ~/.thoth/threads.db.
api_keys.py API key management — defines available keys, reads/writes ~/.thoth/api_keys.json, and applies keys as environment variables at startup. The Settings UI in app.py uses this module to let users add/edit keys.

Prerequisites

  • Python 3.11+
  • Ollama installed and running locally
  • Tavily API Key for web search (configured via the in-app Settings panel)

Note: You no longer need to manually pull a model — the app will automatically download the default model (qwen3:8b) on first run if it isn't available.


Installation

  1. Clone the repository

    git clone https://github.com/yourusername/thoth.git
    cd thoth
  2. Create a virtual environment

    python -m venv .venv
  3. Activate the virtual environment

    # Windows
    .venv\Scripts\activate
    
    # macOS / Linux
    source .venv/bin/activate
  4. Install dependencies

    pip install streamlit langchain-community langchain-core langchain-classic langchain-huggingface langchain-ollama langgraph faiss-cpu torch transformers pypdf python-docx unstructured
  5. Configure API keys

    Launch the app and open ⚙️ Settings in the sidebar. Enter your API keys (e.g. Tavily) in the API Keys section. Keys are saved to ~/.thoth/api_keys.json and loaded automatically on future runs.

    Alternatively, you can create ~/.thoth/api_keys.json manually:

    {
      "TAVILY_API_KEY": "your-tavily-api-key"
    }

    To use a custom data directory, set the THOTH_DATA_DIR environment variable before launching.

  6. Ensure Ollama is running

    ollama serve

Usage

Web Interface (Streamlit)

streamlit run app.py

This opens the Thoth web UI in your browser with:

  • Left sidebar: Create, switch, and delete conversation threads; Settings panel at the bottom for model selection, retrieval source toggles, and API key management
  • Center: Chat interface for asking questions
  • Right panel: Upload and manage documents

CLI Mode

python rag.py

This starts an interactive terminal session where you can select/create threads and ask questions directly.


How It Works

  1. User asks a question in the chat interface.
  2. The needs_context node first checks if existing context is relevant to the question via embedding similarity. If no relevant context exists, it falls back to an LLM call to decide whether new retrieval is needed.
  3. If new context is needed, the get_context node:
    • Rewrites the question into a standalone query using conversation history (resolving pronouns like "they", "it", etc.)
    • Queries the enabled sources in parallel via ThreadPoolExecutor:
      • FAISS vector store (uploaded documents)
      • Wikipedia API
      • Arxiv API
      • Tavily web search
    • Each source uses a ContextualCompressionRetriever to extract only relevant content per document
    • Results are deduplicated within the batch and against existing accumulated context
  4. New context is appended to the existing context (not replaced), subject to character-budget trimming.
  5. The generate_answer node trims context and messages to fit within the model's character budget, then combines the system prompt, context, and question to produce a cited answer.
  6. The full conversation state is checkpointed in SQLite, enabling thread persistence across sessions.

Configuration

LLM Model

Select a model directly from the ⚙️ Settings panel in the sidebar. You can also change the default model in models.py:

DEFAULT_MODEL = "qwen3:8b"  # Change to any Ollama-supported model

Embedding Model

Change the embedding model in documents.py:

embedding_model = HuggingFaceEmbeddings(
    model_name="Qwen/Qwen3-Embedding-0.6B"  # Change to any HuggingFace embedding model
)

Chunking Parameters

Adjust text splitting in documents.py:

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=4000,      # Characters per chunk
    chunk_overlap=200     # Overlap between chunks
)

Retriever Settings

Modify the number of documents retrieved in rag.py:

document_retriever = vector_store.as_retriever(search_kwargs={"k": 5})  # Top-k results

Supported File Types

Extension Loader
.pdf PyPDFLoader
.docx UnstructuredWordDocumentLoader
.doc UnstructuredWordDocumentLoader
.txt TextLoader

License

This project is licensed under the MIT License.

About

Thoth is a local-first, privacy-focused knowledge agent that combines Retrieval-Augmented Generation (RAG) with multi-source information retrieval. It lets you upload your own documents, ask questions in a conversational chat interface, and get cited answers drawn from your documents, Wikipedia, Arxiv, and the web — all powered by a locally LLM

Topics

Resources

License

Stars

Watchers

Forks

Sponsor this project

 

Packages

 
 
 

Contributors