Pavdig/SillyTavern-Docs-RAG

SillyTavern Documentation (RAG-Ready)

Note: This is a Proof of Concept (PoC) project that grew out of SillyTavern-Docs Issue #183. It addresses two problems that arise when ingesting the SillyTavern documentation into RAG knowledge bases (like Open WebUI): "token noise" and the parsing of nested directories.

This repository hosts an automated pipeline that mirrors, processes, and sanitizes the official SillyTavern Documentation.

The goal is to create a "flat" and clean set of Markdown files optimized for AI ingestion. It also generates single-file bundles (llms.txt, llms.md, and llms.json) for easy drag-and-drop context in LLMs like ChatGPT or Claude.

📥 Downloads

Go to the Releases Page to download the latest RAG-ready ZIP package.

🛠️ Usage for RAG

Option 1: Knowledge Bases (e.g., Open WebUI)

  1. Download the latest .zip from Releases.
  2. Extract the docs/ folder.
  3. Upload the files to your RAG Knowledge Base.

Option 2: Single-File Context (ChatGPT, Claude, etc.)

  1. Download the .zip or view the files in the repo root.
  2. Use llms.txt or llms.md: Drag and drop this single file into a "Project" or long-context chat window. It contains the entire documentation in one readable text file.
  3. Use llms.json: For programmatic ingestion or tools that support JSON knowledge dumps.

⚙️ How it Works

The automation pipeline runs on a schedule so that this repo stays in sync with the upstream SillyTavern documentation.

1. The Build Process

  • Clone Upstream: Pulls the latest docs from the official repository.
  • Flatten Structure: Converts nested folders into flat filenames to preserve context for the AI.
    • Example: SillyTavern-Docs/Installation/Windows.md → SillyTavern_Installation_Windows.md
  • Sanitize Content: Removes noise that confuses LLMs:
    • Redirect stubs: Files that just say "Page moved" are removed.
    • Front Matter: Metadata headers (---) are stripped.
    • Screenshots: The entire "Screenshots" section is removed from files.
    • Admonitions: Tags like !!!warning are stripped, but titles and text are preserved.
    • Images: All image tags are removed entirely to reduce token usage.
    • Links: Internal links are stripped (text preserved) to prevent hallucinations in bundles.
  • Generate Bundles: Aggregates all processed files into single-file bundles: context-rich text (llms.txt, llms.md) and structured JSON (llms.json).
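
The flatten and sanitize steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names, the `SillyTavern` prefix argument, and the regular expressions are all assumptions, and the sketch skips the redirect-stub and "Screenshots" section handling. It treats any link that does not start with `http://` or `https://` as internal.

```python
import re
from pathlib import PurePosixPath


def flatten_name(relative_path: str, prefix: str = "SillyTavern") -> str:
    """Join the folder parts of a docs-relative path into one flat filename.

    The prefix is an assumption based on the README example.
    """
    parts = PurePosixPath(relative_path).parts
    return "_".join((prefix, *parts))


# Hypothetical patterns; the real pipeline may match these differently.
FRONT_MATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
INTERNAL_LINK = re.compile(r"\[([^\]]+)\]\((?!https?://)[^)]*\)")
ADMONITION_TAG = re.compile(r"^!!![ \t]*\w+[ \t]*", re.MULTILINE)


def sanitize(markdown: str) -> str:
    """Strip front matter, images, internal links, and admonition tags."""
    text = FRONT_MATTER.sub("", markdown, count=1)
    # Images must go before links, otherwise "![alt](x)" would leave "!alt".
    text = IMAGE.sub("", text)
    # Keep the link text, drop the target of non-http(s) links.
    text = INTERNAL_LINK.sub(r"\1", text)
    # Remove the "!!!warning" tag itself but keep the title and body text.
    text = ADMONITION_TAG.sub("", text)
    # Collapse the blank-line runs left behind by removed elements.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


if __name__ == "__main__":
    print(flatten_name("Installation/Windows.md"))
    sample = (
        "---\ntitle: Windows\n---\n# Install\n\n![shot](img/a.png)\n\n"
        "!!!warning Careful\nSee [the guide](../guide.md).\n"
    )
    print(sanitize(sample))
```

Keeping each rule as its own named pattern makes it easy to switch one off, for example to retain images for a multimodal knowledge base.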

2. The Update Cycle (CI/CD)

  1. Check: The bot checks the official docs for changes every hour.
  2. Sync: If updates are found, they are processed and pushed to a temporary branch.
  3. Notify: A Pull Request is automatically opened (or updated) with the changelog.
  4. Release: When the PR is merged, a GitHub Release is automatically published with a date-stamped ZIP file containing the docs and bundles.
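
As a rough illustration of the release step, the sketch below packs a `docs/` folder and any `llms.*` bundles into a date-stamped ZIP. The function name, the archive naming pattern, and the directory layout are assumptions made for this example; the actual workflow may build the archive differently.

```python
import zipfile
from datetime import date
from pathlib import Path


def build_release_zip(root: Path, out_dir: Path, today: date) -> Path:
    """Zip root/docs/*.md and root/llms.* into a date-stamped archive."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: ISO date appended to a fixed stem.
    archive = out_dir / f"sillytavern-docs-rag-{today.isoformat()}.zip"
    members = sorted((root / "docs").rglob("*.md")) + sorted(root.glob("llms.*"))
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in members:
            # Store paths relative to the repo root, with forward slashes.
            zf.write(path, arcname=path.relative_to(root).as_posix())
    return archive
```

Passing the date in as an argument, rather than calling `date.today()` inside the function, keeps the archive name reproducible when the step is re-run or tested.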

🎖️ Credits