Character-focused local chatbot with RAG support (ChromaDB + LangChain), CLI and web entrypoints, and tooling for metadata generation and collection management.
- Detailed RAG management docs: docs/rag_management/00_README.md
- RAG scripts guide: docs/RAG_SCRIPTS_GUIDE.md
- Context management docs: docs/context_management/00_README.md
- Config files docs: docs/configs/00_README.md
- Future work: docs/future_work/
- Local chat runtime backed by llama-cpp-python
- Character-card-driven prompting (cards/*.json)
- RAG retrieval from ChromaDB collections
- Dynamic context budgeting and history management
- GPU offload auto-layer calculation and KV cache quant support
- Scripted workflows for analyzing, pushing, and managing RAG data
- CLI chat: main.py
- Web chat (FastAPI + Jinja2 + HTMX): web_app.py
Run either with uv:
uv run python main.py
uv run uvicorn web_app:app --host 127.0.0.1 --port 8000
Primary desktop workflow:
- Open the repository from the Windows dev drive in VS Code.
- Use the integrated PowerShell terminal to run the uv commands above.
- WSL/Ubuntu with fish remains a supported alternative workflow if you still use it.
The repository now includes Windows-focused VS Code tasks in .vscode/tasks.json for:
- Start web server (Windows)
- Stop web server (Windows)
- Restart web server (Windows)
These tasks use the same documented uv command shown above and run from PowerShell with Windows-friendly stop behavior on port 8000.
Stop the web server from another terminal:
PowerShell:
Get-NetTCPConnection -LocalPort 8000 -State Listen |
Select-Object -ExpandProperty OwningProcess -Unique |
ForEach-Object { Stop-Process -Id $_ }
WSL/Unix alternative:
pkill -f 'uvicorn web_app:app'
Web diagnostics endpoints:
curl.exe -s http://127.0.0.1:8000/health
curl.exe -s http://127.0.0.1:8000/healthz/full
curl.exe -s http://127.0.0.1:8000/chat/debug
curl.exe -s http://127.0.0.1:8000/chat/debug/history
curl.exe -s http://127.0.0.1:8000/chat/session/list
Notes for web chat behavior:
- Shows status updates (Ready, Sending, Thinking, Streaming, Timed out).
- Applies a stream timeout and surfaces a Retry button on stream failure.
- Supports named session save + explicit session picker load in the sidebar.
- Shows both latest retrieval debug stats and per-turn retrieval trace history.
- Provides quick actions for copy/export and command-equivalent controls (clear, reload, help).
uv sync
Python requirement is defined in pyproject.toml (>=3.13).
Use module-style invocation for the active RAG scripts:
uv run python -m scripts.rag.<script_name> ...
This is the preferred form in the docs because it is more reliable for package imports than calling nested script paths directly.
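If the scripts are driven from Python (for example from a task runner), the module-style form can be assembled as an argument list. This is only a sketch: rag_command and run_rag_script are hypothetical helpers, not part of the repository, and the sketch assumes it is run from the repository root with uv on the PATH.

```python
import subprocess


def rag_command(script_name: str, *args: str) -> list[str]:
    """Build the argv for `uv run python -m scripts.rag.<script_name> ...`."""
    return ["uv", "run", "python", "-m", f"scripts.rag.{script_name}", *args]


def run_rag_script(script_name: str, *args: str) -> int:
    """Run the script and return its exit code without raising on failure."""
    completed = subprocess.run(rag_command(script_name, *args), check=False)
    return completed.returncode
```

The reason the -m form tends to be more reliable is that Python then puts the current working directory on sys.path, so the scripts package resolves from the repository root. Running a nested file by path puts that file's own directory on sys.path instead.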
- Analyze source text and generate metadata:
uv run python -m scripts.rag.analyze_rag_text analyze rag_data/shodan.txt \
-o rag_data/shodan.json \
--strict \
--review-report rag_data/shodan_review.json
- Validate metadata:
uv run python -m scripts.rag.analyze_rag_text validate rag_data/shodan.json
- Optional quality gates before push:
uv run python -m scripts.rag.manage_collections coverage score \
--metadata-file rag_data/shodan.json \
--source-file rag_data/shodan.txt \
--threshold 0.75
uv run python -m scripts.rag.manage_collections lint message-examples --fix
- Push lore and message examples into collections:
uv run python -m scripts.rag.push_rag_data rag_data/shodan.txt -c shodan -w
uv run python -m scripts.rag.push_rag_data rag_data/shodan_message_examples.txt -c shodan_mes -w
- Spot-check retrieval quality:
uv run python -m scripts.rag.manage_collections test shodan -q "SHODAN origin" -k 5
- Evaluate retrieval fixtures with summary metrics:
uv run python -m scripts.rag.manage_collections evaluate-fixtures --fixture-file tests/fixtures/retrieval_fixtures.json
Optional report export:
uv run python -m scripts.rag.manage_collections evaluate-fixtures \
--fixture-file tests/fixtures/retrieval_fixtures.json \
--output-json logs/retrieval_eval.json \
--output-csv logs/retrieval_eval.csv
Commands (analyze_rag_text):
- analyze
- validate
- scan
Notable options (analyze_rag_text):
- --auto-categories/--no-auto-categories
- --auto-aliases/--no-auto-aliases
- --max-aliases
- --strict
- --review-report
Notable options (push_rag_data):
- -c/--collection-name (required)
- -w/--overwrite
- -d/--dry-run
- -m/--metadata-file
- -cs/--chunk-size
- -co/--chunk-overlap
- -t/--threads
Notes:
- Leading HTML header comments are stripped before chunking.
- Metadata auto-detection maps <name>.txt and <name>_message_examples.txt to <name>.json.
- If metadata exists, push runs a source-coverage quality gate before writing.
- Category threshold flags are informational at push time; change category assignment by regenerating metadata with analyze_rag_text.
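The first two notes above can be illustrated with a short sketch. It is not the code used by push_rag_data: the function names are hypothetical, the metadata file is assumed to sit next to the source file (as in the rag_data/shodan.txt and rag_data/shodan.json examples), and "leading HTML header comments" is assumed to mean one or more `<!-- ... -->` blocks at the very start of the file.

```python
import re
from pathlib import Path

MESSAGE_EXAMPLES_SUFFIX = "_message_examples"

# One or more HTML comments at the very start of the text, with optional
# whitespace around them. Comments later in the text are left alone.
_LEADING_COMMENTS = re.compile(r"\A\s*(?:<!--.*?-->\s*)+", re.DOTALL)


def metadata_path_for(source: Path) -> Path:
    """Map <name>.txt and <name>_message_examples.txt to <name>.json."""
    stem = source.stem.removesuffix(MESSAGE_EXAMPLES_SUFFIX)
    return source.with_name(stem + ".json")


def strip_leading_html_comments(text: str) -> str:
    """Drop header comments so they never end up inside a chunk."""
    return _LEADING_COMMENTS.sub("", text)
```

Under these assumptions, both rag_data/shodan.txt and rag_data/shodan_message_examples.txt resolve to rag_data/shodan.json, which matches the mapping described in the notes.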
Commands (manage_collections):
- list-collections
- delete
- delete-multiple
- test
- export
- info
- evaluate-fixtures
- benchmark-rerank
- backfill-embedding-fingerprint
- coverage score
- lint message-examples
Top-level wrappers exist for moved scripts:
- scripts/analyze_rag_text.py
- scripts/push_rag_data.py
- scripts/manage_collections.py
- scripts/fetch_character_context.py
- scripts/build_flash_attention.py
- scripts/build_flash_attention.sh
- scripts/build_flash_attention.ps1
- scripts/build_cuda_only.ps1
The following implementation points are reflected in current docs and code:
- RAG management script set is in place (analyze_rag_text, push_rag_data, manage_collections).
- Metadata analysis + validation workflow is implemented and tested in tests/test_rag_scripts.py.
- Collection management supports listing, deletion, pattern deletion, testing, export, and info commands.
- Script workflows are documented in docs/RAG_SCRIPTS_GUIDE.md.
- Architecture remains CLI-first with shared config usage via configs/config.v2.json.
This section intentionally focuses on active behavior and omits historical benchmark/commit snapshot details.
Runtime config is defined in configs/config.v2.json.
- Start from configs/config.v2.example.json.
- Runtime loads configs/config.v2.json directly.
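The two steps above (copy the example, then load the real file) can be sketched as follows. This is an illustration only: load_runtime_config is a hypothetical helper, and whether the runtime ever copies the example file automatically is not stated here, so treat the copy as a stand-in for the manual setup step.

```python
import json
import shutil
from pathlib import Path


def load_runtime_config(config_dir: Path = Path("configs")) -> dict:
    """Create config.v2.json from the example if missing, then load it."""
    config_path = config_dir / "config.v2.json"
    example_path = config_dir / "config.v2.example.json"

    if not config_path.exists():
        # First run: start from the example, never overwrite an existing file.
        shutil.copyfile(example_path, config_path)

    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

An existing config.v2.json is always left untouched, so local edits survive repeated calls.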
{
"rag": {"collection": "shodan", "k": 3, "k_mes": 2, "use_mmr": true},
"context": {"dynamic": {"enabled": true}, "history": {"max_turns": 10}},
"model": {"type": "mistral", "layers": "auto", "target_vram_usage": 0.8, "kv_cache_quant": "f16", "n_ctx": 32768}
}
- RAG script usage: docs/RAG_SCRIPTS_GUIDE.md
- Detailed RAG management docs: docs/rag_management/00_README.md
- Context management docs: docs/context_management/00_README.md
- Config files docs: docs/configs/00_README.md
- GPU layer auto-tuning: docs/AUTO_GPU_LAYERS.md
- Flash attention build helper: docs/FLASH_ATTENTION_BUILD.md
- Future work status: docs/future_work/
uv run ruff format .
uv run ruff check .
uv run pytest
Or run targeted tests:
uv run pytest tests/test_rag_scripts.py
uv run pytest tests/test_response_processing.py
See LICENSE.