KlimtechRAG — Smart Query Router + RAG Pipeline

Version: v8.6+ | Smart Query Routing | Cost Optimization | Development Tools

A lightweight, cost-optimized RAG system for climate control documentation and project management. It routes queries between local models and external LLMs to minimize cost: at 100 queries/day it is about 7.4× cheaper than sending every query to Claude (see Cost Optimization).

🎯 Quick Start

Prerequisites

Python 3.8+
Qdrant vector database (running on localhost:6333)

Setup — Single venv (Unified Environment)

Use ttkb_tut/venv — this is the ONLY environment needed for all tasks:

# 1. Activate project venv
source ttkb_tut/venv/bin/activate

# 2. Verify all packages installed
python3 -c "import fastapi, qdrant_client, fastembed; print('✅ All packages OK')"

# 3. Run services
python3 scripts/rag_router.py         # Query Router on :8000
python3 scripts/dispatcher_service.py # Dispatcher on :8001 (optional)
python3 scripts/dev_logger.py         # Auto-logging (git hooks)

Single venv includes:

✅ FastAPI + Uvicorn (web services)
✅ Qdrant client (vector database)
✅ FastEmbed + FastEmbed-GPU (embeddings)
✅ Pydantic (data validation)
✅ Requests (HTTP client)
✅ All ML/AI dependencies (torch, transformers, etc.)
✅ Development tools (pytest, black, etc.)

Setup (if the venv is missing):

python3 -m venv ttkb_tut/venv
source ttkb_tut/venv/bin/activate
pip install fastapi uvicorn requests qdrant-client fastembed fastembed-gpu pydantic

📚 Services

Query Router (Core)

Smart query routing based on pattern matching.

# Dev: python3 scripts/rag_router.py
# Prod: /home/tamiel/programy/klimtech-embed-venv/bin/python3 scripts/rag_router.py

# Server starts on http://localhost:8000

API:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Jak zamontować klimatyzator?"}'

Response:

{
  "decision": "ALWAYS",
  "use_rag": true,
  "collection": "kb_md",
  "recommended_model": "deepseek-flash",
  "cost": "$0.000084",
  "confidence": 0.95
}
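
The same call can be made from Python with the standard library alone. This is a minimal sketch, assuming the response carries exactly the fields shown in the sample above; `RouteDecision`, `parse_decision` and `query_router` are illustrative names, not part of the project's code.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class RouteDecision:
    """Fields copied from the sample response; the real schema may differ."""
    decision: str
    use_rag: bool
    collection: str
    recommended_model: str
    cost: str
    confidence: float


def parse_decision(body: str) -> RouteDecision:
    """Parse a JSON response body, ignoring any fields not listed above."""
    data = json.loads(body)
    known = {name: data[name] for name in RouteDecision.__dataclass_fields__}
    return RouteDecision(**known)


def query_router(query: str, base_url: str = "http://localhost:8000") -> RouteDecision:
    """POST the query to /query and return the parsed decision."""
    payload = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_decision(response.read().decode("utf-8"))
```

With the router running, `query_router("Jak zamontować klimatyzator?").use_rag` would tell the caller whether to fetch context from Qdrant before calling the model.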

Decision Types:

✅ ALWAYS — Use RAG (high importance, project-specific, HVAC domain)
⏸️ CONDITIONAL — Try direct first, escalate to RAG if uncertain
❌ NEVER — Direct LLM only (writing, general knowledge, session context)
🌐 WEB_SEARCH — Fetch from internet (weather, news, current events)
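
To make the pattern-matching idea concrete, here is a toy router. The regular expressions, their order, and the fallback to CONDITIONAL are all invented for this sketch; the project's actual rules are the ones documented in QUERY_CATEGORIES.md.

```python
import re

# Hypothetical rules, checked in order; the first matching pattern wins.
RULES = [
    ("WEB_SEARCH", re.compile(r"\b(pogod\w*|wiadomości|news)\b", re.IGNORECASE)),
    ("NEVER", re.compile(r"\b(napisz|przetłumacz)\b", re.IGNORECASE)),
    ("ALWAYS", re.compile(r"\b(klimatyzator\w*|montaż\w*|zamontować)\b", re.IGNORECASE)),
]


def route(query: str) -> str:
    """Return the decision type of the first rule whose pattern matches."""
    for decision, pattern in RULES:
        if pattern.search(query):
            return decision
    # Assumption: anything unmatched is tried directly and escalated if needed.
    return "CONDITIONAL"


for q in ("Jak zamontować klimatyzator?", "Napisz mail.", "Jaka pogoda?"):
    print(q, "->", route(q))
```

Ordering matters in a first-match design like this one: a query that mentions both the weather and an air conditioner would be sent to WEB_SEARCH here, which may or may not be what the real router does.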

Query Client (CLI)

Pretty-printed wrapper around Query Router.

# Dev (in activated venv)
python3 scripts/query_client.py "Jak zamontować klimatyzator?" [--verbose] [--no-cache] [--json]

# Prod
/home/tamiel/programy/klimtech-embed-venv/bin/python3 scripts/query_client.py "query text"

Examples:

# Basic
python3 scripts/query_client.py "Jak zamontować klimatyzator?"
# Output: ✅ Decision: ALWAYS, Confidence: 1.00, Cost: $0.000084

# Verbose (show matched categories)
python3 scripts/query_client.py "GPU test" --verbose
# Output: ... Details: Matched: rag_always.project, Tokens: 600, Type: project

# JSON (raw API response)
python3 scripts/query_client.py "query" --json

# Bypass cache
python3 scripts/query_client.py "query" --no-cache
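
The --no-cache flag implies that the client normally reuses earlier answers. A minimal sketch of such a cache, using sqlite3 and the 24-hour TTL noted next to CACHE_DB in the Configuration section, could look like this. The table layout and function names are assumptions; query_client.py may store its cache differently.

```python
import json
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # 24h, matching the TTL noted for CACHE_DB


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache database; ':memory:' is handy for tests."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "query TEXT PRIMARY KEY, response TEXT NOT NULL, stored_at REAL NOT NULL)"
    )
    return conn


def cache_put(conn, query, response, now=None):
    """Store a response (any JSON-serializable value) under the query text."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO cache (query, response, stored_at) VALUES (?, ?, ?)",
        (query, json.dumps(response), now),
    )
    conn.commit()


def cache_get(conn, query, now=None):
    """Return the cached response, or None if it is missing or older than the TTL."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT response, stored_at FROM cache WHERE query = ?", (query,)
    ).fetchone()
    if row is None or now - row[1] > TTL_SECONDS:
        return None
    return json.loads(row[0])
```

Bypassing the cache then amounts to skipping `cache_get` and calling the router directly.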

Dispatcher Service (Optional, v8.7+)

Orchestrates routing → retrieval → model selection → LLM call.

python3 scripts/dispatcher_service.py
# Server on http://localhost:8001

🛠️ Development Tools

dev_logger.py — Auto-Logging

Log commits, snapshots, and messages to supervisor_memory:

# Log git commit
python3 scripts/dev_logger.py log-commit --hash $(git rev-parse HEAD)

# Manual snapshot
python3 scripts/dev_logger.py snapshot --session "v8.6-release" --notes "Initial v8.6"

# Log message status
python3 scripts/dev_logger.py log-message --msg-id msg-013 --status DONE --task-type IMPLEMENT
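
The Quick Start mentions git hooks, but the wiring is not shown. One possible setup, sketched below, is a Python script saved as .git/hooks/post-commit (the standard Git hook location) that calls the log-commit subcommand. Only the command line itself comes from the example above; the helper names are hypothetical.

```python
import subprocess


def log_commit_command(commit_hash: str) -> list:
    """Build the dev_logger invocation shown above for a given commit hash."""
    return ["python3", "scripts/dev_logger.py", "log-commit", "--hash", commit_hash]


def current_commit() -> str:
    """Ask git for the hash of HEAD."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def main() -> None:
    """Log the commit that was just created; call this from the hook script."""
    subprocess.run(log_commit_command(current_commit()), check=True)
```

A hook file would import or inline these functions and call `main()`, and it must be executable for Git to run it.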

📊 Architecture

Query
  ↓
[Query Router (:8000)]
  ├→ Pattern matching
  ├→ Decision (ALWAYS/CONDITIONAL/NEVER/WEB_SEARCH)
  └→ Cost estimation
  
Decision
  ↓
[Context Retrieval (Qdrant)]
  ├→ kb_md (documentation)
  ├→ supervisor_memory (project history)
  └→ robotnik_logs (session logs)
  
Context + Decision
  ↓
[Model Selection]
  ├→ Flash ($0.14/M) — 80% of queries
  ├→ Pro ($1.74/M) — 15% of queries (escalation)
  └→ Claude ($5/M) — 5% critical
  
  ↓
[LLM Response]
  ↓
[Result Logging (supervisor_memory)]
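
The model-selection stage in the diagram can be read as a three-tier fallback. The sketch below is one interpretation, assuming that "escalation" and "critical" are boolean signals produced earlier in the pipeline; the function name and its parameters are hypothetical.

```python
# Prices in USD per million tokens, as listed in the diagram above.
PRICES_PER_M = {"flash": 0.14, "pro": 1.74, "claude": 5.0}


def select_model(escalated: bool = False, critical: bool = False) -> str:
    """Pick the cheapest tier that satisfies the request.

    critical  -> Claude (the most expensive tier, reserved for critical queries)
    escalated -> Pro (the direct or Flash answer was judged insufficient)
    otherwise -> Flash (the default for most queries)
    """
    if critical:
        return "claude"
    if escalated:
        return "pro"
    return "flash"


for flags in ({}, {"escalated": True}, {"critical": True}):
    model = select_model(**flags)
    print(flags, "->", model, f"(${PRICES_PER_M[model]}/M tokens)")
```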

🎯 Query Categories

See QUERY_CATEGORIES.md for 44+ classified examples.

Quick Reference:

Category	Decision	Model	Example
HVAC procedures	ALWAYS	Flash + RAG	"Jak zamontować klimatyzator?" (How do I install an air conditioner?)
Project history	ALWAYS	Flash + RAG	"Dlaczego GPU test się wysypał?" (Why did the GPU test crash?)
Writing tasks	NEVER	Flash	"Napisz email do klienta" (Write an email to the client)
Weather/news	WEB_SEARCH	Flash	"Jaka jest pogoda?" (What is the weather?)
Concepts	CONDITIONAL	Flash→Pro	"Na czym polega lazy loading?" (How does lazy loading work?)

💰 Cost Optimization

Monthly cost comparison (100 queries/day = 3000/month):

Approach	Cost	vs. All-Claude
All Claude	$7.50	1×
Smart Router	$1.02	7.4× cheaper
Flash only	$0.42	17.9× cheaper (but worse accuracy)
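
The arithmetic behind these figures is simple enough to check by hand. The sketch below assumes a flat per-token price and takes the 600-token figure from the verbose client output shown earlier; `query_cost` and `savings_ratio` are illustrative helpers, not project functions.

```python
# Prices in USD per million tokens, from the Model Prices configuration.
PRICES_PER_M = {"flash": 0.14, "pro": 1.74, "claude": 5.0}


def query_cost(tokens: int, model: str) -> float:
    """Cost in USD of one query that consumes `tokens` tokens on `model`."""
    return tokens * PRICES_PER_M[model] / 1_000_000


def savings_ratio(baseline: float, alternative: float) -> float:
    """How many times cheaper `alternative` is than `baseline`."""
    return baseline / alternative


print(f"${query_cost(600, 'flash'):.6f}")    # $0.000084, as in the sample response
print(f"{savings_ratio(7.50, 1.02):.1f}x")   # 7.4x  (Smart Router vs. All Claude)
print(f"{savings_ratio(7.50, 0.42):.1f}x")   # 17.9x (Flash only vs. All Claude)
```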

Breakdown by category:

30% ALWAYS (RAG + Flash): $0.00084/query
40% CONDITIONAL direct: $0.00028/query
15% NEVER: $0.00021/query
5% WEB_SEARCH: $0.00042/query
<1% escalation to Claude: $0.0025/query

🧪 Testing

Unit Tests

# Query Router smoke test (the full suite of 69 cases is in tests/test_rag_router.py)
python3 -c "
import sys; sys.path.insert(0, 'scripts')
from rag_router import RAGRouter
router = RAGRouter()
tests = [
    ('Jak zamontować klimatyzator?', 'always'),
    ('Napisz mail.', 'never'),
    ('Jaka pogoda?', 'web_search'),
]
for q, expected in tests:
    result = router.route(q)
    status = '✓' if result.decision == expected else '✗'
    print(f'{status} {q} → {result.decision}')
"

Integration Tests

# Start the service in the background and give it a moment to bind :8000
python3 scripts/rag_router.py &
sleep 2

# Run query client tests
python3 tests/test_query_skill.py

# Stop service
pkill -f rag_router
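
Because the router is started in the background, the tests can run before it is listening. One way to avoid that race is to poll the port until it accepts connections. The helper below is a sketch using only the standard library; `wait_for_port` is a hypothetical name and is not part of the test suite.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server has bound and is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A test runner could call `wait_for_port("localhost", 8000)` right after launching rag_router.py and abort early if it returns False.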

📁 Project Structure

KlimtechRAG/
├── scripts/
│   ├── rag_router.py              # Core Query Router (FastAPI)
│   ├── query_client.py            # CLI wrapper with caching
│   ├── dispatcher_service.py      # Multi-agent orchestrator (v8.7+)
│   └── dev_logger.py              # Auto-logging to Qdrant
│
├── tests/
│   ├── test_rag_router.py         # 69 unit test cases
│   └── test_query_skill.py        # 10 integration tests
│
├── postLLMs/                      # Pair programming system
│   ├── claude_outbox/             # Tasks sent to Robotnik
│   ├── deepseek_outbox/           # Responses from Robotnik
│   └── worker.lock                # Worker mode indicator
│
├── wiki/                          # External memory
│   ├── status.md                  # Session summary
│   ├── decisions.md               # Architecture decisions
│   └── lessons.md                 # Discovered patterns
│
├── CLAUDE.md                      # Project constitution
├── QDRANT_LAPTOP.md              # VRAM strategy doc
├── QUERY_CATEGORIES.md           # Pattern matching rules (1183 lines)
└── README.md                      # This file

🚀 Running Everything (Full Stack)

# 1. Activate the single unified venv
source ttkb_tut/venv/bin/activate

# 2. Start Qdrant (separate terminal)
podman start qdrant  # if using podman

# 3. Start Query Router (separate terminal)
python3 scripts/rag_router.py
# Output: Starting RAG Router on http://localhost:8000

# 4. Test Query Router (main terminal)
python3 scripts/query_client.py "Jak zamontować klimatyzator?"
# Output: ✅ Decision: ALWAYS, Confidence: 1.00, Cost: $0.000084

# 5. Optional: Start Dispatcher
python3 scripts/dispatcher_service.py
# Output: Starting Dispatcher on http://localhost:8001

# 6. Deactivate when done
deactivate

All services use the same venv. No environment switching needed.

🔧 Configuration

Environment Variables

export QDRANT_URL="http://localhost:6333"        # Qdrant endpoint
export ROUTER_PORT=8000                          # Query Router port
export DISPATCHER_PORT=8001                      # Dispatcher port (v8.7+)
export EMBEDDING_MODEL="intfloat/multilingual-e5-large"
export CACHE_DB="/tmp/query_router_cache.db"     # Query cache (24h TTL)
export QDRANT_COLLECTIONS="kb_md,supervisor_memory,robotnik_logs"
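
On the Python side, these variables can be collected into one settings object. The sketch below assumes that the values exported above also serve as defaults when a variable is unset, which the scripts themselves may or may not do; `Settings` and `load_settings` are illustrative names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    router_port: int
    dispatcher_port: int
    embedding_model: str
    cache_db: str
    collections: tuple


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), with fallbacks."""
    env = os.environ if env is None else env
    raw_collections = env.get(
        "QDRANT_COLLECTIONS", "kb_md,supervisor_memory,robotnik_logs"
    )
    return Settings(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        router_port=int(env.get("ROUTER_PORT", "8000")),
        dispatcher_port=int(env.get("DISPATCHER_PORT", "8001")),
        embedding_model=env.get("EMBEDDING_MODEL", "intfloat/multilingual-e5-large"),
        cache_db=env.get("CACHE_DB", "/tmp/query_router_cache.db"),
        # Split the comma-separated list and drop empty entries.
        collections=tuple(c.strip() for c in raw_collections.split(",") if c.strip()),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.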

Model Prices (in Qdrant router decision)

Prices are in USD per million tokens: flash is DeepSeek Flash, pro is DeepSeek Pro, claude is Claude.

{
  "prices": {
    "flash": 0.14,
    "pro": 1.74,
    "claude": 5.0
  }
}

📖 Documentation

QUERY_CATEGORIES.md — Complete pattern matching rules with 44+ examples
QDRANT_LAPTOP.md — VRAM strategy, model hierarchy, scaling points
CLAUDE.md — Project constitution (git workflow, security, venv strategy)
wiki/status.md — Session summary and progress tracking
tasks/phase_2_integration.md — Integration roadmap (v8.7-v8.8)

🐛 Troubleshooting

Query Router won't start

# Check if port 8000 is in use
ss -tlnp | grep 8000

# Kill existing process
pkill -f rag_router

# Verify dependencies
python3 -c "import fastapi, uvicorn; print('OK')"
# If error, reinstall: pip install fastapi uvicorn

Service not responding

# Check Qdrant is running
curl http://localhost:6333/collections

# Check logs
tail -20 /tmp/rag_router.log    # if started with nohup
tail -20 /tmp/dispatcher.log    # dispatcher logs

venv not activated

# Check that venv is active
which python3
# Should be: /home/tamiel/KlimtechRAG/ttkb_tut/venv/bin/python3

# Activate the unified venv
source ttkb_tut/venv/bin/activate

# If importing fails, reinstall dependencies
pip install --upgrade fastapi uvicorn requests qdrant-client \
  fastembed fastembed-gpu pydantic

📈 Performance Targets

Operation	Target	Current
Query routing	< 50ms	~50ms ✅
Qdrant retrieval	< 300ms	~200-500ms ⚠️
LLM response	< 2s	varies by model
Cache hit	< 5ms	~2-5ms ✅
End-to-end (routing + retrieval, excluding LLM response)	< 500ms	~250-700ms ⚠️

🎓 Learning Resources

Test examples: tests/test_rag_router.py (69 cases)
Pattern matching: scripts/query_patterns.json
API integration: scripts/query_client.py (HTTP client example)
Qdrant usage: scripts/dispatcher_service.py /search (retrieval pattern)
Auto-logging: scripts/dev_logger.py (supervisor_memory integration)

📝 License & Attribution

Project: KlimtechRAG (Climate Control Documentation RAG)
Version: v8.6 (Query Routing MVP) → v8.7 (Integration) → v9.0 (Full Feature)
Built by: Szef (Claude Code) + Robotnik (DeepSeek/OpenCode)
Last Updated: 2026-04-24

🤝 Contributing

This is a pair-programming project that uses the postLLMs system (file-based, asynchronous task management).

For developers:

Choose venv path (dev vs prod)
Activate environment
Run services
Test with query_client.py
Submit changes via git

For pair programming:

Tasks go in postLLMs/claude_outbox/ (named msg-NNN-description.md)
Responses arrive in postLLMs/deepseek_outbox/ (watched by the auto-poll monitor)
Memory snapshots are written to the Qdrant supervisor_memory collection

See CLAUDE.md §20 for full protocol.
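
As an illustration of the msg-NNN-description.md naming scheme, the sketch below picks the next free number from a list of existing file names. It assumes NNN is a zero-padded three-digit counter (as in msg-013) and that descriptions are lower-cased and hyphenated; neither detail is confirmed here, so treat CLAUDE.md as the authority.

```python
import re
from pathlib import Path

MSG_PATTERN = re.compile(r"^msg-(\d{3})-.+\.md$")


def next_message_name(existing_names, description: str) -> str:
    """Return the next msg-NNN-description.md name after the highest existing one."""
    numbers = []
    for name in existing_names:
        match = MSG_PATTERN.match(name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    # Assumed slug style: lower-case, runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"msg-{next_number:03d}-{slug}.md"


def next_message_path(outbox: Path, description: str) -> Path:
    """Convenience wrapper that scans a directory such as postLLMs/claude_outbox/."""
    return outbox / next_message_name((p.name for p in outbox.iterdir()), description)
```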

Questions? Check wiki/status.md or tasks/phase_2_integration.md for ongoing work.

Ready to optimize your queries? 🚀

python3 scripts/query_client.py "Your question here"
