Satham666/KlimtechRAG

KlimtechRAG — Smart Query Router + RAG Pipeline

Version: v8.6+ | Smart Query Routing | Cost Optimization | Development Tools

A lightweight, cost-optimized RAG system for climate control documentation and project management. It routes each query to the cheapest adequate option, local models or external LLMs, which makes it about 7.4× cheaper than sending every query to Claude.


🎯 Quick Start

Prerequisites

  • Python 3.8+
  • Qdrant vector database (running on localhost:6333)

Setup — Single venv (Unified Environment)

Use ttkb_tut/venv — this is the ONLY environment needed for all tasks:

# 1. Activate project venv
source ttkb_tut/venv/bin/activate

# 2. Verify all packages installed
python3 -c "import fastapi, qdrant_client, fastembed; print('✅ All packages OK')"

# 3. Run services
python3 scripts/rag_router.py         # Query Router on :8000
python3 scripts/dispatcher_service.py # Dispatcher on :8001 (optional)
python3 scripts/dev_logger.py         # Auto-logging (git hooks)

Single venv includes:

  • ✅ FastAPI + Uvicorn (web services)
  • ✅ Qdrant client (vector database)
  • ✅ FastEmbed + FastEmbed-GPU (embeddings)
  • ✅ Pydantic (data validation)
  • ✅ Requests (HTTP client)
  • ✅ All ML/AI dependencies (torch, transformers, etc.)
  • ✅ Development tools (pytest, black, etc.)

Setup (if venv missing):

python3 -m venv ttkb_tut/venv
source ttkb_tut/venv/bin/activate
pip install fastapi uvicorn requests qdrant-client fastembed fastembed-gpu pydantic

📚 Services

Query Router (Core)

Smart query routing based on pattern matching.

# Dev: python3 scripts/rag_router.py
# Prod: /home/tamiel/programy/klimtech-embed-venv/bin/python3 scripts/rag_router.py

# Server starts on http://localhost:8000

API:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Jak zamontować klimatyzator?"}'

Response:

{
  "decision": "ALWAYS",
  "use_rag": true,
  "collection": "kb_md",
  "recommended_model": "deepseek-flash",
  "cost": "$0.000084",
  "confidence": 0.95
}
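The same call can be made from Python. The following is a minimal sketch using only the standard library; it assumes the router listens on http://localhost:8000 and returns the fields shown above. The helper names `build_query_request`, `summarize`, and `ask_router` are hypothetical and are not part of the repository.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8000/query"  # endpoint from the curl example


def build_query_request(query: str, url: str = ROUTER_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example (nothing is sent yet)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> str:
    """Format the documented response fields as a single line."""
    return (
        f"Decision: {response['decision']}, "
        f"RAG: {response['use_rag']}, "
        f"Model: {response['recommended_model']}, "
        f"Cost: {response['cost']}"
    )


def ask_router(query: str) -> dict:
    """Send the query and decode the JSON reply (requires a running router)."""
    with urllib.request.urlopen(build_query_request(query), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the router running, `print(summarize(ask_router("Jak zamontować klimatyzator?")))` would print a one-line summary of the reply.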

Decision Types:

  • ALWAYS — Use RAG (high importance, project-specific, HVAC domain)
  • ⏸️ CONDITIONAL — Try direct first, escalate to RAG if uncertain
  • NEVER — Direct LLM only (writing, general knowledge, session context)
  • 🌐 WEB_SEARCH — Fetch from internet (weather, news, current events)
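
To illustrate how pattern matching can produce these four decisions, here is a minimal sketch. The regular expressions are invented for illustration; the real rules are documented in QUERY_CATEGORIES.md and are not reproduced here. `RULES`, `RouteResult`, and `route` are hypothetical names, and treating CONDITIONAL as the fallback is an assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns, checked in order; the first match wins.
RULES = [
    ("WEB_SEARCH", re.compile(r"\b(pogoda|weather|news)\b", re.IGNORECASE)),
    ("ALWAYS", re.compile(r"\b(klimatyzator\w*|hvac|gpu test)\b", re.IGNORECASE)),
    ("NEVER", re.compile(r"\b(napisz|write|email|mail)\b", re.IGNORECASE)),
]


@dataclass
class RouteResult:
    decision: str
    use_rag: bool


def route(query: str) -> RouteResult:
    """Return the first matching decision; fall back to CONDITIONAL."""
    for decision, pattern in RULES:
        if pattern.search(query):
            # Only ALWAYS retrieves context up front in this sketch.
            return RouteResult(decision, use_rag=(decision == "ALWAYS"))
    # Nothing matched: try a direct answer first, escalate to RAG if needed.
    return RouteResult("CONDITIONAL", use_rag=False)
```

Ordering the rules from most to least specific keeps the first-match logic predictable; a production router would likely also report which category matched, as the --verbose output of the query client does.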

Query Client (CLI)

Pretty-printed wrapper around Query Router.

# Dev (in activated venv)
python3 scripts/query_client.py "Jak zamontować klimatyzator?" [--verbose] [--no-cache] [--json]

# Prod
/home/tamiel/programy/klimtech-embed-venv/bin/python3 scripts/query_client.py "query text"

Examples:

# Basic
python3 scripts/query_client.py "Jak zamontować klimatyzator?"
# Output: ✅ Decision: ALWAYS, Confidence: 1.00, Cost: $0.000084

# Verbose (show matched categories)
python3 scripts/query_client.py "GPU test" --verbose
# Output: ... Details: Matched: rag_always.project, Tokens: 600, Type: project

# JSON (raw API response)
python3 scripts/query_client.py "query" --json

# Bypass cache
python3 scripts/query_client.py "query" --no-cache
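
The --no-cache flag implies that the client caches router responses, and the Configuration section mentions a SQLite file with a 24h TTL. How query_client.py implements this is not shown, so the following is only a sketch of one way to build such a cache; the `QueryCache` class and its table layout are assumptions.

```python
import json
import sqlite3
import time
from typing import Optional

TTL_SECONDS = 24 * 60 * 60  # 24h, matching the TTL noted for CACHE_DB


class QueryCache:
    """Tiny SQLite-backed cache keyed by the query text (hypothetical design)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "query TEXT PRIMARY KEY, response TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def put(self, query: str, response: dict, now: Optional[float] = None) -> None:
        """Store or overwrite the response for a query."""
        stored_at = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (query, json.dumps(response), stored_at),
        )
        self.db.commit()

    def get(self, query: str, now: Optional[float] = None) -> Optional[dict]:
        """Return the cached response, or None if missing or older than the TTL."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT response, stored_at FROM cache WHERE query = ?", (query,)
        ).fetchone()
        if row is None or now - row[1] > TTL_SECONDS:
            return None
        return json.loads(row[0])
```

A client built this way would call `get` before contacting the router and skip that lookup when --no-cache is passed.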

Dispatcher Service (Optional, v8.7+)

Orchestrates routing → retrieval → model selection → LLM call.

python3 scripts/dispatcher_service.py
# Server on http://localhost:8001

🛠️ Development Tools

dev_logger.py — Auto-Logging

Log commits, snapshots, and messages to supervisor_memory:

# Log git commit
python3 scripts/dev_logger.py log-commit --hash $(git rev-parse HEAD)

# Manual snapshot
python3 scripts/dev_logger.py snapshot --session "v8.6-release" --notes "Initial v8.6"

# Log message status
python3 scripts/dev_logger.py log-message --msg-id msg-013 --status DONE --task-type IMPLEMENT

📊 Architecture

Query
  ↓
[Query Router (:8000)]
  ├→ Pattern matching
  ├→ Decision (ALWAYS/CONDITIONAL/NEVER/WEB_SEARCH)
  └→ Cost estimation
  
Decision
  ↓
[Context Retrieval (Qdrant)]
  ├→ kb_md (documentation)
  ├→ supervisor_memory (project history)
  └→ robotnik_logs (session logs)
  
Context + Decision
  ↓
[Model Selection]
  ├→ Flash ($0.14/M) — 80% of queries
  ├→ Pro ($1.74/M) — 15% of queries (escalation)
  └→ Claude ($5/M) — 5% critical
  
  ↓
[LLM Response]
  ↓
[Result Logging (supervisor_memory)]
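
The retrieval and model-selection stages of the diagram can be sketched as two small functions. This is an illustration only: the tier names come from the diagram, but the `escalated` and `critical` flags, and the rule that only ALWAYS triggers retrieval up front, are assumptions about how the stages connect.

```python
def select_model(escalated: bool = False, critical: bool = False) -> str:
    """Pick a model tier: Flash by default, Pro on escalation, Claude if critical."""
    if critical:
        return "claude"
    if escalated:
        return "pro"
    return "flash"


def plan(decision: str, escalated: bool = False, critical: bool = False) -> dict:
    """Combine the router decision with the model tier into one pipeline plan."""
    return {
        # Assumption: only ALWAYS fetches Qdrant context before the LLM call.
        "retrieve": decision == "ALWAYS",
        "model": select_model(escalated=escalated, critical=critical),
    }
```

For example, `plan("ALWAYS")` asks for retrieval with the Flash tier, while `plan("CONDITIONAL", escalated=True)` skips up-front retrieval and moves to Pro.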

🎯 Query Categories

See QUERY_CATEGORIES.md for 44+ classified examples.

Quick Reference:

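| Category | Decision | Model | Example |
|---|---|---|---|
| HVAC procedures | ALWAYS | Flash + RAG | "Jak zamontować klimatyzator?" (How do I install an air conditioner?) |
| Project history | ALWAYS | Flash + RAG | "Dlaczego GPU test się wysypał?" (Why did the GPU test crash?) |
| Writing tasks | NEVER | Flash | "Napisz email do klienta" (Write an email to the client) |
| Weather/news | WEB_SEARCH | Flash | "Jaka jest pogoda?" (What is the weather?) |
| Concepts | CONDITIONAL | Flash→Pro | "Na czym polega lazy loading?" (What is lazy loading?) |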

💰 Cost Optimization

Monthly cost comparison (100 queries/day = 3000/month):

| Approach | Monthly cost | vs. All-Claude |
|---|---|---|
| All Claude | $7.50 | (baseline) |
| Smart Router | $1.02 | 7.4× cheaper |
| Flash only | $0.42 | 17.9× cheaper (but worse accuracy) |
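
The savings factors follow directly from the monthly totals: each one is the all-Claude cost divided by the cost of the approach. A short sketch of that arithmetic, using only the figures from the table (the function names are hypothetical):

```python
MONTHLY_COST_USD = {"All Claude": 7.50, "Smart Router": 1.02, "Flash only": 0.42}
QUERIES_PER_MONTH = 100 * 30  # 100 queries/day over a 30-day month


def savings_factor(approach: str, baseline: str = "All Claude") -> float:
    """How many times cheaper an approach is than the baseline."""
    return MONTHLY_COST_USD[baseline] / MONTHLY_COST_USD[approach]


def cost_per_query(approach: str) -> float:
    """Average cost of one query under the given approach."""
    return MONTHLY_COST_USD[approach] / QUERIES_PER_MONTH


for name in MONTHLY_COST_USD:
    print(f"{name}: {savings_factor(name):.1f}x, ${cost_per_query(name):.6f}/query")
```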

Breakdown by category:

  • 30% ALWAYS (RAG + Flash): $0.00084/query
  • 40% CONDITIONAL direct: $0.00028/query
  • 15% NEVER: $0.00021/query
  • 5% WEB_SEARCH: $0.00042/query
  • <1% escalation to Claude: $0.0025/query

🧪 Testing

Unit Tests

# Query Router smoke check (3 sample cases; the full 69-case suite is in tests/test_rag_router.py)
python3 -c "
import sys; sys.path.insert(0, 'scripts')
from rag_router import RAGRouter
router = RAGRouter()
tests = [
    ('Jak zamontować klimatyzator?', 'always'),
    ('Napisz mail.', 'never'),
    ('Jaka pogoda?', 'web_search'),
]
for q, expected in tests:
    result = router.route(q)
    status = '✓' if result.decision == expected else '✗'
    print(f'{status} {q} → {result.decision}')
"

Integration Tests

# Start services
python3 scripts/rag_router.py &
sleep 2  # give the router a moment to bind to :8000 before the tests connect

# Run query client tests
python3 tests/test_query_skill.py

# Stop service
pkill -f rag_router

📁 Project Structure

KlimtechRAG/
├── scripts/
│   ├── rag_router.py              # Core Query Router (FastAPI)
│   ├── query_client.py            # CLI wrapper with caching
│   ├── dispatcher_service.py       # Multi-agent orchestrator (v8.7+)
│   └── dev_logger.py              # Auto-logging to Qdrant
│
├── tests/
│   ├── test_rag_router.py         # 69 unit test cases
│   └── test_query_skill.py        # 10 integration tests
│
├── postLLMs/                      # Pair programming system
│   ├── claude_outbox/             # Tasks sent to Robotnik
│   ├── deepseek_outbox/           # Responses from Robotnik
│   └── worker.lock                # Worker mode indicator
│
├── wiki/                          # External memory
│   ├── status.md                  # Session summary
│   ├── decisions.md               # Architecture decisions
│   └── lessons.md                 # Discovered patterns
│
├── CLAUDE.md                      # Project constitution
├── QDRANT_LAPTOP.md              # VRAM strategy doc
├── QUERY_CATEGORIES.md           # Pattern matching rules (1183 lines)
└── README.md                      # This file

🚀 Running Everything (Full Stack)

# 1. Activate the single unified venv
source ttkb_tut/venv/bin/activate

# 2. Start Qdrant (separate terminal)
podman start qdrant  # if using podman

# 3. Start Query Router (separate terminal)
python3 scripts/rag_router.py
# Output: Starting RAG Router on http://localhost:8000

# 4. Test Query Router (main terminal)
python3 scripts/query_client.py "Jak zamontować klimatyzator?"
# Output: ✅ Decision: ALWAYS, Confidence: 1.00, Cost: $0.000084

# 5. Optional: Start Dispatcher
python3 scripts/dispatcher_service.py
# Output: Starting Dispatcher on http://localhost:8001

# 6. Deactivate when done
deactivate

All services use the same venv. No environment switching needed.


🔧 Configuration

Environment Variables

export QDRANT_URL="http://localhost:6333"        # Qdrant endpoint
export ROUTER_PORT=8000                          # Query Router port
export DISPATCHER_PORT=8001                      # Dispatcher port (v8.7+)
export EMBEDDING_MODEL="intfloat/multilingual-e5-large"
export CACHE_DB="/tmp/query_router_cache.db"     # Query cache (24h TTL)
export QDRANT_COLLECTIONS="kb_md,supervisor_memory,robotnik_logs"
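
How the scripts read these variables is not shown here. As a sketch, a service could load them with the defaults listed above; the `RouterConfig` dataclass and `load_config` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class RouterConfig:
    qdrant_url: str
    router_port: int
    dispatcher_port: int
    embedding_model: str
    cache_db: str
    collections: List[str]


def load_config(env: Mapping[str, str] = os.environ) -> RouterConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    raw_collections = env.get(
        "QDRANT_COLLECTIONS", "kb_md,supervisor_memory,robotnik_logs"
    )
    return RouterConfig(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        router_port=int(env.get("ROUTER_PORT", "8000")),
        dispatcher_port=int(env.get("DISPATCHER_PORT", "8001")),
        embedding_model=env.get("EMBEDDING_MODEL", "intfloat/multilingual-e5-large"),
        cache_db=env.get("CACHE_DB", "/tmp/query_router_cache.db"),
        # Split the comma-separated list and drop empty entries.
        collections=[c.strip() for c in raw_collections.split(",") if c.strip()],
    )
```

Passing the environment as a parameter keeps the function easy to test: `load_config({"ROUTER_PORT": "9000"})` overrides one value without touching the real environment.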

Model Prices (in Qdrant router decision)

{
  "prices": {
    "flash": 0.14,
    "pro": 1.74,
    "claude": 5.0
  }
}

Prices are in USD per million tokens: flash is DeepSeek Flash, pro is DeepSeek Pro, and claude is Claude.
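
These prices are enough to reproduce the per-query cost shown in the API response. A minimal sketch, assuming the cost is simply tokens multiplied by the per-million price (`estimate_cost` and `format_cost` are hypothetical names):

```python
PRICES_PER_M_TOKENS = {"flash": 0.14, "pro": 1.74, "claude": 5.0}  # USD


def estimate_cost(tokens: int, model: str) -> float:
    """Cost in USD for a given token count on a given model tier."""
    return tokens * PRICES_PER_M_TOKENS[model] / 1_000_000


def format_cost(cost: float) -> str:
    """Render the cost with six decimals, as in the router response."""
    return f"${cost:.6f}"


# 600 tokens on Flash, the token count shown in the --verbose example
print(format_cost(estimate_cost(600, "flash")))  # $0.000084
```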

📖 Documentation

  • QUERY_CATEGORIES.md — Complete pattern matching rules with 44+ examples
  • QDRANT_LAPTOP.md — VRAM strategy, model hierarchy, scaling points
  • CLAUDE.md — Project constitution (git workflow, security, venv strategy)
  • wiki/status.md — Session summary and progress tracking
  • tasks/phase_2_integration.md — Integration roadmap (v8.7-v8.8)

🐛 Troubleshooting

Query Router won't start

# Check if port 8000 is in use
ss -tlnp | grep 8000

# Kill existing process
pkill -f rag_router

# Verify dependencies
python3 -c "import fastapi, uvicorn; print('OK')"
# If error, reinstall: pip install fastapi uvicorn

Service not responding

# Check Qdrant is running
curl http://localhost:6333/collections

# Check logs
tail -20 /tmp/rag_router.log    # if started with nohup
tail -20 /tmp/dispatcher.log    # dispatcher logs

venv not activated

# Check that venv is active
which python3
# Should be: /home/tamiel/KlimtechRAG/ttkb_tut/venv/bin/python3

# Activate the unified venv
source ttkb_tut/venv/bin/activate

# If importing fails, reinstall dependencies
pip install --upgrade fastapi uvicorn requests qdrant-client \
  fastembed fastembed-gpu pydantic

📈 Performance Targets

| Operation | Target | Current |
|---|---|---|
| Query routing | < 50ms | ~50ms ✅ |
| Qdrant retrieval | < 300ms | ~200-500ms ⚠️ |
| LLM response | < 2s | varies by model |
| Cache hit | < 5ms | ~2-5ms ✅ |
| End-to-end | < 500ms | ~250-700ms ⚠️ |

🎓 Learning Resources

  • Test examples: tests/test_rag_router.py (69 cases)
  • Pattern matching: scripts/query_patterns.json
  • API integration: scripts/query_client.py (HTTP client example)
  • Qdrant usage: scripts/dispatcher_service.py /search (retrieval pattern)
  • Auto-logging: scripts/dev_logger.py (supervisor_memory integration)

📝 License & Attribution

Project: KlimtechRAG (Climate Control Documentation RAG)
Version: v8.6 (Query Routing MVP) → v8.7 (Integration) → v9.0 (Full Feature)
Built by: Szef (Claude Code) + Robotnik (DeepSeek/OpenCode)
Last Updated: 2026-04-24


🤝 Contributing

This is a pair-programming project that uses the postLLMs system (file-based asynchronous task management).

For developers:

  1. Choose venv path (dev vs prod)
  2. Activate environment
  3. Run services
  4. Test with query_client.py
  5. Submit changes via git

For pair programming:

  • Tasks go in postLLMs/claude_outbox/ (named msg-NNN-description.md)
  • Responses arrive in postLLMs/deepseek_outbox/ (watched by an auto-poll monitor)
  • Memory snapshots are written to the Qdrant supervisor_memory collection
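
As an illustration of the msg-NNN-description.md naming scheme, the sketch below picks the next free message number in an outbox directory. It assumes three-digit, zero-padded numbers (as in msg-013) and a lowercase hyphenated description; `next_message_name` is a hypothetical helper, and CLAUDE.md remains the authority on the actual protocol.

```python
import re
from pathlib import Path

MSG_PATTERN = re.compile(r"^msg-(\d{3})-.*\.md$")


def next_message_name(outbox: Path, description: str) -> str:
    """Return the file name for the next task message in the outbox."""
    highest = 0
    for path in outbox.glob("msg-*.md"):
        match = MSG_PATTERN.match(path.name)
        if match:
            highest = max(highest, int(match.group(1)))
    # Turn the description into a lowercase, hyphen-separated slug.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"msg-{highest + 1:03d}-{slug}.md"
```

With msg-013-old.md already present, `next_message_name(outbox, "Fix cache")` yields msg-014-fix-cache.md.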

See CLAUDE.md §20 for full protocol.


Questions? Check wiki/status.md or tasks/phase_2_integration.md for ongoing work.

Ready to optimize your queries? 🚀

python3 scripts/query_client.py "Your question here"