RUX - AI Orchestration Engine

A local AI orchestration backend that turns natural language into safe, persistent state changes with validation, observability, feedback, and critique.

Architecture Snapshot

flowchart TD
    U["User Message"] --> P["Planner"]
    P --> E["Executor / Trust Boundary"]
    E --> T["Tool Adapters"]
    T --> D["Domain Services"]
    D --> R["Repositories"]
    R --> DB["PostgreSQL"]

    E --> O["Observability"]
    E --> C["Confidence + Critic"]

    D --> X["Expense Domain"]
    D --> Y["Project Domain"]
    D --> Z["Future Memory / Knowledge"]

Status

RUX is under active development and is currently being refactored toward a more modular domain-based architecture.

What RUX Is

Most toy agents look like this:

LLM -> tool -> response

RUX is built around a stronger runtime contract:

User
 -> Planner
 -> Executor (trust boundary)
 -> Tool Adapter
 -> Domain Service
 -> Repository
 -> PostgreSQL
 -> Observability / Outcome Tracking / Critique / Confidence
 -> Final Response

The core idea is simple: the LLM is not trusted. Anything before the executor is probabilistic. Anything after schema validation is expected to be deterministic, auditable, and safe to reason about.

Key Design Decisions

1. The Trust Boundary

The Executor is where trust is established. LLM output is treated as untrusted input and must pass schema validation with extra="forbid" before any tool is called. This catches hallucinated field names, invented action types, and malformed JSON before they reach domain logic.

LLM output         -> untrusted - can hallucinate anything
Executor (schema)  -> trust boundary
Tool onward        -> deterministic, validated, safe

2. Why the Planner Doesn't Call Tools Directly

LLMs are probabilistic. Tools are deterministic, state-mutating, and potentially destructive. Mixing these responsibilities makes the system harder to test, harder to reason about, and much easier to break.

Planner  -> intent extraction only
Executor -> structural validation
Tool     -> domain gateway
Service  -> business rules
Memory   -> persistence

3. Three-Layer Planner

Not everything should reach the LLM.

Layer 1 -> greeting keywords -> instant deterministic reply
Layer 2 -> action intent     -> LLM extracts structured JSON
Layer 3 -> open question     -> LLM responds conversationally

This protects the integrity of confidence scores: previously, greeting-like inputs could accidentally flow into action logic and pollute outcome history.
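
The routing can be sketched as a gate that runs before any LLM call; the keyword sets below are illustrative, not RUX's actual lists:

```python
GREETINGS = {"hi", "hello", "hey", "thanks"}       # layer 1 keywords (illustrative)
ACTION_HINTS = {"log", "add", "create", "delete"}  # layer 2 hints (illustrative)


def route(message: str) -> str:
    """Decide which planner layer handles a message before the LLM is involved."""
    words = set(message.lower().split())
    if words and words <= GREETINGS:
        return "layer1_deterministic_reply"    # never touches the LLM
    if words & ACTION_HINTS:
        return "layer2_structured_extraction"  # LLM emits structured JSON
    return "layer3_conversational"             # open question, free-form reply


assert route("hi") == "layer1_deterministic_reply"
assert route("log 40 dollars for groceries") == "layer2_structured_extraction"
assert route("what did I spend most on?") == "layer3_conversational"
```

A real router would be richer than keyword sets, but the structural point holds: layer 1 inputs never generate outcomes, so they cannot distort the confidence history.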

4. Confidence from Data, Not from the LLM

Asking an LLM how confident it is usually produces weak signals. RUX is designed to calculate confidence from real historical outcomes:

SELECT domain, task_type,
       COUNT(*)         AS samples,
       AVG(was_correct::int) AS accuracy  -- cast needed if was_correct is boolean
FROM agent_outcomes
WHERE user_id = :user_id
GROUP BY domain, task_type

Confidence should only surface when there is enough history to justify it. Otherwise the system should return something like "Confidence: insufficient data" instead of fabricating certainty.
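
A sketch of how the aggregated rows from that query could be gated before confidence is surfaced; `MIN_SAMPLES` is an illustrative threshold, not a value taken from RUX:

```python
MIN_SAMPLES = 5  # illustrative cutoff; the real threshold is a design choice


def confidence(samples: int, accuracy: float) -> str:
    """Turn one aggregated outcome row into a surfaced confidence string."""
    if samples < MIN_SAMPLES:
        return "Confidence: insufficient data"  # refuse to fabricate certainty
    return f"Confidence: {accuracy:.0%} over {samples} outcomes"


assert confidence(2, 1.0) == "Confidence: insufficient data"
assert confidence(20, 0.85) == "Confidence: 85% over 20 outcomes"
```

Note that two perfect outcomes still report insufficient data: a small sample with 100% accuracy is exactly the case where self-reported certainty would mislead.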

5. Critic Uses a Different Model

If the Planner and Critic use the same model, the Critic tends to agree with the original reasoning too easily. The idea behind RUX is that critique should be structurally independent, so the second opinion can challenge the first instead of just echoing it.
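
One way to express that independence structurally, assuming nothing about RUX's actual model client (the lambdas below are stubs standing in for two different local models):

```python
from dataclasses import dataclass
from typing import Callable

# Each role gets its own model callable; the names are illustrative, not RUX's API.
ModelFn = Callable[[str], str]


@dataclass
class ReviewedPlan:
    plan: str
    critique: str


def plan_and_critique(message: str, planner: ModelFn, critic: ModelFn) -> ReviewedPlan:
    """Structural independence: the critique comes from a different model callable."""
    plan = planner(message)
    critique = critic(f"Find flaws in this plan:\n{plan}")
    return ReviewedPlan(plan=plan, critique=critique)


# Stub callables stand in for two different LM Studio models.
reviewed = plan_and_critique(
    "log 40 for groceries",
    planner=lambda m: "log_expense(amount=40, category='groceries')",
    critic=lambda m: "Category was inferred, not stated; confirm with the user.",
)
assert "log_expense" in reviewed.plan
```

Wiring the critic as a separate injected callable, rather than a second call to the planner's model, is what keeps the second opinion from being an echo.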

Core Ideas

  • Trust boundary: planner output is treated as untrusted until it passes schema validation.
  • Thin tools: tools translate validated params into domain service calls.
  • Domain-first structure: business behavior lives inside domains, not inside runtime glue.
  • Observable execution: runs and outcomes are logged for inspection and feedback.
  • Confidence from history: confidence is derived from past correctness, not model self-reported certainty.
  • Critique as a second layer: decisions can be reviewed independently instead of trusting a single model pass.

Current Domains

  • Expense: expense logging, budget enforcement, spend analysis
  • Project: project creation and deletion flows
  • In progress: modular runtime cleanup, hybrid memory direction, future knowledge layer

Current Structure

rux/
├── api/                # FastAPI routes
├── core/               # current runtime layer
├── domains/
│   ├── expense/
│   └── project/
├── repositories/       # shared persistence adapters
├── services/           # shared services + some legacy files
├── memory/             # legacy memory path, planned for refactor
├── tests/
├── database.py
├── models.py
└── main.py

Response Model

RUX is moving toward a shared internal tool contract:

  • ToolResponse.status
  • ToolResponse.message
  • ToolResponse.data
  • ToolResponse.error
  • ToolResponse.metadata

This makes tool execution easier to validate, log, and test, and lets results route cleanly through the executor later.

Tech Stack

  • Python
  • FastAPI
  • SQLAlchemy async ORM
  • PostgreSQL
  • Pydantic
  • Local LLM serving via LM Studio

Setup

# Clone
git clone https://github.com/rahulT-17/RUX-AI-Companion.git
cd RUX-AI-Companion

# Create virtual environment
python -m venv .venv

# Activate (PowerShell)
.\.venv\Scripts\Activate.ps1

# Install dependencies
python -m pip install -r requirements.txt

# Initialize database tables
python init_db.py

# Run the API
python -m uvicorn main:app --reload

What Works Now

  • planner -> executor -> domain tool flow
  • expense logging and budget enforcement
  • project creation and deletion
  • database-backed persistence
  • execution logging and feedback-oriented infrastructure
  • smoke tests for expense and project tool adapters

Roadmap

  • Make the executor consume ToolResponse end-to-end
  • Unify confirmed actions with the normal execution pipeline
  • Remove legacy duplicated service/repository files
  • Build hybrid memory: short-term, episodic, semantic retrieval
  • Add a knowledge layer for reusable facts, concepts, and sources
  • Improve deployment and production config hygiene

Why I Built This

I built RUX to understand what actually breaks in AI agent systems when you move past demos: unreliable tool calls, weak trust boundaries, missing feedback loops, and no real way to measure correctness over time.

The goal is not to build another chatbot wrapper. The goal is to build the runtime layer underneath an AI agent system: validation, orchestration, observability, critique, and eventually memory and knowledge.


Built as a learning project. Actively evolving.
