Load this at the start of any IRIS build session. This is the source of truth.
IRIS — Inference, Response & Input System
A controlled AI runtime. Not an assistant. The model is an advisor. The system is the authority. Nothing executes without passing through deterministic gates.
Named in the Tony Stark tradition — every word in the acronym is something a 4th grader can say. The complexity is invisible.
Core principle: Models suggest. System decides. Execution is gated.
Repo: https://github.com/nikuhkid/iris
Local path: /home/nikuhkid/iris/
Architecture spec: /home/nikuhkid/iris/architecture_4.0.json
Mnem is the persistent memory and execution infrastructure IRIS runs on. Separate repo, separate concern.
Repo: https://github.com/nikuhkid/mnem
Local path: /home/nikuhkid/.claude/mnem/
entry_point (cli | discord | interface)
→ input_guard
→ translation_layer → slot_1 → plan
→ plan_analysis_initial (facts only — deterministic)
→ selective_redundancy gate (slot_2 with original_input if write/delete/state_change)
→ comparator (conflict → hard stop, escalate to user)
→ plan_analysis_final (interpretation only — facts immutable)
→ observer_log
→ decision_engine (final authority — no model touches this)
→ dry_run
→ user_action (approve | modify | reject | kill)
→ execution
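The flow above can be sketched as a minimal Python pipeline. This is a sketch of the control flow only; every function and field name here (`input_guard`, `needs_redundancy`, `run_pipeline`, the `intent`/`action_types` keys) is an illustrative stand-in, not the actual IRIS API:

```python
# Minimal sketch of the gated flow. All names are illustrative stand-ins.

def input_guard(text: str) -> str:
    # Minimal validation: reject empty input before any model sees it.
    if not text.strip():
        raise ValueError("empty input")
    return text.strip()

def needs_redundancy(facts: dict) -> bool:
    # selective_redundancy gate: only mutating plans get a second slot.
    return bool(facts.get("write") or facts.get("delete") or facts.get("state_change"))

def run_pipeline(raw_input: str, slot_1, slot_2, decide) -> dict:
    guarded = input_guard(raw_input)
    plan = slot_1(guarded)                      # slot_1: strict-JSON planning model
    facts = {                                   # plan_analysis_initial: facts only
        "write": "write" in plan["action_types"],
        "delete": "delete" in plan["action_types"],
        "state_change": "state_change" in plan["action_types"],
    }
    if needs_redundancy(facts):
        plan_b = slot_2(guarded)                # slot_2 sees original_input, never plan
        if plan_b["intent"] != plan["intent"]:  # comparator: conflict -> hard stop
            return {"status": "conflict", "escalate": True}
    decision = decide(plan, facts)              # decision_engine: final authority
    return {"status": "dry_run", "decision": decision, "plan": plan}
```

Note that slot_2 is called with `guarded` (the original input), never with slot_1's plan, matching the independence requirement below.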
| Layer | Role | DPO? |
|---|---|---|
| planning_model | Strict JSON only. No personality. No flair. | No |
| response_model | Natural language only. Personality layer. | Yes — tone only |
| Slot | Model | Role | Temp |
|---|---|---|---|
| 1 | Qwen2.5 7B local (Ollama, Q4_K_M) | Primary | 0.2 |
| 2 | Qwen2.5 7B local (Ollama, Q4_K_M) | Independent validator | 0.3 |
| 3 | Claude API | Last resort — schema failure only | — |
Slot 2 receives original_input — never slot_1 output. Independence is the point. Slot 3 never resolves slot_1 vs slot_2 conflicts. Conflicts are a human problem.
- Pass 1 (plan_analysis_initial): Facts only. Action types, args, step count. Raw term extraction for staging. No interpretation.
- Pass 2 (plan_analysis_final): Interpretation only. Consumes pass 1 output. Facts are immutable — cannot be modified.
- Debuggability guarantee: If something misfires, you know exactly which pass broke.
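One way to enforce the facts-immutable contract is to hand pass 2 a read-only view of pass 1's output. A sketch, not the actual implementation; the extracted fields are hypothetical:

```python
from types import MappingProxyType

def plan_analysis_initial(plan: dict) -> MappingProxyType:
    # Pass 1: deterministic fact extraction only, no interpretation.
    facts = {
        "action_types": tuple(step["action"] for step in plan["steps"]),
        "step_count": len(plan["steps"]),
    }
    return MappingProxyType(facts)  # read-only view: pass 2 cannot mutate facts

def plan_analysis_final(facts) -> dict:
    # Pass 2: interpretation only. Any write to `facts` raises TypeError,
    # so a misfire here is provably pass 2's bug, never a corrupted pass 1.
    return {
        "implicit_destructive": any(a in ("delete", "wipe") for a in facts["action_types"]),
        "multi_step": facts["step_count"] > 1,
    }
```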
All three must pass before flagging:
- Lexical match against term list: clean, clear, remove, prune, reset, purge, wipe, flush
- Scope gate: write OR delete must be true (eliminates "clear explanation" false positives)
- Argument context: term must appear in file paths, args, or action descriptions — not commentary
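The three gates can be sketched as follows. The plan field names (`write`, `delete`, `steps`, `path`, `args`, `description`) are assumptions about the schema, not the real one; the term list is from the spec above:

```python
DESTRUCTIVE_TERMS = {"clean", "clear", "remove", "prune", "reset", "purge", "wipe", "flush"}

def flag_implicit_destructive(plan: dict) -> bool:
    """All three gates must pass before flagging. Field names are assumed."""
    # Gate 2 (cheapest, so checked first): write OR delete must be true.
    if not (plan.get("write") or plan.get("delete")):
        return False
    # Gates 1 + 3 together: a listed term must appear in file paths, args,
    # or action descriptions. Commentary fields are deliberately excluded.
    structured_fields = []
    for step in plan.get("steps", []):
        structured_fields.append(step.get("path", ""))
        structured_fields.extend(step.get("args", []))
        structured_fields.append(step.get("description", ""))
    words = {w for field in structured_fields for w in field.lower().split()}
    return bool(words & DESTRUCTIVE_TERMS)
```

The scope gate is what kills the "clear explanation" class of false positive: a pure-read plan never reaches the lexical check at all.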
- Irreversible action → always confirm, never scored away
- Unknown action type → always reject
- Implicit destructive → require confirmation
- Cost of asking when unnecessary < cost of acting when wrong
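These rules translate almost directly into code. A sketch with illustrative names (`KNOWN_ACTIONS` and the fact keys are assumptions), showing the priority order implied above:

```python
# Illustrative action vocabulary -- the real one lives in action_map.json.
KNOWN_ACTIONS = {"read", "write", "delete", "state_change"}

def decide(facts: dict) -> str:
    """Hard rules in priority order. No scoring, no model involvement."""
    if facts["action_type"] not in KNOWN_ACTIONS:
        return "reject"        # unknown action type -> always reject
    if facts.get("irreversible"):
        return "confirm"       # irreversible -> always confirm, never scored away
    if facts.get("implicit_destructive"):
        return "confirm"       # implicit destructive -> require confirmation
    return "allow"
```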
Trains for: direct register, epistemic honesty ("I don't know" not "as an AI I cannot"), brevity, confident tone without authority claims.
Never trains for: JSON output, tool calls, execution behavior, rule enforcement, system authority.
Dataset contract:
- chosen = correct system behavior expressed in target register
- rejected = incorrect/unsafe behavior OR correct behavior expressed as self-narrating AI
- Primary rejection target: "as a language model I..." collapses
- Training data = prior, not fact
- Facts come from context, tools, or verified sources only
- Uncertainty stated plainly — never deflected via self-narration
- Model that confabulates confidently is worse than one that says "I don't know"
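A pair under this contract might look like the following. This is a hypothetical example written to illustrate the contract, not an entry from the actual batches:

```python
# Hypothetical DPO pair -- illustrates the dataset contract, not real data.
entry = {
    "prompt": "What's the current disk usage on this machine?",
    # chosen: correct behavior (state uncertainty plainly) in the target register
    "chosen": "I don't know. I can't see live system state; run df -h and I'll read it.",
    # rejected: correct refusal, but expressed as a self-narrating collapse --
    # the primary rejection target
    "rejected": "As a language model, I am unable to access your file system.",
}
```

Note the rejected side is not wrong on substance; it is rejected purely for register, which is exactly the "tone only" scope of the DPO layer.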
- Coordinator spawns scoped workers with context slice
- Workers are throwaway — context discarded after result returned
- Responsiveness comes from lean coordinator + worker offload, not bigger base model
Scope: input → plan → analyze → decide → respond. No redundancy. No orchestration. No ego.
Components:
- input_guard (minimal validation)
- planning_model (single slot)
- plan_analysis_initial
- plan_analysis_final
- decision_engine (2–3 hard rules only)
- response_model
Tasks:
- Define strict JSON schema for plans
- Implement prompt enforcing JSON-only output
- Build schema validator (hard fail on invalid)
- Implement plan_analysis_initial (pure mapping)
- Implement plan_analysis_final (implicit_destructive gate)
- Implement decision_engine (basic rules only)
- Wire response_model with strict "no authority" constraint
Exit criteria:
- JSON validity rate ≥ 90%
- plan_analysis is deterministic (same input = same output)
- decision_engine blocks destructive phrasing
- response_model never claims execution authority
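The "hard fail on invalid" schema validator from the task list can be sketched with the stdlib alone. The required keys shown are illustrative, not the real plan schema:

```python
import json

# Illustrative schema -- the real one is defined in Phase 1 task work.
REQUIRED_KEYS = {"intent": str, "steps": list}

def validate_plan(raw: str) -> dict:
    """Hard fail on invalid: any deviation raises, nothing is repaired."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"not valid JSON: {e}") from None
    for key, typ in REQUIRED_KEYS.items():
        if not isinstance(plan.get(key), typ):
            raise ValueError(f"schema violation: {key!r} must be {typ.__name__}")
    for step in plan["steps"]:
        if not isinstance(step, dict) or "action" not in step:
            raise ValueError("schema violation: each step needs an 'action'")
    return plan
```

Raising instead of repairing keeps the planning model honest: a malformed plan is a measurable failure (feeding the ≥ 90% validity metric), never silently patched.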
Additions: retry logic, basic logging, input variation tests
Tasks:
- Retry on schema failure (max 2)
- Log all inputs + outputs
- Cluster failures (schema vs logic vs ambiguity)
- Expand implicit_destructive lexical set based on misses
Exit criteria: failure patterns identified, no silent failures, retry improves validity rate
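The retry-plus-logging tasks compose naturally. A sketch under assumed interfaces (`model_call`, `validate`, and `log` are injected stand-ins): retry fires only on schema failure, capped at 2, and every attempt is logged so nothing fails silently:

```python
def plan_with_retry(model_call, raw_input, validate, log, max_retries=2):
    """Retry on schema failure only (max 2). Every attempt is logged."""
    last_error = None
    for attempt in range(1 + max_retries):
        output = model_call(raw_input)
        log({"attempt": attempt, "input": raw_input, "output": output})
        try:
            return validate(output)        # hard-fail validator from Phase 1
        except ValueError as e:
            last_error = e                 # schema failure -> retry
    raise RuntimeError(f"schema failure after {max_retries} retries: {last_error}")
```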
Additions: slot_2, comparator, critical_fail escalation
Tasks:
- Implement slot_2 with original_input isolation
- Build comparator (intent + action_types normalization, exact match only)
- Trigger slot_2 only on write/delete/state_change
- Mismatch → return structured conflict to user, no auto-resolution
Exit criteria: slot independence confirmed, comparator not over-triggering, conflicts surface cleanly
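The comparator contract (normalize intent + action_types, exact match only, no auto-resolution) can be sketched as below; the plan field names are assumptions:

```python
def normalize(plan: dict) -> tuple:
    # Normalize intent + action_types only; every other field is ignored.
    return (plan["intent"].strip().lower(),
            tuple(sorted(a.strip().lower() for a in plan["action_types"])))

def compare(plan_1: dict, plan_2: dict) -> dict:
    """Exact match after normalization. A mismatch returns a structured
    conflict for the user -- there is deliberately no auto-resolution."""
    if normalize(plan_1) == normalize(plan_2):
        return {"match": True}
    return {"match": False,
            "conflict": {"slot_1": normalize(plan_1), "slot_2": normalize(plan_2)}}
```

Normalization (case, whitespace, action order) is what keeps the exact-match rule from over-triggering on cosmetic differences between the two slots.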
Additions: dry_run, user_action loop, expanded decision_engine rules
Tasks:
- Implement dry_run summary (read/write/delete)
- Build user approval interface (CLI is enough)
- Add irreversible + multi-step gating rules
- Track user decisions for future calibration
Exit criteria: no execution without approval, modification loop stable, user control complete
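A minimal CLI version of the dry_run summary and approval gate, assuming hypothetical `steps`/`action`/`path` plan fields:

```python
def dry_run_summary(plan: dict) -> str:
    """Plain-text read/write/delete summary shown before any execution."""
    buckets = {"read": [], "write": [], "delete": []}
    for step in plan["steps"]:
        buckets.setdefault(step["action"], []).append(step.get("path", "?"))
    lines = [f"{action}: {', '.join(paths)}" for action, paths in buckets.items() if paths]
    return "\n".join(lines) or "no-op"

def user_action_loop(plan: dict, prompt=input) -> str:
    """CLI approval gate: nothing executes without an explicit choice.
    Anything unrecognized defaults to reject, never to approve."""
    print(dry_run_summary(plan))
    choice = prompt("approve / modify / reject / kill> ").strip().lower()
    return choice if choice in {"approve", "modify", "reject", "kill"} else "reject"
```

Defaulting unrecognized input to reject is the same asymmetry as the rest of the system: a wasted confirmation is cheaper than an unintended execution.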
Additions: observer service, hash chain integrity, failure tracking
Tasks:
- Log raw input, plans, decisions, outcomes
- Implement SHA-256 hash chaining
- Simulate observer failure (degraded mode)
- Buffer + replay logs
Exit criteria: full traceability, tamper detection working, degraded mode safe
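SHA-256 hash chaining for the observer log can be sketched with the stdlib: each entry's hash commits to the previous entry's hash, so editing any earlier record breaks every later hash. The record shape here is illustrative:

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel hash for the first entry

def append_entry(chain: list, record: dict) -> list:
    """Append a record whose hash commits to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    chain.append({"record": record, "prev": prev_hash, "hash": entry_hash})
    return chain

def verify_chain(chain: list) -> bool:
    """Recompute every hash; any tampered record breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

`sort_keys=True` matters: without canonical JSON serialization, a semantically identical record could hash differently and produce false tamper alarms.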
Scope: worker spawning, router refinement, confidence calibration, local model migration (7B → 14B)
Spec documents (in docs/) — designed via adversarial review (Claude + GPT):
- IRIS_Runtime_Contract.docx — agent event model, lifecycle, transport, hierarchy (v4.2) ✅
- IRIS_Agent_Structure.docx — registry, policy shape, coordinator, spawn mappings (v4.3) ✅
- IRIS_Policy_Profiles.docx — concrete policy profile contents per worker ✅
Remaining design sessions (3) — implementation blocked until all complete:
- Agent behavior — system prompts + tool surface per worker (one session, not two)
- Aggregation logic — coordinator combining worker results
- Failure propagation — worker → coordinator → pipeline
These were raised as architecture challenges. Answers are settled — don't relitigate them.
"Is strict comparator equality too rigid?" Acceptable. False positives cost a confirmation. False negatives cost silent wrong execution. Escalation asymmetry is deliberate.
"Does lexical detection miss semantic destructive actions?" Yes. Intentionally. Phase 1 only. Semantic layer comes after runtime data shows what it misses.
"Is slot_2 independence real given same base model?" Partially illusory — it catches surface drift, not deep model bias. Still worth having. Architecture doesn't claim more than that.
"Should decision_engine include probabilistic scoring?" Rule-based until calibration data exists. Scoring layer comes after runtime data, on top of rules, never replacing them.
"Is human escalation overused?" Probably yes in early phases. Runtime data fixes it. Phase 4 captures decisions for calibration.
"Is two-pass analysis worth the complexity?" Yes. Debuggability, not performance. Unified layer collapses the facts/interpretation distinction and makes failures opaque.
"When does added safety reduce effectiveness?" When confirmations become noise. That's a calibration problem. Phase 4 tracks decisions for this reason.
75 entries across 5 batches. All approved.
- dpo_dataset_batch01-05.json — local only, not in public repo
- dpo_calibration.md — methodology is public, data is not
- Primary target register: epistemic honesty + direct response without self-narration
- Dataset needs review against the authority-leakage framing before training
| Machine | Role | Specs |
|---|---|---|
| Parrot (TUF laptop) | Runtime / inference | RTX 4060 8GB, Ryzen 7000, 32GB RAM |
| Wolfie (desktop) | Training | RTX 4080 Super 16GB, Ryzen 7800X3D, 64GB RAM |
Training: Wolfie. Unsloth + QLoRA + DPO. Export GGUF. Transfer via SCP to Parrot.
Inference: Parrot. Ollama. Qwen2.5 7B at Q4_K_M (~4.5GB VRAM, 3.5GB headroom on 4060).
Both repos are on GitHub. Both are public.
iris: Phases 1–5 implementation complete. Phase 6 design in progress. Spec documents in docs/.
mnem: infrastructure, core code, methodology docs
IRIS work: Wolfie clone → push → Parrot pulls when resuming there.
- Load this file
- Read architecture_4.0.json for full layer specs
- For Phase 6: read docs/IRIS_Runtime_Contract.docx, docs/IRIS_Agent_Structure.docx, docs/IRIS_Policy_Profiles.docx — in that order
- Build and test each component independently before wiring
- Do not skip exit criteria before moving phases
Last updated: April 2026 — Phase 6 design in progress
Session: context-window-limited — this doc is the handoff
Exit criteria: PASS
- Valid JSON: 50/50 (100%)
- Schema conformance: 48/50 (96%) — both failures were cannot_plan structured responses, correct behavior
- Failures: "Save this" and "Update everything" — genuinely ambiguous, model correctly declined
- Vocabulary drift: 33 unknown action strings logged in exit_criteria_p1_results.json
Phase 2 day one: classify the 33 unknown action strings into action_map.json. Do not touch implicit_destructive_terms until input variation testing surfaces actual detection misses. These are separate concerns — do not conflate them.