# Philosophy, Origins, and Application
Author: Olivier Vitrac, PhD, HDR | [email protected] | Adservio
Version: 2.0 | Date: 2026-02-13 | RAGIX Version: 0.66+
- Introduction
- Origins: Scientific Kernels
- From Virtual Labs to Code Audits
- KOAS Architecture
- Kernel Philosophy
- The Three-Stage Pipeline
- Kernel Families
- Orchestration and Execution
- Activity Logging and Audit Trail
- RAG Integration
- Quality Standards
- References
## Introduction

KOAS (Kernel-Orchestrated Audit System) is a sovereign, local-first computation framework that applies scientific kernel principles to software audits, document analysis, slide generation, Markdown review, and security scanning. As of v0.66, KOAS comprises 75 deterministic kernels across 5 families, each following the three-stage pipeline pattern (data collection → analysis → reporting).
- Sovereignty: All processing happens locally (no cloud dependencies)
- Reproducibility: Deterministic kernels produce identical outputs for identical inputs
- Auditability: Complete execution trail with cryptographic verification
- Composability: Modular kernels with explicit dependencies
- LLM-Ready: Structured summaries designed for AI-assisted interpretation
## Origins: Scientific Kernels
KOAS inherits its architecture from the Generative Simulation (GS) initiative, which developed a pattern for augmenting AI agents with specialized scientific computation.
In the GS framework, scientific agents are not raw LLMs. Instead, they embed one or more LLMs that operate scientific kernels β specialized software libraries designed for machine-to-machine interaction rather than human use.
```
┌───────────────────────────────────────────────────────────────┐
│                       SCIENTIFIC AGENT                        │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     │
│   │   Encoder   │────▶│   Kernel    │────▶│   Decoder   │     │
│   │    (LLM)    │     │(Computation)│     │    (LLM)    │     │
│   └─────────────┘     └─────────────┘     └──────┬──────┘     │
│                                                  │            │
│                                                  ▼            │
│                                           ┌─────────────┐     │
│                                           │ Structured  │     │
│                                           │   Output    │     │
│                                           └─────────────┘     │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
Key insight from GS:
> "Each interaction costs a few seconds; most time is spent inside the scientific kernels, not the LLMs."
Scientific kernels include:
- FEniCS, OpenFOAM for fluid dynamics
- LAMMPS, GROMACS for molecular simulation
- SFPPy for mass transfer and migration modeling
- Pizza3 for soft-matter mechanics
These kernels are narrow but powerful: they respond to one supervisor and produce strongly contextualized outputs.
In the GS pattern, RAG (Retrieval-Augmented Generation) serves two key roles:
- Upstream (Encoder): Encode and contextualize user requests into domain-aware instructions aligned with kernel semantics
- Downstream (Decoder): Decode and frame kernel outputs into interpretable, structured insights for the LLM
This bidirectional RAG loop ensures precision, traceability, and scientific consistency across reasoning steps.
Above individual agents sits an orchestrator agent that coordinates multiple scientific agents asynchronously:
- Scheduling and dependency management
- Progress monitoring and diagnostics
- Convergence by majority voting or weighted consensus
The orchestrator's kernel typically includes logic or numerical solvers, while an LLM with large context handles summarization and dashboards.
## From Virtual Labs to Code Audits
The GS scientific agent pattern maps directly to code audit systems:
| Virtual Hybrid Lab | Code Audit Lab |
|---|---|
| Scientific Kernels (FEniCS, LAMMPS, SFPPy, Pizza3) | Audit Kernels (AST scan, metrics, coupling) |
| Scientific Agent (narrow, operates one kernel) | Audit Agent (operates one audit kernel) |
| Orchestrator Agent (coordinates, schedules, diagnoses) | Audit Orchestrator (stages, dependencies, flow) |
| Discovery Agent ("Interpret these experiments") | Audit Discovery ("What are the risks here?") |
| Knowledge Base (RAG): regulatory data, domain knowledge | Audit KB (RAG): patterns, standards, history |
- Kernels do the heavy computation (AST parsing, metrics, graph analysis)
- LLM orchestrates and interprets (what to run next, what does this mean)
- RAG provides context (patterns, standards, previous audits)
This separation ensures:
- Reproducible, deterministic analysis
- Explainable results with full audit trail
- Efficient use of LLM resources
## KOAS Architecture

```
                ┌────────────────────────────────────────┐
                │            DISCOVERY AGENT             │
                │  "What technical debt should we        │
                │   prioritize? What are the risks?"     │
                │                                        │
                │  LLM: Claude/GPT-4 (large context)     │
                └───────────────────┬────────────────────┘
                                    │
                ┌───────────────────▼────────────────────┐
                │           ORCHESTRATOR AGENT           │
                │     Coordinates audit workflow         │
                │     Handles dependencies & scheduling  │
                │                                        │
                │     Engine: Kernel Orchestrator        │
                └───────────────────┬────────────────────┘
                                    │
         ┌──────────────────────────┼──────────────────────────┐
         ▼                          ▼                          ▼
┌─────────────────┐        ┌─────────────────┐        ┌─────────────────┐
│  METRICS KERNEL │        │    STRUCTURE    │        │  QUALITY KERNEL │
│                 │        │     KERNEL      │        │                 │
│  • ast_scan     │        │  • partition    │        │  • hotspots     │
│  • metrics      │        │  • dependency   │        │  • dead_code    │
│  • coupling     │        │  • services     │        │  • risk         │
└────────┬────────┘        └────────┬────────┘        └────────┬────────┘
         │                          │                          │
         └──────────────────────────┼──────────────────────────┘
                                    │
                ┌───────────────────▼──────────────────┐
                │        AUDIT KNOWLEDGE BASE          │
                │                                      │
                │  stage1/  →  Raw kernel outputs      │
                │  stage2/  →  Analysis summaries      │
                │  stage3/  →  Report sections         │
                │  logs/    →  Full audit trail        │
                └──────────────────────────────────────┘
```
## Kernel Philosophy
A kernel is a pure computation unit that:
- Takes structured input (workspace path, configuration, dependencies)
- Produces JSON output + human-readable summary
- Is fully deterministic (no randomness, no external network calls)
- Contains no LLM logic (pure computation)
```python
class Kernel:
    name: str             # Unique identifier (e.g., "ast_scan")
    version: str          # Semantic version
    stage: int            # Pipeline stage (1, 2, or 3)
    category: str         # Functional category
    requires: List[str]   # Dependencies (other kernel outputs)
    provides: List[str]   # Capabilities provided

    def compute(self, input: KernelInput) -> Dict[str, Any]:
        """Execute kernel computation. Pure function."""
        ...

    def summarize(self, data: Dict) -> str:
        """Generate LLM-consumable summary (<500 chars)."""
        ...
```

Every kernel produces:

```python
@dataclass
class KernelOutput:
    success: bool            # Execution status
    data: Dict[str, Any]     # Full structured data
    summary: str             # LLM-ready summary (<500 chars)
    output_file: Path        # Persisted JSON location
    dependencies_used: List  # Traceability
```

- Separation of Concerns: Computation is isolated from orchestration
- Testability: Kernels can be tested in isolation
- Parallelism: Independent kernels run concurrently
- Reproducibility: No hidden state or external dependencies
- Auditability: Clear inputs, outputs, and execution trace
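As an illustration, a minimal kernel honoring this contract might look like the following sketch. The `loc_count` kernel, its fields, and its simplified `compute` signature are hypothetical, not part of the KOAS API:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict

@dataclass
class KernelOutput:
    success: bool
    data: Dict[str, Any]
    summary: str

class LocCountKernel:
    """Hypothetical Stage 1 kernel: counts lines of code per Java file."""
    name = "loc_count"
    version = "1.0.0"
    stage = 1
    requires: list = []  # no upstream kernel outputs needed

    def compute(self, workspace: Path) -> KernelOutput:
        # Deterministic: same files in, same counts out; no network, no LLM
        counts = {
            str(p): len(p.read_text(errors="ignore").splitlines())
            for p in sorted(workspace.rglob("*.java"))
        }
        data = {"files": counts, "total_loc": sum(counts.values())}
        return KernelOutput(
            success=True,
            data=data,
            summary=f"{len(counts)} files, {data['total_loc']} LOC",
        )
```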
## The Three-Stage Pipeline

KOAS organizes computation into three stages, each building on the previous:
```
┌──────────────────────────────────────────────────────────────────────┐
│                            KOAS Pipeline                             │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│   │     STAGE 1     │    │     STAGE 2     │    │     STAGE 3     │  │
│   │ Data Collection │───▶│    Analysis     │───▶│    Reporting    │  │
│   └─────────────────┘    └─────────────────┘    └─────────────────┘  │
│                                                                      │
│   • AST Scanning         • Statistics           • Executive Summary  │
│   • Metrics              • Hotspots             • Overview           │
│   • Dependencies         • Dead Code            • Risk Assessment    │
│   • Partitioning         • Coupling             • Drift Analysis     │
│   • Services             • Entropy              • Recommendations    │
│   • Timeline             • Risk                 • Report Assembly    │
│                          • Drift                                     │
└──────────────────────────────────────────────────────────────────────┘
```
### Stage 1: Data Collection

Extracts raw information from source code:

| Kernel | Purpose | Output |
|---|---|---|
| `ast_scan` | Parse source files, extract symbols | Classes, methods, fields |
| `metrics` | Calculate code metrics | CC, LOC, MI per file |
| `dependency` | Build dependency graph | Import relationships |
| `partition` | Identify logical boundaries | Component clusters |
| `services` | Detect architectural patterns | Controllers, services, repos |
| `timeline` | Build lifecycle profiles | Modification history |
### Stage 2: Analysis

Processes Stage 1 data to derive insights:

| Kernel | Purpose | Output |
|---|---|---|
| `stats_summary` | Distributional statistics | Mean, median, outliers |
| `hotspots` | Identify high-risk areas | Complexity/size hotspots |
| `dead_code` | Detect unused code | Unreachable elements |
| `coupling` | Martin coupling metrics | Ca, Ce, I, A, D |
| `entropy` | Information-theoretic analysis | Token/symbol entropy |
| `risk` | MCO risk assessment | Risk levels per component |
| `drift` | Spec-code alignment | Synchronization status |
### Stage 3: Reporting

Generates human-readable documentation:

| Kernel | Purpose | Output |
|---|---|---|
| `section_executive` | Executive summary | Key findings |
| `section_overview` | Codebase metrics | Quality grades |
| `section_drift` | Drift analysis | Alignment tables |
| `section_recommendations` | Action items | Prioritized list |
| `report_assemble` | Final assembly | Complete markdown |
## Kernel Families
As of v0.66, KOAS organizes 75 kernels into 5 families. Each family follows the three-stage pipeline and has its own dedicated documentation.
| Family | Scope | Kernels | Documentation |
|---|---|---|---|
| audit | Java codebase analysis (AST, metrics, coupling, risk) | 27 | This document (§6) |
| docs | Document summarization (hierarchical, dual clustering) | 17 | KOAS_DOCS.md |
| presenter | Slide deck generation (MARP, 3 compression modes) | 8 | KOAS_PRESENTER.md |
| reviewer | Traceable Markdown review (chunk edits, selective revert) | 13 | KOAS_REVIEW.md |
| security | Vulnerability scanning and dependency analysis | 10 | §6 / KOAS_MCP_REFERENCE.md |
All families share the same execution model:
```
           ┌───────────────────────────────────────┐
           │           KOAS Orchestrator           │
           │  (dependency resolution, scheduling)  │
           └───────────────────┬───────────────────┘
                               │
     ┌────────────┬────────────┼────────────┬────────────┐
     ▼            ▼            ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│  audit   │ │   docs   │ │presenter │ │ reviewer │ │ security │
│   27 K   │ │   17 K   │ │   8 K    │ │   13 K   │ │   10 K   │
│          │ │          │ │          │ │          │ │          │
│  AST,    │ │ Pyramidal│ │  MARP,   │ │  Chunk   │ │  CVE,    │
│ metrics, │ │ + Leiden │ │ layout,  │ │  edits,  │ │  deps,   │
│ coupling │ │ clusters │ │ compress │ │  ledger  │ │ secrets  │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
```
Shared design principles:
- Kernels compute, LLMs reason: no LLM logic inside kernels; LLMs are invoked at orchestration boundaries (Worker + Tutor pattern in the `docs` and `reviewer` families)
- Three-stage pipeline: Stage 1 (data collection), Stage 2 (analysis), Stage 3 (reporting)
- Deterministic by default: same input → same output; optional LLM normalization is clearly marked
- Sovereignty attestation: every kernel execution is logged with `sovereignty.local_only: true`
- Hash chain integrity: SHA256 chain across executions for tamper evidence
### audit

The original KOAS family, described in detail in §6 above. Covers Java codebase analysis: AST scanning, complexity metrics, coupling analysis (Martin), dead code detection, risk assessment, and report assembly.
### docs

Hierarchical document summarization with dual clustering (Pyramidal + Leiden). Uses the Worker + Tutor LLM pattern: a small model (e.g., Granite 3B) generates summaries, a larger model (e.g., Mistral 7B) refines them.
See KOAS_DOCS.md for the full 17-kernel pipeline, provenance tracking (Merkle root), and LLM cache integration.
### presenter

Generates MARP-compatible slide decks from document analysis. Three compression modes (full/compressed/executive), layout intelligence, and deterministic slide generation with optional LLM normalization.
See KOAS_PRESENTER.md for presenterctl CLI, SlideDeck JSON schema, and production benchmarks.
### reviewer

Traceable Markdown review with chunk-level edits, selective revert, and a preflight pipeline. Uses an append-only ledger with RVW-NNNN change IDs for full traceability.
See KOAS_REVIEW.md for the review pipeline, change tracking, and acceptance/revert workflow.
### security

Vulnerability scanning, dependency analysis, secret detection, and compliance checks. Operates on the same three-stage pipeline as the other families.
See KOAS_MCP_REFERENCE.md for MCP tool interfaces.
## Orchestration and Execution
Kernels declare explicit dependencies. The orchestrator:
- Builds a dependency graph from `requires` declarations
- Performs a topological sort
- Identifies independent kernels (same topological level)
- Executes in batches, respecting dependencies
For the audit family, this yields:

```
Stage 1 execution:
  Batch 1: ast_scan                                (no dependencies)
  Batch 2: metrics, dependency, timeline, services (depend on ast_scan)
  Batch 3: partition                               (depends on dependency)

Stage 2 execution:
  All 7 kernels run in parallel (independent)

Stage 3 execution:
  Batch 1: section_executive, section_overview, section_drift,
           section_recommendations
  Batch 2: report_assemble                         (depends on all sections)
```
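The batching logic above can be sketched as a topological level sort over the `requires` declarations. This is illustrative code, not the actual orchestrator; the `stage1` map mirrors the Stage 1 batches listed above:

```python
from typing import Dict, List, Set

def batch_schedule(requires: Dict[str, List[str]]) -> List[List[str]]:
    """Group kernels into batches; each batch depends only on earlier ones."""
    done: Set[str] = set()
    pending = set(requires)
    batches: List[List[str]] = []
    while pending:
        # A kernel is ready once all of its dependencies have run
        ready = sorted(k for k in pending if set(requires[k]) <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        batches.append(ready)
        done |= set(ready)
        pending -= set(ready)
    return batches

# Stage 1 dependency map
stage1 = {
    "ast_scan": [],
    "metrics": ["ast_scan"],
    "dependency": ["ast_scan"],
    "timeline": ["ast_scan"],
    "services": ["ast_scan"],
    "partition": ["dependency"],
}
```

Kernels within one batch are independent of each other, which is what makes thread-pool dispatch safe.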
Independent kernels execute concurrently using thread pools:

```python
# Parallel execution with dependency awareness
koas_run(workspace, parallel=True, workers=4)
```

Performance characteristics:
- 60K LOC Java project: ~3.4 seconds total
- Stage 1: ~2.1s (I/O bound - file parsing)
- Stage 2: ~0.5s (CPU bound - analysis)
- Stage 3: ~0.02s (sequential for consistency)
Every execution is logged with:
- Timestamp and kernel name
- Input hash (SHA256 of configuration)
- Execution duration
- Success/failure status
- Output hash (SHA256 of results)
- Chain hash linking to previous entry
This creates a blockchain-style integrity chain that can be verified at any time.
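A minimal sketch of such a hash chain (illustrative, not the actual KOAS log format):

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # hash seed for the first entry

def chain_hash(prev_hash: str, entry: Dict[str, Any]) -> str:
    """Hash of one log entry, linked to the hash of the previous entry."""
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def verify_chain(entries: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any tampered entry breaks all later hashes."""
    prev = GENESIS
    for e in entries:
        body = {k: v for k, v in e.items() if k != "chain_hash"}
        if e.get("chain_hash") != chain_hash(prev, body):
            return False
        prev = e["chain_hash"]
    return True
```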
## Activity Logging and Audit Trail

Every kernel execution is recorded in a centralized, append-only event stream (.KOAS/activity/events.jsonl), providing a complete audit trail for all KOAS operations.
Each event captures:
- Who: Actor identity (system, operator, external orchestrator, auditor)
- What: Kernel name, version, stage, scope
- When: ISO 8601 timestamp with milliseconds
- Result: Success/failure, duration, item count, cache hit/miss
- Sovereignty: `local_only: true` attestation per event
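Appending one such event to the JSONL stream can be sketched as follows (field names are illustrative; see KOAS_ACTIVITY.md for the actual schema):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_event(events_file: Path, actor: str, kernel: str, success: bool) -> dict:
    """Append one activity event to the append-only JSONL stream."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "actor": actor,            # who
        "kernel": kernel,          # what
        "success": success,        # result
        "sovereignty": {"local_only": True},
    }
    with events_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event
```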
The orchestrator maintains a SHA256 chain across kernel executions. Each entry's hash incorporates the previous entry, creating a tamper-evident chain. Additionally, the Merkle tree module computes `inputs_merkle_root` for document-level provenance.
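A Merkle root over input hashes can be sketched as follows (illustrative; the actual module may differ in padding and encoding):

```python
import hashlib
from typing import List

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(inputs: List[bytes]) -> str:
    """Root hash over all inputs; changing any input changes the root."""
    level = [_h(x) for x in inputs]
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        if len(level) % 2:              # odd level: duplicate the last node
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()
```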
When external orchestrators (Claude, GPT-4) access KOAS through the broker gateway, activity logging captures:
- Authentication events (`system.auth` scope)
- Actor type: `external_orchestrator` with `api_key` or `hmac` auth
- Scope enforcement: external clients see only `docs.status` and `docs.export_external`
See KOAS_ACTIVITY.md for the complete event schema, actor model, querying examples, and configuration reference.
Source: `ragix_kernels/activity.py` (731 lines)
## RAG Integration
KOAS integrates with vector-based retrieval for semantic queries:
```
┌───────────────────────────────────────────────────────────────┐
│                         RAG Pipeline                          │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Source Code ──▶ Chunker ──▶ Embeddings ──▶ Vector Index      │
│                                                               │
│  Query ──▶ Embedding ──▶ Similarity Search ──▶ Context        │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
Components:
- ChromaDB: Vector store for embeddings
- Sentence Transformers: Embedding models (`all-MiniLM-L6-v2`)
- BM25: Sparse keyword search
- Hybrid Fusion: RRF, weighted, or interleave strategies
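Reciprocal Rank Fusion (RRF), one of the fusion strategies listed above, can be sketched as follows (the document names and the two result lists are illustrative):

```python
from typing import Dict, List

def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical dense (embedding) and sparse (BM25) result lists
dense = ["RiskService.java", "OrderDao.java", "Utils.java"]
sparse = ["OrderDao.java", "RiskService.java", "Config.java"]
```

Documents ranked highly by both retrievers accumulate the largest fused scores, which is why hybrid search outperforms either retriever alone on mixed keyword/semantic queries.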
When a user asks: "What are the risks in this codebase?"
RAG retrieves relevant context:
- Previous audit findings for similar projects
- Industry standards and thresholds
- Component patterns and anti-patterns
This context enriches the orchestrator's decision-making.
After kernel execution, RAG helps interpret results:
- Maps metrics to qualitative assessments
- Retrieves remediation patterns
- Generates contextualized recommendations
## Quality Standards
Cyclomatic Complexity thresholds:
| Level | CC Range | Action |
|---|---|---|
| Low | 1-10 | Normal maintenance |
| Moderate | 11-20 | Review recommended |
| High | 21-50 | Refactoring advised |
| Very High | >50 | Critical attention |
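A direct mapping of these thresholds (illustrative helper, not part of KOAS):

```python
def cc_level(cc: int) -> str:
    """Map a cyclomatic complexity value to the threshold table above."""
    if cc <= 10:
        return "Low"
    if cc <= 20:
        return "Moderate"
    if cc <= 50:
        return "High"
    return "Very High"
```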
The Maintainability Index (MI) is based on the SEI formula:

```
MI = 171 - 5.2 × ln(HV) - 0.23 × CC - 16.2 × ln(LOC)
```

where HV is the Halstead Volume, CC the cyclomatic complexity, and LOC the lines of code.
Normalized to 0-100 scale:
| Grade | MI Range | Assessment |
|---|---|---|
| A | 80-100 | Excellent |
| B | 60-79 | Good |
| C | 40-59 | Moderate |
| D | 20-39 | Poor |
| F | 0-19 | Critical |
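A sketch of the formula and grading follows. The normalization to 0-100, shown here as MI × 100/171 clamped to [0, 100], is one common convention; the exact normalization used by KOAS may differ:

```python
import math

def maintainability_index(hv: float, cc: float, loc: int) -> float:
    """SEI formula, then normalized and clamped to the 0-100 grading scale."""
    mi = 171 - 5.2 * math.log(hv) - 0.23 * cc - 16.2 * math.log(loc)
    return max(0.0, min(100.0, mi * 100.0 / 171.0))

def mi_grade(mi: float) -> str:
    """Letter grade per the table above."""
    for letter, floor in (("A", 80), ("B", 60), ("C", 40), ("D", 20)):
        if mi >= floor:
            return letter
    return "F"
```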
Robert C. Martin's package coupling metrics:

| Metric | Definition | Interpretation |
|---|---|---|
| Ca | Afferent coupling | Incoming dependencies |
| Ce | Efferent coupling | Outgoing dependencies |
| I | Ce / (Ca + Ce) | Instability (0 = stable, 1 = unstable) |
| A | Abstract / Total | Abstractness ratio |
| D | \|A + I - 1\| | Distance from the main sequence |
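The derived metrics I and D reduce to two one-line formulas (illustrative helpers):

```python
def instability(ca: int, ce: int) -> float:
    """I = Ce / (Ca + Ce); 0 = maximally stable, 1 = maximally unstable."""
    total = ca + ce
    return ce / total if total else 0.0

def distance(a: float, i: float) -> float:
    """D = |A + I - 1|: distance from the main sequence A + I = 1."""
    return abs(a + i - 1.0)
```

A package with A = 0.25 and I = 0.75 sits exactly on the main sequence (D = 0); a concrete, maximally stable package (A = 0, I = 0) is in the "zone of pain" (D = 1).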
Technical debt is estimated following the SQALE method:

```
Debt (hours) = Σ (violation_count × remediation_time)
```

where `remediation_time` is based on industry benchmarks for each violation type.
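A sketch of the debt computation (the remediation times here are placeholders, not industry benchmarks):

```python
from typing import Dict

# Placeholder remediation times (hours per violation); real values come
# from industry benchmarks for each violation type
REMEDIATION_HOURS: Dict[str, float] = {
    "high_complexity": 2.0,
    "dead_code": 0.5,
}

def technical_debt(violations: Dict[str, int]) -> float:
    """Debt (hours) = sum of violation_count x remediation_time."""
    return sum(
        count * REMEDIATION_HOURS.get(vtype, 0.0)
        for vtype, count in violations.items()
    )
```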
## References

- McCabe, T.J. (1976). "A Complexity Measure". IEEE Transactions on Software Engineering, SE-2(4), 308-320.
- Martin, R.C. (2003). Agile Software Development: Principles, Patterns, and Practices. Pearson Education.
- Halstead, M.H. (1977). Elements of Software Science. Elsevier North-Holland.
- SQALE Method: Software Quality Assessment based on Lifecycle Expectations. SQALE Consortium.
- Vitrac, O. (2025). "Virtual/Hybrid R&D Laboratories built with Augmented-AI Agents". LinkedIn Publication / Generative Simulation Initiative.
- ISO/IEC 25010:2011. Systems and software engineering – Systems and software Quality Requirements and Evaluation (SQuaRE).
- IEEE 1061-1998. Standard for a Software Quality Metrics Methodology.
> "Science is no longer what the model can explain, but what a group of agents can coordinate." — Generative Simulation Initiative
KOAS embodies this philosophy:
- Kernels are narrow but powerful: each does one thing well
- Orchestration enables emergence: complex insights from simple components
- RAG bridges language and computation: context-aware encoding/decoding
- Sovereignty ensures trust: all data stays local
- Reproducibility enables auditability: same input, same output, always
This architecture enables industrial-scale code analysis while maintaining scientific rigor, explainability, and full control over sensitive codebases.
KOAS is part of the RAGIX project: Retrieval-Augmented Generative Interactive eXecution Agent.
Adservio Innovation Lab | 2025-2026
Document Version: 2.0 | Last Updated: 2026-02-13 | Author: Olivier Vitrac, PhD, HDR | [email protected] | Adservio