A self-hosted systems + scientific programming language for epistemic computing, uncertainty propagation, and algebraic effects
Manifesto · Examples · Status · Contributing
Sounio is a systems programming language for epistemic computing — its type system tracks not just what your data is, but how much you should trust it. Uncertainty propagation, provenance tracking, and confidence-gated execution are built into the type system, not bolted on as libraries.
Keywords: systems programming language, scientific computing language, epistemic types, uncertainty propagation, algebraic effects, self-hosted compiler, formal verification, non-associative algebra, octonions, e-graphs.
The compiler is self-hosted: Sounio compiles itself, bootstrapped from a 2000-line C compiler through a multi-stage chain to a true fixed-point where stage N and stage N+1 produce bit-identical binaries. It was used to computationally verify a new result in algebra — that the count of nonzero octonion basis associators equals |PSL(2,7)| = 168 — now submitted for publication.
This is an active research project, not a production release. Read the honest status before using it for anything serious.
The canonical Sounio checkout now includes a bounded cross-repo example under:
examples/cognitive_ossm/
This lane is paired with the repository:
github.com/agourakis82/hyperbolic-semantic-networks
Workflow split:
- Sounio provides the executable parity path and canonical .sio implementation scaffolding.
- The hyperbolic repo exports the compact SWOW bundle in data/cpc2026/sounio_input/.
- The hyperbolic repo's Python mirror currently generates the full paper-scale O-SSM artifacts.
From the Sounio repo root:
./artifacts/omega/souc-bin/souc-linux-x86_64-gpu run examples/cognitive_ossm/cognitive_ossm.sio
./artifacts/omega/souc-bin/souc-linux-x86_64-gpu run examples/cognitive_ossm/run_regimes.sio -- --max-trajectories 8 --max-steps 64
./artifacts/omega/souc-bin/souc-linux-x86_64-gpu run examples/cognitive_ossm/export_results.sio
- Session bootstrap:
  - Read CLAUDE_HANDOFF.md
  - Read CLAUDE.md
  - Read AGENTS.md
  - Verify the current branch before editing
  - Treat /workspace/sounio as the active remote-first workspace path
  - Do not propose destructive reset/clean/rebase flows to "simplify" recovery state
- Prompt surface: llms.txt
- Repository guide: CLAUDE.md
- Syntax and workflow guide: docs/LLM_PROGRAMMING_GUIDE.md
- Live Hugging Face dataset: https://huggingface.co/datasets/chiuratto-AIgourakis/sounio-code-examples
- Training dataset export: datasets/sounio-code-examples/README.md
- Dataset builder: scripts/export_hf_dataset.py
This repo now ships a root llms.txt for model-aware tools and a reproducible Hugging Face-style dataset export built from the Sounio test suite.
The current published dataset lives in the maintainer namespace as a public mirror until the sounio-lang Hugging Face org namespace is ready.
Epistemic types as first-class citizens. Every scientific measurement has uncertainty. Most languages ignore this. Sounio's type system includes Knowledge[T] with built-in confidence, provenance tracking, and automatic GUM-compliant uncertainty propagation. The compiler can enforce confidence thresholds at compile time — a function requiring ε >= 0.82 rejects under-confident data before any code runs. No equivalent system exists in any production language.
Self-hosted compiler. The compiler bootstrapped from C through a multi-stage chain (stage0.c → boot2g.sio → self-hosted) to a true fixed-point. The default workflow is now native-only: bin/souc compiles .sio sources to temporary or named ELFs via the self-hosted compiler and executes those binaries directly.
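The fixed-point claim (stage N and stage N+1 are bit-identical) can be checked independently of the compiler by hashing the two binaries. The following is a minimal sketch in Python, not the project's own verification script; the function names are illustrative, and the paths you pass in depend on where your bootstrap run writes its stage outputs.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_fixed_point(stage_n: Path, stage_n_plus_1: Path) -> bool:
    """True when both compiler binaries have identical contents.

    Both arguments are hypothetical paths to consecutive stage outputs.
    """
    return sha256_of(stage_n) == sha256_of(stage_n_plus_1)
```

Calling `is_fixed_point(Path("stage2.elf"), Path("stage3.elf"))` (hypothetical file names) returns `True` only if the two files match byte for byte, which is the property the bootstrap chain is said to reach.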
Not a Rust/Julia dialect. Own syntax (&! not &mut, var not let mut), own semantics (algebraic effects, linear types, dimensional analysis), own philosophy (epistemic computing for science).
fn main() with IO {
// A drug dose with tracked confidence and evidence source
let base_dose: Knowledge[f64] = Knowledge(15.0, ε=0.92, prov="ASHP_2020_Level1A_RCT")
// Hospital scale measurement: high-confidence device
let weight: Knowledge[f64] = Knowledge(78.5, ε=0.98, prov="hospital_scale_calibrated")
let ref_wt: Knowledge[f64] = Knowledge(70.0, ε=1.0)
// GUM propagation is automatic: ε(a*b) = ε(a) * ε(b)
let adjusted_dose: Knowledge[f64] = base_dose * (weight / ref_wt)
// Extract propagated confidence
let conf = adjusted_dose.ε // ~0.90
println(conf)
}
Full pipeline: tests/run-pass/vancomycin_propagation.sio — real ASHP 2020 vancomycin dosing with 5-step GUM propagation.
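To make the arithmetic behind the example concrete, here is a small Python model of the stated rule ε(a*b) = ε(a) * ε(b). It is a sketch only: the `Knowledge` class below is a hypothetical stand-in, not Sounio's implementation, and it assumes that division combines confidence the same way as multiplication and that provenance labels are simply concatenated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knowledge:
    """Hypothetical stand-in for a value with confidence and provenance."""
    value: float
    eps: float = 1.0   # confidence in [0, 1]
    prov: tuple = ()   # provenance labels (assumed to concatenate)

    def _combine(self, other: "Knowledge", value: float) -> "Knowledge":
        # Rule from the example: confidences multiply.
        return Knowledge(value, self.eps * other.eps, self.prov + other.prov)

    def __mul__(self, other: "Knowledge") -> "Knowledge":
        return self._combine(other, self.value * other.value)

    def __truediv__(self, other: "Knowledge") -> "Knowledge":
        # Assumption: division propagates confidence like multiplication.
        return self._combine(other, self.value / other.value)


base_dose = Knowledge(15.0, 0.92, ("ASHP_2020_Level1A_RCT",))
weight = Knowledge(78.5, 0.98, ("hospital_scale_calibrated",))
ref_wt = Knowledge(70.0, 1.0)

adjusted = base_dose * (weight / ref_wt)
print(round(adjusted.value, 3), round(adjusted.eps, 4))  # prints: 16.821 0.9016
```

The propagated confidence is 0.92 × 0.98 × 1.0 = 0.9016, which is the "~0.90" noted in the Sounio example.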
// ASHP 2020 §8.3: AUC-guided dosing requires ε >= 0.82
fn prescribe_vancomycin(dose: Knowledge[f64, ε >= 0.82]) with IO {
println("Vancomycin prescribed")
}
fn main() with IO {
let risky_dose: Knowledge[f64, ε=0.40] = Knowledge { value: 500.0, epsilon: 0.40 }
prescribe_vancomycin(risky_dose) // COMPILE ERROR: ε=0.40 < required 0.82
}
The compiler rejects this before any code runs — a hard patient-safety guarantee. See: tests/compile-fail/vancomycin_low_conf.sio
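For readers who want the intent of the gate without the type system, the closest analogue in a dynamically typed language is a runtime check. The Python sketch below is illustrative only: it rejects the call when it runs rather than at compile time, so it is strictly weaker than what the Sounio example describes. The exception class and constant name are hypothetical.

```python
MIN_CONFIDENCE = 0.82  # threshold taken from the example above


class LowConfidenceError(ValueError):
    """Raised when a value's confidence is below the required threshold."""


def prescribe_vancomycin(dose_mg: float, eps: float) -> str:
    # Runtime stand-in for the compile-time constraint eps >= 0.82.
    if eps < MIN_CONFIDENCE:
        raise LowConfidenceError(
            f"eps={eps:.2f} < required {MIN_CONFIDENCE:.2f}"
        )
    return "Vancomycin prescribed"


try:
    prescribe_vancomycin(500.0, eps=0.40)
except LowConfidenceError as err:
    print("rejected:", err)  # prints: rejected: eps=0.40 < required 0.82
```

The difference in timing is the point of the comparison: here the under-confident dose is caught only when the call executes, whereas the Sounio version is described as failing before any binary is produced.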
fn sqrt_approx(x: f64) -> f64 with Mut, Div, Panic {
if x <= 0.0 { return 0.0 }
var g = x / 2.0
var i = 0
while i < 50 {
g = (g + x / g) / 2.0
i = i + 1
}
return g
}
linear struct FileHandle { fd: i32 } // must be consumed exactly once
More examples: examples/epistemic_bmi.sio, docs/guide/SOUNIO_QUICK_START.md
This is an active research repository. Here's what actually works and what doesn't.
| Component | Status | Evidence |
|---|---|---|
| Epistemic core | Knowledge[T] + GUM propagation + provenance | 52 files, tested, dissertation-grade |
| Self-hosted compiler | Lexer, parser, checker, codegen — compiles itself | Fixed-point verified (stage2 == stage3) |
| Algebra | Clifford Cl(p,q), Cayley-Dickson CD(k), Jordan J₃(O), octonions | Verified the 168 theorem |
| Ontology | OWL2 model + reasoner + query engine | 40 tests passing |
| Native codegen | x86-64 ELF emission from self-hosted lean driver | Bit-identical bootstrap chain |
| Core stdlib | Stats, linalg, ODE solvers, signal processing, CSV, JSON | Gate: 81 pass / 0 fail / 5 skip |
| Optimizer | 1000+ e-graph rewrite rules, GVN, LICM, load sinking | 1003 tests, all FAIL=0 |
| Component | Reality |
|---|---|
| Theorem prover | 9,600 lines — but NO inference logic, just arena + data structures |
| ~70% of epistemic modules | Function signatures with minimal bodies |
| Neural networks (quaternion/octonion) | Compilation errors, won't run |
| Genomics | 11 files are single-line stubs (disabled due to parser limitations) |
| Async runtime | 12 files, mostly <10 lines each |
| Geometry engine | 100% disabled |
| Gap | Detail |
|---|---|
| Knowledge<T> is NOT generic | Hardcoded as Epistemic struct (f64 only). Struct-level generics in progress. |
| Epistemic ODE solver | Only does exponential decay, not general RHS (needed for PBPK) |
| Ontology federation | Has 8 hardcoded CURIEs, NOT 15M terms — federation is a stub |
| GPU entry point | gpu/lib.sio is empty. PTX codegen exists but no end-to-end path. |
| Closure literals | \|x\| x + 1 not supported. Named fn refs work (let f = square). |
| Windows / macOS | Linux x86-64 only. macOS Mach-O backend exists but untested. |
| Category | Files | Percentage |
|---|---|---|
| Complete (working, tested) | 402 | 57% |
| Partial (some functions work) | 175 | 25% |
| Skeleton (types only, no logic) | 95 | 13% |
| Stub (1-line placeholder) | 38 | 5% |
| Total | 710 | 100% |
While developing Sounio's octonion multiplication backend, we discovered and proved a combinatorial fact that appears not to have been explicitly stated in the literature:
The number of ordered triples (i, j, k) in {1,...,7}^3 for which the octonion basis associator [e_i, e_j, e_k] is nonzero is exactly 168 = |PSL(2,7)|.
The decomposition is 343 = 133 (repeated indices) + 42 (Fano-line triples) + 168 (non-collinear triples). We also report that sedenion nonzero associator counts are multiples of 168, and that the primitive zero-divisor pair count is 336 = 2 × 168.
The result was verified computationally in Sounio and independently reproduced in Python/NumPy.
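The octonion count is small enough to re-check with a few lines of plain Python. The sketch below is not the authors' verification code. It builds the octonions with the standard Cayley-Dickson doubling formula (a, b)(c, d) = (ac − d̄b, da + bc̄) and classifies all 343 ordered triples of imaginary basis units by whether the associator vanishes. Function and key names are illustrative.

```python
from itertools import product


def conj(x):
    """Cayley-Dickson conjugate: negate every component except the real one."""
    return [x[0]] + [-v for v in x[1:]]


def add(x, y):
    return [a + b for a, b in zip(x, y)]


def sub(x, y):
    return [a - b for a, b in zip(x, y)]


def mul(x, y):
    """Recursive Cayley-Dickson product on coefficient lists of length 2^n."""
    n = len(x)
    if n == 1:
        return [x[0] * y[0]]
    h = n // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    # (a, b)(c, d) = (ac - conj(d) b, d a + b conj(c))
    return sub(mul(a, c), mul(conj(d), b)) + add(mul(d, a), mul(b, conj(c)))


def basis(i, dim=8):
    e = [0] * dim
    e[i] = 1
    return e


def associator(x, y, z):
    """[x, y, z] = (xy)z - x(yz)."""
    return sub(mul(mul(x, y), z), mul(x, mul(y, z)))


def classify_triples():
    """Count ordered triples (i, j, k) in {1..7}^3 by associator behaviour."""
    counts = {"repeated": 0, "zero_distinct": 0, "nonzero": 0}
    for i, j, k in product(range(1, 8), repeat=3):
        if any(associator(basis(i), basis(j), basis(k))):
            counts["nonzero"] += 1
        elif len({i, j, k}) < 3:
            counts["repeated"] += 1
        else:
            counts["zero_distinct"] += 1  # distinct indices, zero associator
    return counts


print(classify_triples())
# prints: {'repeated': 133, 'zero_distinct': 42, 'nonzero': 168}
```

The three counts line up with the decomposition 343 = 133 + 42 + 168, where the 42 distinct-index triples with vanishing associator correspond to the Fano-line triples. The sketch checks only the octonion case; it does not address the sedenion or zero-divisor statements.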
Paper: "The 168 Theorem: PSL(2,7) Governs Non-Associativity and Zero-Divisor Structure in the Cayley-Dickson Tower" — Agourakis & Gerenutti (2026). Submitted to Advances in Applied Clifford Algebras.
The repo ships a pre-built Linux x86-64 self-hosted compiler artifact plus a native wrapper. No Rust build step is required for the default workflow.
git clone https://github.com/sounio-lang/sounio.git
cd sounio
export SOUC="$(pwd)/bin/souc"
export SOUNIO_STDLIB_PATH="$(pwd)/stdlib"
$SOUC --version # souc native-wrapper v1.0.0-rc1
$SOUC check examples/hello.sio # type-check
$SOUC run examples/epistemic_bmi.sio # compile + execute
$SOUC compile examples/hello.sio -o hello.elf
$SOUC repl # not yet supported in native mode
For detailed setup: INSTALL.md · docs/guide/MINIMUM_VIABLE_SOUNIO.md
Pipeline: Source → Lexer → Parser → AST → Check → HIR → SIR → HLIR (SSA) → Codegen
| Directory | Purpose |
|---|---|
| self-hosted/lexer/, parser/ | Frontend (tokenizer, recursive descent) |
| self-hosted/check/, types/ | Bidirectional type inference + algebraic effects |
| self-hosted/ir/ | IR lowering, optimization, e-graph equality saturation |
| self-hosted/native/ | x86-64 ELF emission |
| self-hosted/compiler/ | Codegen drivers (lean, IR) |
| stdlib/epistemic/ | Knowledge[T], uncertainty (GUM), provenance |
| stdlib/units/ | Dimensional analysis |
| bootstrap/ | stage0 (C) → boot2g → self-hosted chain |
| formal/ | Lean 4 proofs (epistemic type invariants) |
| tests/ | run-pass/, compile-fail/, ui/, stdlib/ |
- Uncertainty is not optional — Every scientific value has uncertainty. Ignoring it is a bug, not a simplification.
- Provenance matters — Data without origin is data without trust.
- Propagation is automatic — Manual uncertainty calculation is error-prone. The compiler handles it (GUM/ISO 17025).
- Confidence gates execution — Low-confidence code paths require explicit acknowledgment.
- One type definition, compiler guarantees everything — Define your epistemic constraints once; the compiler enforces them across all operations.
See docs/MANIFESTO.md for the full philosophy.
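As background for the "propagation is automatic" principle, the textbook GUM law of propagation for uncorrelated inputs combines standard uncertainties as u_c(y)² = Σ (∂f/∂x_i)² u(x_i)². The Python sketch below evaluates that formula with numerical derivatives. It illustrates the general GUM rule only; Sounio's own propagation lives in stdlib/epistemic/ and may differ. The function name and the uncertainty values in the usage example are hypothetical.

```python
import math


def combined_uncertainty(f, values, uncertainties, rel_step=1e-6):
    """First-order GUM combined standard uncertainty for uncorrelated inputs.

    Sensitivity coefficients are estimated with central differences.
    """
    total = 0.0
    for i, (x, u) in enumerate(zip(values, uncertainties)):
        h = abs(x) * rel_step or rel_step  # fall back to rel_step when x == 0
        hi, lo = list(values), list(values)
        hi[i], lo[i] = x + h, x - h
        sensitivity = (f(*hi) - f(*lo)) / (2 * h)
        total += (sensitivity * u) ** 2
    return math.sqrt(total)


def adjusted_dose(base, weight, ref):
    return base * (weight / ref)


# Illustrative standard uncertainties only (not taken from any guideline).
u_dose = combined_uncertainty(adjusted_dose, (15.0, 78.5, 70.0), (0.5, 0.1, 0.0))
print(u_dose)
```

For a pure product or quotient this reduces to adding relative uncertainties in quadrature, which is a convenient cross-check on the numerical result.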
Platform. Pre-built binaries are Linux x86-64 only. macOS Mach-O backend exists but is not regularly tested. Windows is not supported.
Native startup cost. bin/souc run performs a native compilation step before execution, so there is a small startup cost compared with an in-process executor.
No struct generics (yet). Knowledge<T> is currently monomorphic (f64 only). Struct-level generics are the highest-priority language feature. Function-level generics work.
No closure literals. Named function references work (let f = square), but |x| x + 1 lambda syntax is not supported.
No REPL/debug flags yet. Native mode does not yet support repl, --show-ast, or --show-types.
FFI. extern "C" remains limited in scope, but the old JIT-only integer FFI failure mode is gone on the native path.
GPU. PTX codegen exists in self-hosted/gpu/ but there is no end-to-end compilation path from the CLI. SPIR-V/Metal/WGSL files exist as stubs.
Full list: docs/compiler/KNOWN_LIMITATIONS.md
If you use Sounio in academic work:
@software{sounio2026,
title = {Sounio: A Systems Programming Language for Epistemic Computing},
author = {Agourakis, Demetrios Chiuratto and Gerenutti, Marli},
year = {2026},
version = {1.0.0-beta.6},
doi = {10.5281/zenodo.18726647},
url = {https://github.com/sounio-lang/sounio},
note = {Self-hosted compiler with epistemic types and Lean 4 verification}
}
Apache-2.0. See LICENSE.
At the horizon of certainty, where ancient columns meet the endless sea.
SOUNIO