High-Performance AI Inference Runtime
Geodessical is a C-based inference runtime for GGUF language models. It can run as a normal host application (Windows/Linux) and shares core inference code with TensorOS. The focus is straightforward: predictable runtime behavior, low overhead, and transparent performance tuning.
- GGUF model loading — Qwen, LLaMA, Gemma, SmolLM, Mistral, Phi-2/3/3.5
- Quantization — Q4_0, Q8_0, F16, F32 weight formats
- JIT-compiled kernels — Native x86_64 SSE2/AVX2 forward pass kernels
- SMP parallel GEMV — Multi-threaded matrix-vector multiply across all CPU cores
- Host-mode runtime — Memory-mapped model loading, native threads, cross-platform
- Bare-metal mode — Still boots as a standalone OS via Multiboot1
The numbers below are from an actual local run on April 13, 2026.
- CPU: AMD Ryzen 9 7940HS (8 cores / 16 threads)
- GPU: NVIDIA GeForce RTX 4070 Laptop GPU (8 GB class; runtime reported ~7052 MB free)
- RAM: 32 GB
- OS: Windows (host mode)
- Prompt: "Write a 500-word explanation of how compilers optimize loops, in plain English."
- Max generation: 256 tokens
- Single run per engine/model (no averaging in this table)
| Engine | Model | Throughput Metric | Measured Value |
|---|---|---|---|
| Geodessical | google_gemma-4-E2B-it-Q4_0.gguf | End-to-end generation rate | 92.5 tok/s |
| Geodessical | google_gemma-4-E2B-it-Q4_0.gguf | Decode-only rate | 107.7 tok/s |
| Ollama | gemma3:4b | Eval rate (eval_count / eval_duration) | 75.36 tok/s |
| Ollama | gemma4:latest | Eval rate (eval_count / eval_duration) | 30.21 tok/s |
- Geodessical end-to-end (92.5 tok/s) vs Ollama gemma3:4b (75.36 tok/s): +22.7%
- Geodessical end-to-end (92.5 tok/s) vs Ollama gemma4:latest (30.21 tok/s): +206.2%
Geodessical:

```powershell
.\build_host\geodessical.exe "C:\Users\legom\TensorOS\models\google_gemma-4-E2B-it-Q4_0.gguf" -p "Write a 500-word explanation of how compilers optimize loops, in plain English." -n 256
```

Ollama (gemma3:4b):

```powershell
$body = @{ model = 'gemma3:4b'; prompt = 'Write a 500-word explanation of how compilers optimize loops, in plain English.'; stream = $false; options = @{ num_predict = 256; temperature = 0.7 } } | ConvertTo-Json -Depth 6
$r = Invoke-RestMethod -Uri 'http://localhost:11434/api/generate' -Method Post -ContentType 'application/json' -Body $body
[math]::Round(($r.eval_count / ($r.eval_duration / 1e9)), 2)
```

Ollama (gemma4:latest):

```powershell
$body = @{ model = 'gemma4:latest'; prompt = 'Write a 500-word explanation of how compilers optimize loops, in plain English.'; stream = $false; options = @{ num_predict = 256; temperature = 0.7 } } | ConvertTo-Json -Depth 6
$r = Invoke-RestMethod -Uri 'http://localhost:11434/api/generate' -Method Post -ContentType 'application/json' -Body $body
[math]::Round(($r.eval_count / ($r.eval_duration / 1e9)), 2)
```

Notes:
- This is a practical runtime comparison, not a strict model-equivalence benchmark.
- Geodessical and Ollama model packages are not byte-identical here, so use these results as operational guidance, not a canonical leaderboard.
```
$ ./geodessical phi3.5-mini-q4_0.gguf -p "What is an operating system?"
Geodessical v0.4.0 "Axon"
High-Performance AI Inference Runtime
[CPU] SSE2=1 AVX2=1 FMA=1 AVX512=0
[SMP] 8 CPUs online (7 workers + BSP)
[GD] Loading model: phi3.5-mini-q4_0.gguf
[GD] Mapped 2081 MB
[LLM] Model: Phi 3.5 Mini Instruct (phi3)
[LLM] 32 layers, 3072-dim, 32064 vocab, 32 heads
[GD] Model loaded in 1240 ms
[GD] Prompt: "What is an operating system?"
An operating system (OS) is a complex piece of software that manages...
```
| Tool | Purpose | Install |
|---|---|---|
| zig (0.15+) | C compiler | ziglang.org/download |
```powershell
# Build the hosted runtime
.\build_host.ps1

# Run with a GGUF model
.\build_host\geodessical.exe phi3.5.gguf -p "Hello world"

# Interactive chat mode
.\build_host\geodessical.exe phi3.5.gguf -i
```

Or with CMake (if GCC/Clang available):
```shell
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
./geodessical phi3.5.gguf -i
```

```powershell
# Build the bare-metal kernel + run in QEMU
.\build.ps1 -Run

# QEMU flags: -machine q35,accel=whpx -cpu EPYC-v4 -smp 4 -m 8G
#             -drive file=phi3.5.gguf,format=raw,if=virtio
```

```
Geodessical <model.gguf> [options]

Options:
  -p, --prompt <text>    Prompt text (default: interactive)
  -n, --tokens <num>     Max tokens to generate (default: 128)
  -t, --threads <num>    Thread count (default: all CPUs)
  --temp <float>         Temperature (default: 0.7)
  --top-k <int>          Top-K sampling (default: 40)
  --top-p <float>        Nucleus sampling (default: 0.9)
  -i, --interactive      Interactive chat mode
  -h, --help             Show this help
```
Geodessical operates in two modes:
```
┌─────────────────────────────────────────────────┐
│ geodessical.exe / Geodessical                   │
│ CLI: model load, prompt, interactive chat       │
├─────────────────────────────────────────────────┤
│ HAL (Hardware Abstraction Layer)                │
│ ┌───────────┬───────────┬──────────────────┐    │
│ │ Memory    │ Threading │ CPU Detection    │    │
│ │ malloc    │ Win32/    │ CPUID: SSE2,     │    │
│ │ aligned   │ pthreads  │ AVX2, FMA,       │    │
│ │ mmap      │ workers   │ AVX-512          │    │
│ └───────────┴───────────┴──────────────────┘    │
├─────────────────────────────────────────────────┤
│ Inference Engine (shared with bare-metal)       │
│ ┌──────┬─────────┬──────┬──────┬───────────┐    │
│ │ GGUF │ BPE     │ JIT  │ SMP  │ Forward   │    │
│ │parse │tokenize │ x86  │GEMV  │ pass      │    │
│ └──────┴─────────┴──────┴──────┴───────────┘    │
└─────────────────────────────────────────────────┘
```
The full TensorOS kernel boots via Multiboot1, runs on x86_64/ARM64, and includes the AI shell, tensor scheduler, native git, GPU drivers, and everything documented in the TensorOS README.
```
Geodessical/
├── host/              # Host-mode runtime (NEW)
│   ├── hal.h          # Hardware Abstraction Layer header
│   ├── hal.c          # Cross-platform HAL implementation
│   ├── main.c         # CLI entry point
│   └── shims/         # Include shims (kernel→HAL redirect)
│       └── kernel/... # Shim headers for all kernel includes
├── runtime/
│   ├── nn/
│   │   ├── llm.c      # Full LLM inference engine
│   │   ├── llm.h      # Model types and API
│   │   └── gguf.c     # GGUF format parser
│   └── jit/
│       ├── x86_jit.c  # x86_64 JIT code emitter
│       └── llm_jit.c  # JIT forward kernels
├── kernel/            # Bare-metal kernel (TensorOS heritage)
├── boot/              # Bootloader (Multiboot1, ARM64)
├── build_host.ps1     # Host-mode build script (Zig CC)
├── build.ps1          # Bare-metal build script
└── CMakeLists.txt     # CMake build (GCC/Clang)
```
- Model Loading: Memory-maps the GGUF file (no copy), parses metadata, maps tensor pointers directly into the file.
- Tokenization: BPE tokenizer built from GGUF vocabulary with an O(1) hash table lookup and merge-based encoding.
- Forward Pass: Full transformer forward pass with RMSNorm → QKV projection → RoPE → GQA attention → SwiGLU FFN → LM head.
- JIT Compilation: On first inference, six x86_64 SIMD kernels are JIT-compiled (vadd, dot, axpy, fused_silu_mul, rope, rmsnorm) — eliminating per-element function call overhead.
- SMP Dispatch: Matrix-vector multiplies are partitioned across all CPU cores via the HAL's thread pool.
- Sampling: Temperature-scaled softmax with top-k/top-p nucleus sampling and optional greedy decoding.
Current GGUF coverage in this runtime includes:
| Model | Architecture | Tested |
|---|---|---|
| Gemma 4 E2B It | gemma4 | ✅ |
| Phi-3.5 Mini Instruct | phi3 | ✅ |
| Qwen2.5 | qwen2 | ✅ |
| LLaMA 3 | llama | ✅ |
| Gemma 2 | gemma | ✅ |
| SmolLM 2 | llama | ✅ |
| Mistral | llama | ✅ |
| Phi-2 | phi2 | ✅ |
Quantization: Q4_0, Q8_0, F16, F32
Geodessical evolved from TensorOS, a bare-metal AI operating system. The core inference engine, GGUF parser, BPE tokenizer, JIT compiler, and SMP parallel GEMV are shared between both projects. Geodessical adds the HAL layer to run the same inference code as a native application on Windows and Linux.
MIT