NagusameCS/HyperTensor


Geodessical

High-Performance AI Inference Runtime


Geodessical is a C-based inference runtime for GGUF language models. It can run as a normal host application (Windows/Linux) and shares core inference code with TensorOS. The focus is straightforward: predictable runtime behavior, low overhead, and transparent performance tuning.

Key Features

  • GGUF model loading — Qwen, LLaMA, Gemma, SmolLM, Mistral, Phi-2/3/3.5
  • Quantization — Q4_0, Q8_0, F16, F32 weight formats
  • JIT-compiled kernels — Native x86_64 SSE2/AVX2 forward pass kernels
  • SMP parallel GEMV — Multi-threaded matrix-vector multiply across all CPU cores
  • Host-mode runtime — Memory-mapped model loading, native threads, cross-platform
  • Bare-metal mode — Still boots as a standalone OS via Multiboot1

Performance Snapshot (Measured)

The numbers below are from an actual local run on April 13, 2026.

Test Hardware

  • CPU: AMD Ryzen 9 7940HS (8 cores / 16 threads)
  • GPU: NVIDIA GeForce RTX 4070 Laptop GPU (8 GB class; runtime reported ~7052 MB free)
  • RAM: 32 GB
  • OS: Windows (host mode)

Workload

  • Prompt: Write a 500-word explanation of how compilers optimize loops, in plain English.
  • Max generation: 256 tokens
  • Single run per engine/model (no averaging in this table)

Results

| Engine | Model | Throughput Metric | Measured Value |
|---|---|---|---|
| Geodessical | google_gemma-4-E2B-it-Q4_0.gguf | End-to-end generation rate | 92.5 tok/s |
| Geodessical | google_gemma-4-E2B-it-Q4_0.gguf | Decode-only rate | 107.7 tok/s |
| Ollama | gemma3:4b | Eval rate (eval_count / eval_duration) | 75.36 tok/s |
| Ollama | gemma4:latest | Eval rate (eval_count / eval_duration) | 30.21 tok/s |

Direct Comparison (same machine, same prompt)

  • Geodessical end-to-end (92.5 tok/s) vs Ollama gemma3:4b (75.36 tok/s): +22.7%
  • Geodessical end-to-end (92.5 tok/s) vs Ollama gemma4:latest (30.21 tok/s): +206.2%

Repro Commands

Geodessical:

.\build_host\geodessical.exe "C:\Users\legom\TensorOS\models\google_gemma-4-E2B-it-Q4_0.gguf" -p "Write a 500-word explanation of how compilers optimize loops, in plain English." -n 256

Ollama (gemma3:4b):

$body = @{ model = 'gemma3:4b'; prompt = 'Write a 500-word explanation of how compilers optimize loops, in plain English.'; stream = $false; options = @{ num_predict = 256; temperature = 0.7 } } | ConvertTo-Json -Depth 6
$r = Invoke-RestMethod -Uri 'http://localhost:11434/api/generate' -Method Post -ContentType 'application/json' -Body $body
[math]::Round(($r.eval_count / ($r.eval_duration / 1e9)), 2)

Ollama (gemma4:latest):

$body = @{ model = 'gemma4:latest'; prompt = 'Write a 500-word explanation of how compilers optimize loops, in plain English.'; stream = $false; options = @{ num_predict = 256; temperature = 0.7 } } | ConvertTo-Json -Depth 6
$r = Invoke-RestMethod -Uri 'http://localhost:11434/api/generate' -Method Post -ContentType 'application/json' -Body $body
[math]::Round(($r.eval_count / ($r.eval_duration / 1e9)), 2)

Notes:

  • This is a practical runtime comparison, not a strict model-equivalence benchmark.
  • Geodessical and Ollama model packages are not byte-identical here, so use these results as operational guidance, not a canonical leaderboard.

Demo: Hosted Inference

$ ./geodessical phi3.5-mini-q4_0.gguf -p "What is an operating system?"

  Geodessical v0.4.0 "Axon"
  High-Performance AI Inference Runtime

[CPU] SSE2=1 AVX2=1 FMA=1 AVX512=0
[SMP] 8 CPUs online (7 workers + BSP)
[GD] Loading model: phi3.5-mini-q4_0.gguf
[GD] Mapped 2081 MB
[LLM] Model: Phi 3.5 Mini Instruct (phi3)
[LLM] 32 layers, 3072-dim, 32064 vocab, 32 heads
[GD] Model loaded in 1240 ms
[GD] Prompt: "What is an operating system?"

An operating system (OS) is a complex piece of software that manages...

Building

Prerequisites

| Tool | Purpose | Install |
|---|---|---|
| zig (0.15+) | C compiler | ziglang.org/download |

Host Mode (Windows/Linux)

# Build the hosted runtime
.\build_host.ps1

# Run with a GGUF model
.\build_host\geodessical.exe phi3.5.gguf -p "Hello world"

# Interactive chat mode
.\build_host\geodessical.exe phi3.5.gguf -i

Or with CMake (if GCC/Clang available):

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
./geodessical phi3.5.gguf -i

Bare-Metal Mode (QEMU)

# Build the bare-metal kernel + run in QEMU
.\build.ps1 -Run

# QEMU flags: -machine q35,accel=whpx -cpu EPYC-v4 -smp 4 -m 8G
#             -drive file=phi3.5.gguf,format=raw,if=virtio

Usage

Geodessical <model.gguf> [options]

Options:
  -p, --prompt <text>    Prompt text (default: interactive)
  -n, --tokens <num>     Max tokens to generate (default: 128)
  -t, --threads <num>    Thread count (default: all CPUs)
  --temp <float>         Temperature (default: 0.7)
  --top-k <int>          Top-K sampling (default: 40)
  --top-p <float>        Nucleus sampling (default: 0.9)
  -i, --interactive      Interactive chat mode
  -h, --help             Show this help

Architecture

Geodessical operates in two modes:

Host Mode (new)

┌─────────────────────────────────────────────────┐
│  geodessical.exe / Geodessical                  │
│  CLI: model load, prompt, interactive chat      │
├─────────────────────────────────────────────────┤
│  HAL (Hardware Abstraction Layer)               │
│  ┌───────────┬───────────┬──────────────────┐   │
│  │ Memory    │ Threading │ CPU Detection    │   │
│  │ malloc    │ Win32/    │ CPUID: SSE2,     │   │
│  │ aligned   │ pthreads  │ AVX2, FMA,       │   │
│  │ mmap      │ workers   │ AVX-512          │   │
│  └───────────┴───────────┴──────────────────┘   │
├─────────────────────────────────────────────────┤
│  Inference Engine (shared with bare-metal)      │
│  ┌──────┬─────────┬──────┬──────┬───────────┐   │
│  │ GGUF │ BPE     │ JIT  │ SMP  │ Forward   │   │
│  │parse │tokenize │ x86  │GEMV  │ pass      │   │
│  └──────┴─────────┴──────┴──────┴───────────┘   │
└─────────────────────────────────────────────────┘

Bare-Metal Mode (original TensorOS)

The full TensorOS kernel boots via Multiboot1, runs on x86_64/ARM64, and includes the AI shell, tensor scheduler, native git, GPU drivers, and everything documented in the TensorOS README.


Project Structure

Geodessical/
├── host/                      # Host-mode runtime (NEW)
│   ├── hal.h                  # Hardware Abstraction Layer header
│   ├── hal.c                  # Cross-platform HAL implementation
│   ├── main.c                 # CLI entry point
│   └── shims/                 # Include shims (kernel→HAL redirect)
│       └── kernel/...         # Shim headers for all kernel includes
├── runtime/
│   ├── nn/
│   │   ├── llm.c             # Full LLM inference engine
│   │   ├── llm.h             # Model types and API
│   │   └── gguf.c            # GGUF format parser
│   └── jit/
│       ├── x86_jit.c         # x86_64 JIT code emitter
│       └── llm_jit.c         # JIT forward kernels
├── kernel/                    # Bare-metal kernel (TensorOS heritage)
├── boot/                      # Bootloader (Multiboot1, ARM64)
├── build_host.ps1             # Host-mode build script (Zig CC)
├── build.ps1                  # Bare-metal build script
└── CMakeLists.txt             # CMake build (GCC/Clang)

How It Works

  1. Model Loading: Memory-maps the GGUF file (no copy), parses metadata, maps tensor pointers directly into the file.

  2. Tokenization: BPE tokenizer built from GGUF vocabulary with an O(1) hash table lookup and merge-based encoding.

  3. Forward Pass: Full transformer forward pass with RMSNorm → QKV projection → RoPE → GQA attention → SwiGLU FFN → LM head.

  4. JIT Compilation: On first inference, six x86_64 SIMD kernels are JIT-compiled (vadd, dot, axpy, fused_silu_mul, rope, rmsnorm) — eliminating per-element function call overhead.

  5. SMP Dispatch: Matrix-vector multiplies are partitioned across all CPU cores via the HAL's thread pool.

  6. Sampling: Temperature-scaled softmax with top-k/top-p nucleus sampling and optional greedy decoding.


Supported Models

Current GGUF coverage in this runtime includes:

| Model | Architecture |
|---|---|
| Gemma 4 E2B It | gemma4 |
| Phi-3.5 Mini Instruct | phi3 |
| Qwen2.5 | qwen2 |
| LLaMA 3 | llama |
| Gemma 2 | gemma |
| SmolLM 2 | llama |
| Mistral | llama |
| Phi-2 | phi2 |

Quantization: Q4_0, Q8_0, F16, F32


Origin

Geodessical evolved from TensorOS, a bare-metal AI operating system. The core inference engine, GGUF parser, BPE tokenizer, JIT compiler, and SMP parallel GEMV are shared between both projects. Geodessical adds the HAL layer to run the same inference code as a native application on Windows and Linux.


License

MIT

About

High-performance AI inference runtime. Evolved from TensorOS — runs on Windows, Linux, and bare metal with CUDA/Vulkan GPU acceleration.
