Sovereign AI agent runtime for every device.
Quick Start • Features • Architecture • Hardware • Build • Docs
NeuronOS is a self-contained AI agent engine written in pure C11. It runs complete autonomous agents (reasoning, memory, tool use, and inter-agent communication) on any device, from a Raspberry Pi to a cloud server, with zero runtime dependencies and zero cloud requirements.
Built on BitNet b1.58 ternary models, NeuronOS delivers useful AI agents on hardware as modest as 1.5 GB of RAM, entirely offline.
$ curl -fsSL https://raw.githubusercontent.com/Neuron-OS/NeuronOS/main/install.sh | sh
$ neuronos
> What files are in my project?
[tool: list_dir] Scanning ./...
Found 12 files. Here's what I see:
src/main.c - Entry point
src/utils.c - Helper functions
Makefile - Build configuration
...
> Remember that the deadline for this project is March 15
[tool: memory_store] Saved to archival memory.
Noted. I'll remember the March 15 deadline.
Universal Install (Linux, macOS, Android, Windows via WSL):
curl -fsSL https://raw.githubusercontent.com/Neuron-OS/NeuronOS/main/install.sh | sh

This single command will:
- Detect your OS (Debian, Fedora, Arch, macOS, Android/Termux).
- Install dependencies (Vulkan SDK, CMake, compilers) automatically.
- Build and install `neuronos` optimized for your hardware.
- Download the best 1.58-bit model for your RAM.
Manual Build:
git clone https://github.com/Neuron-OS/neuronos
cd neuronos
./install.sh --build

Web/WASM Build:

./install.sh --wasm

## Features

- ReAct reasoning loop: Think → Act → Observe cycles with transparent reasoning
- 12 built-in tools: shell, file read/write, directory listing, file search, PDF reading, HTTP requests, calculator, time, and 3 memory tools
- 10,000+ external tools via MCP client integration
- 3-format GBNF grammar: constrained generation for reliable tool calling
- Multi-turn conversations with persistent context
- Core Memory: key-value blocks injected into every prompt (persona, instructions)
- Recall Memory: full chat history per session, FTS5 full-text searchable
- Archival Memory: permanent facts with unique keys, searchable, access-tracked
- Automatic context compaction at ~85% capacity with summarization
- MCP Server: expose NeuronOS tools to any MCP-compatible client (JSON-RPC 2.0, STDIO)
- MCP Client: connect to external MCP servers, auto-discover and use their tools (~1,370 lines of pure C)
- OpenAI-compatible HTTP API: `/v1/chat/completions`, `/v1/models`, SSE streaming
- A2A Protocol: agent-to-agent communication (coming next; first C implementation worldwide)
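As a sketch of what the MCP server mode speaks over STDIO, here is a hypothetical JSON-RPC 2.0 exchange (one message per line). The `tools/list` and `tools/call` method names come from the MCP specification; the exact fields NeuronOS emits, and the `list_dir` schema shown, are illustrative, not taken from the implementation.

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
{"jsonrpc": "2.0", "id": 1, "result": {"tools": [{"name": "list_dir", "description": "List files in a directory", "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}}}]}}
{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "list_dir", "arguments": {"path": "."}}}
```

Any MCP-compatible client that launches the binary and pipes messages like these through stdin/stdout can discover and invoke the built-in tools.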
- BitNet b1.58 ternary models: 2B params in 1.71 GiB, runs on 1.5 GB RAM
- 21 tokens/sec generation on a laptop CPU (i7-12650H, 4 threads)
- 95 tokens/sec prompt processing on the same hardware
- Multi-model support: BitNet 2B, Falcon3-7B/10B (1.58-bit), Qwen2.5-3B/14B (Q4_K_M)
- Automatic model selection based on detected hardware capabilities
- 4 ISA backends with automatic runtime detection:
  - `hal_scalar`: pure C fallback (works everywhere)
  - `hal_x86_avx2`: Intel/AMD Haswell+ (2013+)
  - `hal_x86_avxvnni`: Intel Alder Lake+ (2021+)
  - `hal_arm_neon`: Apple Silicon, Raspberry Pi 4/5
- CUDA build available for NVIDIA GPUs (Q4_K_M models)
## Architecture

```
┌───────────────────────────────────────────────────┐
│ Layer 7: Applications                             │
│   CLI (8 modes) • HTTP Server • MCP Server        │
├───────────────────────────────────────────────────┤
│ Layer 6: Agent                                    │
│   ReAct Loop • Tool Dispatch • Step Callbacks     │
├───────────────────────────────────────────────────┤
│ Layer 5: Tools                                    │
│   Registry (12 built-in) • MCP Bridge • Sandbox   │
├───────────────────────────────────────────────────┤
│ Layer 4: Grammar                                  │
│   GBNF Constrained Generation (3 formats)         │
├───────────────────────────────────────────────────┤
│ Layer 3: Inference                                │
│   llama.cpp wrapper (BitNet I2_S kernels)         │
├───────────────────────────────────────────────────┤
│ Layer 2.5: Memory                                 │
│   SQLite 3.47.2 + FTS5 (MemGPT 3-tier)            │
├───────────────────────────────────────────────────┤
│ Layer 2: HAL                                      │
│   Runtime ISA dispatch (scalar/AVX2/VNNI/NEON)    │
├───────────────────────────────────────────────────┤
│ Layer 1: Hardware                                 │
│   x86-64 • ARM64 • RISC-V • WASM (planned)        │
└───────────────────────────────────────────────────┘
```
~9,400 lines of C11 across 19 source files. No C++ in the public API.
## Hardware Support

| Platform | CPU (AVX2/ARM) | GPU (Vulkan) | NPU | Web (WASM) |
|---|---|---|---|---|
| Linux | ✅ | ✅ | 🚧 | ✅ |
| macOS | ✅ | ✅ (MoltenVK) | 🚧 | ✅ |
| Windows | ✅ | ✅ | 🚧 | ✅ |
| Android | ✅ | ✅ | 🚧 | ✅ |
| iOS | - | - | - | ✅ (Safari) |
neuronos # Auto-detect model, launch agent
neuronos run "Summarize this" # Single prompt
neuronos agent                  # Explicit agent mode
neuronos --mcp                  # Load tools from ~/.neuronos/mcp.json
neuronos serve --port 8080      # Start API server
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
  -d '{"model":"neuronos","messages":[{"role":"user","content":"Hello"}]}'

neuronos mcp                    # JSON-RPC 2.0 over STDIO
neuronos hwinfo                 # Show detected hardware + backends
neuronos scan                   # Scan for available models

## Build

Requirements:

- C11 compiler (Clang 14+ or GCC 12+ recommended)
- CMake 3.20+
- ~2 GB disk space for build
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
cmake --build build -j$(nproc)

./build/bin/test_hal && ./build/bin/test_engine && ./build/bin/test_memory
# Expected: 27/27 PASS

| Option | Description | Default |
|---|---|---|
| `CMAKE_BUILD_TYPE` | Release / Debug | Release |
| `BITNET_X86_TL2` | x86 TL2 kernel (experimental) | OFF |
| `CMAKE_EXPORT_COMPILE_COMMANDS` | Generate compile_commands.json | OFF |
```
neuronos/
├── include/neuronos/
│   ├── neuronos.h                    # Public API (694 lines, v0.9.1)
│   └── neuronos_hal.h                # HAL API (331 lines)
├── src/
│   ├── hal/                          # Hardware abstraction backends
│   │   ├── hal_registry.c            # Backend registry + CPUID detection
│   │   ├── hal_scalar.c              # Pure C fallback
│   │   ├── hal_x86_avx2.c            # AVX2 backend
│   │   ├── hal_x86_avxvnni.c         # AVX-VNNI backend
│   │   └── hal_arm_neon.c            # ARM NEON backend
│   ├── engine/
│   │   ├── neuronos_engine.c         # Inference engine (llama.cpp wrapper)
│   │   └── neuronos_model_selector.c # HW detection + model scoring
│   ├── memory/
│   │   └── neuronos_memory.c         # MemGPT 3-tier memory (SQLite+FTS5)
│   ├── agent/
│   │   ├── neuronos_agent.c          # ReAct agent loop + memory integration
│   │   └── neuronos_tool_registry.c  # Tool registry + 12 built-in tools
│   ├── cli/
│   │   └── neuronos_cli.c            # CLI with 8 modes
│   ├── interface/
│   │   └── neuronos_server.c         # HTTP server (OpenAI API + SSE)
│   └── mcp/
│       ├── neuronos_mcp_server.c     # MCP server (JSON-RPC STDIO)
│       └── neuronos_mcp_client.c     # MCP client (~1,370 lines)
├── 3rdparty/
│   ├── sqlite/                       # SQLite 3.47.2 amalgamation
│   └── sqlite-vec/                   # sqlite-vec v0.1.6 (prepared)
├── tests/
│   ├── test_hal.c                    # 4 HAL tests
│   ├── test_engine.c                 # 11 engine + agent tests
│   └── test_memory.c                 # 12 memory tests
└── grammars/
    ├── tool_call.gbnf                # Tool calling grammar
    └── json.gbnf                     # JSON output grammar
```
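The grammar files listed above drive the constrained generation described under Features. As an illustration only, a minimal tool-call grammar in llama.cpp's GBNF notation could look like the following; the shipped `grammars/tool_call.gbnf` may differ in rule names and structure.

```
# hypothetical sketch, not the shipped grammar
root     ::= "{" ws "\"tool\"" ws ":" ws toolname ws "," ws "\"args\"" ws ":" ws object ws "}"
toolname ::= "\"" [a-z_]+ "\""
object   ::= "{" ws (pair (ws "," ws pair)*)? ws "}"
pair     ::= string ws ":" ws value
value    ::= string | number
string   ::= "\"" [^"]* "\""
number   ::= "-"? [0-9]+
ws       ::= [ \t\n]*
```

A grammar like this forces every sampled token sequence to be a well-formed tool-call object, which is why tool dispatch can rely on the model's output parsing cleanly.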
## Docs

| Document | Description |
|---|---|
| ROADMAP.md | Strategic roadmap and execution plan |
| TRACKING.md | Iteration-by-iteration progress log |
| AGENTS.md | Instructions for AI coding agents |
| ARSENAL.md | Technology arsenal and market research |
- Not an inference speed benchmark. llama.cpp will always be faster. We optimize for agent utility.
- Not a cloud service. Everything runs locally. Your data never leaves your device.
- Not a Python framework. Pure C11, zero runtime dependencies. Compiles to a single binary.
- Not a replacement for GPT-5. Ternary models have limits. We bring intelligence where frontier models can't reach: offline, embedded, private, free.
We welcome contributions. Please read AGENTS.md for coding standards and architecture guidelines before submitting PRs.
All tests must pass before any commit:
./build/bin/test_hal && ./build/bin/test_engine && ./build/bin/test_memory

MIT License. See LICENSE for details.
SQLite is public domain. sqlite-vec is MIT/Apache-2.0.