Skip to content

sbuysse/cortex

Repository files navigation

Cortex

A cognitive architecture in Rust combining spiking neural networks with foundation model encoders. 2 million ALIF neurons, 2 billion synaptic connections, 10 brain regions, STDP-based learning.

Overview

Cortex is a research platform for studying how spiking neural dynamics can form associative knowledge from multimodal input. It encodes perception (DINOv2, CLIP, Whisper, MiniLM) into spike trains, imprints knowledge into synaptic weights, and recalls through spreading activation across 2 billion connections — the same mechanism the brain uses.

The system watches YouTube videos, extracts knowledge triples via LLM, imprints them into actual synaptic connections between neuron assemblies, and discovers emergent cross-domain associations through neural propagation. Knowledge lives in the weights, not in a database.

Neuroscience Alignment

Cortex is grounded in real neuroscience. See docs/NEUROSCIENCE.md for a point-by-point comparison with brain science. Key alignments:

Mechanism Biology Cortex
Neurons Leaky integrate-and-fire with spike-frequency adaptation ALIF neurons (2M), SoA layout
Learning Three-factor STDP (pre/post timing × neuromodulator) Eligibility traces × dopamine/ACh gating
Stability Homeostatic multiplicative synaptic scaling Multiplicative drive scaling (preserves weight ratios)
Recall Spreading activation through synaptic connections Spike propagation through 2B imprinted synapses
Sequences Theta phase precession (temporal offset → STDP) STDP-timed chain imprinting (5-step offset)
Modulation DA (reward), ACh (attention), NE (arousal), 5-HT (mood) Four scalar modulators controlling learning and recall modes
Concepts Cell assemblies (~100 co-firing neurons) 100-neuron dedicated assemblies per concept

Architecture

                        ┌─────────────────────┐
                        │    brain-server      │
                        │   (axum, 60+ API)    │
                        └──────────┬──────────┘
                                   │
          ┌────────────────────────┼────────────────────────┐
          │                        │                        │
┌─────────┴─────────┐  ┌─────────┴─────────┐  ┌──────────┴─────────┐
│  brain-cognition   │  │  brain-inference   │  │   brain-spiking    │
│                    │  │                    │  │                    │
│ Working memory     │  │ DINOv2  (384d)     │  │ 10 brain regions   │
│ Hopfield memory    │  │ CLIP    (512d)     │  │ 2M ALIF neurons    │
│ Knowledge graph    │  │ Whisper (512d)     │  │ 2B CSR synapses    │
│ Personal memory    │  │ MiniLM  (384d)     │  │ Three-factor STDP  │
│ Companion/emotion  │  │ World model        │  │ 4 neuromodulators  │
│ Autonomy loop      │  │ Mel spectrogram    │  │ Cell assemblies    │
│ Sleep consolidation│  │ VAD, faces         │  │ Triple extraction  │
└────────────────────┘  └────────────────────┘  │ Chain recall       │
                                                 │ Sleep/pruning      │
          ┌─────────────────────┐                └────────────────────┘
          │    brain-core       │
          │ Hebbian networks    │
          │ Sparse projections  │
          └─────────────────────┘

Brain Regions

Region Neurons Role
Visual cortex 200K Receives DINOv2/CLIP/MiniLM embeddings via latency coding
Auditory cortex 200K Receives Whisper audio embeddings
Association cortex 500K Cross-modal binding, cell assemblies for concepts
Predictive cortex 200K Top-down prediction, bottom-up error signals
Hippocampus 300K Fast pattern storage (DG/CA3/CA1 subfields)
Prefrontal cortex 200K Working memory attractors (NMDA-like slow decay)
Amygdala 100K Emotional valence assignment
Motor cortex 100K Action/speech output
Brainstem 50K Neuromodulator source (DA, ACh, NE, 5-HT)
Cerebellum 150K Timing and error correction

Learning Pipeline

YouTube video
  → yt-dlp auto-subtitles
  → Sentence chunking (~200 chars, filler filtering)
  → LLM-powered triple extraction (Ollama, batched):
      "From these sentences about X, extract subject|verb|object triples"
      e.g., TurboQuant|compresses|KV cache
  → Batch learning: all triples encoded in one tick (~0ms)
  → Concept association matrix: S→R, R→O, S→O edges strengthened
  → No text stored — association weights ARE the memory

Recall

Query → Fuzzy concept matching → Find matching cell assemblies
  → BFS through concept association graph (up to 6 hops)
  → Follow strongest weighted edges, filter noise concepts
  → Chain of associated concepts returned with strength scores
  → Injected into LLM system prompt as learned knowledge

Experiment: Learning TurboQuant from a YouTube Video

To validate the architecture, we taught Cortex about TurboQuant — a quantization method published after the LLM's training cutoff. The video title is clickbait ("Google's New AI Just Broke My Brain") and never mentions "TurboQuant" — the LLM alone cannot answer questions about it.

Step 1: Ask the raw LLM (no Cortex)

Q: "tell me about TurboQuant"
A: "I don't have specific learned knowledge about TurboQuant..."

Step 2: Paste the YouTube URL in the Cortex UI

Input: https://www.youtube.com/watch?v=7YVrb3-ABYE

→ LLM identifies topic from transcript: "turbo quant"
→ 53 triples extracted (turbo quant|compresses|kv cache, turbo quant|uses|random rotations, ...)
→ Triples stored in spiking brain (synaptic imprinting) + knowledge graph (SQLite)
→ Concepts seeded: turbo quant, lora, kv cache, random rotations, memory usage, ...

Step 3: Ask Cortex

Q: "tell me about TurboQuant"

Grounding (learned concepts matched):
  - turboquant (0.58)
  - random rotations (0.48)
  - chopping off end of numbers (0.45)

KG edges retrieved:
  - turbo quant → compresses → kv cache
  - turbo quant → uses → random rotations
  - turbo quant → affects → memory usage

A: "TurboQuant is a system that impacts large semiconductor companies'
    stock prices by efficiently compressing kv cache and reducing memory
    usage through the use of random rotations. It's particularly relevant
    in the context of large language models where it helps optimize
    performance, though its implementation has also been subject to
    controversy."

Every fact in the response — KV cache compression, random rotations, memory reduction, controversy — comes from the video transcript, not the LLM's training data.

Experiment 1b: Learning LoRA from a YouTube Video

Same pipeline, different topic. Fresh server restart, no prior knowledge of this video.

Before learning:

Q: "what is LoRA?"
A: "I don't have learned knowledge specifically about LoRA..."
   (learned concepts: [])

Paste URL: https://www.youtube.com/watch?v=PXWYUTMt-AU

→ Topic: "lora" (extracted from transcript by LLM)
→ 60 triples extracted
→ Concepts: lora, low-rank adaptation, base model weights, parameter numbers, ...

After learning:

Q: "what is LoRA?"

Grounding:
  - lora (0.60)
  - LoRA: Low-Rank Adaptation of Large Language Models (0.40)
  - low-rank adaptation (0.39)
  - low-rank decomposition (0.38)
  - read-only to base model weights (0.36)

A: "LoRA stands for Low-Rank Adaptation, a method used in machine
    learning to efficiently adapt large models to new tasks without
    the need for full retraining. It achieves this by updating only a
    small subset of parameters using low-rank matrices, which helps in
    reducing both computational expense and storage requirements."

Experiment 2: Cross-Domain Emergent Discovery

After teaching Cortex 24 topics (TurboQuant, FlashAttention, transformers, LoRA, GGUF, spiking networks, diffusion models, tokenization, and more), we asked a question that spans multiple domains.

Query: "How does TurboQuant work?"

Brain associations (dual-pathway recall with confidence tags):

[explicit] kv cache (strength: 200)           — from BFS
[explicit] short-term memory of models (160)   — from BFS
[explicit] formal mathematical proof (160)     — from BFS
[emergent] sparse (strength: 70)               — discovered by spiking propagation
[emergent] word into vector (strength: 70)     — discovered by spiking propagation
[emergent] similar words close to each other (70) — discovered by spiking propagation

The [emergent] associations were NOT learned from TurboQuant's video — they were discovered by the 2B-synapse spiking network finding lateral pathways to concepts from other topics (tokenization, embeddings). The spiking brain connected "quantization" to "sparsity" and "vector representations" through neural propagation, not text matching.

How It Works

  1. BFS recall (0ms): follows explicit learned edges in the HashMap association graph
  2. Spiking recall (0.1s): fires seed concepts into 500K association cortex neurons, propagates through imprinted + random synapses for 30 steps
  3. Merge: concepts found by both = [confirmed], BFS only = [explicit], spiking only = [emergent]
  4. Neuromodulator control: single-topic queries use focused mode (high acetylcholine), multi-topic queries use broad mode (high norepinephrine)

Performance

Metric Value
Topics learned 24 (from YouTube videos)
Concepts 816
Associations 1,195
Persisted triples 423 (survives restarts)
Triple extraction LLM-powered (Ollama), ~12 triples per video in 9.4s
Learning Batch: 12 triples in 0.000s + 803 synapses imprinted
BFS recall 0.000s (instant)
Spiking recall 0.1s (30 steps through association cortex)
Brain scale 2M neurons, 2B synapses, 10 regions

UI — Immersive 3D Brain Explorer

Open https://your-server:8443/ to access the brain explorer.

  • Full-screen 3D brain with 10 anatomically positioned regions that glow based on spike activity (Three.js)
  • Knowledge graph visible when zoomed in — 1000+ concept nodes colored by topic, connected by learned associations
  • Ask questions via the unified input bar — the brain animates during recall, response appears as a floating card with confidence tags
  • Learn from YouTube — paste a URL, the brain learns in real-time with progress animation
  • Browse knowledge — slide-out panels for topics, brain regions, and system stats
  • Confidence visualization[confirmed] green, [explicit] blue, [emergent] purple, [predicted] orange

Built with Three.js, vanilla JS, and Tailwind CSS. Single HTML page, no framework.

Installation

Prerequisites

  • Rust (edition 2024)
  • libtorch (PyTorch C++ library)
  • Ollama (for LLM dialogue — optional)
  • yt-dlp + ffmpeg (for video learning — optional)

Build

git clone https://github.com/sbuysse/cortex.git
cd cortex/rust

# Point to your libtorch installation
export LIBTORCH=/path/to/libtorch        # e.g., /usr/local/lib64/python3.14/site-packages/torch
export LIBTORCH_USE_PYTORCH=1
export LD_LIBRARY_PATH=$LIBTORCH/lib:$LD_LIBRARY_PATH

cargo build --release -p brain-server

Run

# Minimal (no spiking brain, no models)
BRAIN_PROJECT_ROOT=/path/to/cortex ./target/release/brain-server

# With spiking brain (scale: 0.01=tiny test, 0.1=development, 1.0=full 2M neurons)
BRAIN_PROJECT_ROOT=/path/to/cortex SPIKING_SCALE=0.1 ./target/release/brain-server

# Disable cortex experiment runner (saves CPU for spiking brain + Ollama)
BRAIN_CORTEX_DISABLE=1 SPIKING_SCALE=0.1 BRAIN_PROJECT_ROOT=/path/to/cortex ./target/release/brain-server

The server starts on https://localhost:443 (TLS with self-signed cert).

Optional: Ollama for dialogue

# Install Ollama (https://ollama.ai)
ollama pull qwen2.5:1.5b

# Keep model loaded permanently (avoids 25s cold-start)
export OLLAMA_KEEP_ALIVE=-1

Optional: Video learning

# Install yt-dlp and ffmpeg
pip install yt-dlp
# ffmpeg via your package manager

# Teach Cortex from a YouTube video
curl -sk -X POST https://localhost/api/brain/learn/academic \
  -H 'Content-Type: application/json' \
  -d '{"query": "https://www.youtube.com/watch?v=VIDEO_ID", "topic": "topic name"}'

API Reference

See docs/API.md for the full 60+ endpoint reference.

Key endpoints:

Endpoint Method Description
/api/brain/learn/academic POST Learn from YouTube video {query, topic}
/api/brain/dialogue/grounded POST Conversation with brain associations {message}
/api/brain/spiking/status GET Neuron counts, spike rates, neuromodulator levels
/api/brain/watch POST Process image through visual cortex
/api/listen/process POST Process audio through auditory cortex
/api/brain/dream POST Generate imagination chain
/api/companion/greeting GET Time-of-day greeting with personal context
/api/companion/safety GET Caregiver safety alerts

Configuration

Environment Variable Default Description
BRAIN_PROJECT_ROOT current dir Path to project root (templates, outputs)
SPIKING_SCALE 0 (disabled) Neuron count multiplier (0.1 = 200K, 1.0 = 2M)
BRAIN_CORTEX_DISABLE not set Set to 1 to disable experiment runner
COMPANION_MODEL qwen2.5:1.5b Ollama model for dialogue
OLLAMA_MODEL qwen2.5:1.5b Ollama model for triple extraction
OLLAMA_URL http://localhost:11434 Ollama API endpoint
BRAIN_BIND_ADDR 0.0.0.0:443 Server bind address

Project Structure

rust/
  crates/
    brain-spiking/     # Spiking neural network engine
      src/
        neuron.rs       # ALIF neurons, SoA layout
        synapse.rs      # COO builder → CSR storage, synaptic scaling
        region.rs       # Brain region (neurons + synapses + STDP)
        network.rs      # Multi-region orchestrator
        concepts.rs     # Cell assemblies, triple extraction
        knowledge.rs    # Concept association matrix, BFS chain recall
        plasticity.rs   # Three-factor STDP, TACOS dual-weight
        neuromodulation.rs  # DA, ACh, NE, 5-HT
        sleep.rs        # NREM replay + REM noise + structural pruning
        spike_encoder.rs    # Latency coding (embedding → spikes)
        spike_decoder.rs    # Rate decoding (spikes → embedding)
    brain-server/      # HTTP server (axum)
    brain-cognition/   # Cognitive systems
    brain-inference/   # TorchScript model loading
    brain-core/        # Hebbian networks
    brain-db/          # SQLite persistence
    brain-experiment/  # Self-improving mutation loop
scripts/               # Python training, data download, model export
templates/             # Web UI (8 HTML pages)
docs/                  # API reference, engineering lessons

Contributing

See CONTRIBUTING.md.

License

PolyForm Noncommercial 1.0.0 — free for personal use, research, education, and non-profit organizations. Commercial use requires a separate license from Akretio.

About

Cognitive architecture in Rust — 2M spiking neurons, 2B synapses, 10 brain regions, STDP learning, knowledge recall through learned synaptic pathways

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors