Skip to content

aamir-gmail/SDR_Transformer_PoC

Repository files navigation

What is a Sparse Distributed Representation (SDR)?

Before we get started, I would like to pay my respect and gratitude to Jeff Hawkins https://en.wikipedia.org/wiki/Jeff_Hawkins and his pioneering work in this field.

Sparse Distributed Representations (SDRs) are a data format based on the theoretical operations of the mammalian neocortex. Unlike the dense, continuous vector embeddings used in standard Deep Learning (e.g., Word2Vec, BERT), SDRs use massive, highly sparse binary vectors to encode semantic meaning.

1. The Representation

An SDR is a long binary vector (typically $N \ge 2048$ bits) where only a tiny percentage of bits (typically $\sim2%$) are active ($1$) at any given time.

  • Dense Embedding (Standard): [0.23, -0.91, 0.55, ...] (Continuous, Compact)
  • SDR (Neuromorphic): [0, 0, 1, 0, 0, 0, 1, 0, ...] (Binary, Vast, Sparse)

Why "Distributed"?

The semantic meaning is distributed across the pattern of active bits.

  • No single bit represents "Cat".
  • The set of bits ${5, 120, 900, ...}$ represents "Cat".
  • If a few bits are flipped due to noise, the semantic meaning remains intact because the overall pattern is still recognizable.

2. Mathematical Superpowers

SDRs possess unique mathematical properties that make them ideal for robust AI architectures:

A. Superposition (The "Union" Property)

You can bundle multiple concepts into a single vector using a bitwise OR operation. Unlike dense vectors (where averaging mixes concepts into a blur), SDR unions retain the discrete identity of every component. $$\text{SDR}(\text{Cat}) \cup \text{SDR}(\text{Sat}) = \text{SDR}(\text{Cat, Sat})$$

  • Capacity: Because the space is so vast ($2^{2048}$), millions of unique patterns can coexist without colliding.
  • Retrieval: You can mathematically query the union: "Is 'Cat' inside this bundle?" by checking if the 'Cat' bits are active.

B. High-Noise Robustness

SDRs are inherently fault-tolerant.

  • Signal Loss: You can turn off 50% of the active bits, and the system can still identify the original concept (the "Attractor Basin").
  • Noise Injection: Random noise bits have little effect because they are unlikely to form a meaningful competing pattern by chance.

3. Comparison: Dense vs. SDR

Feature Dense Embeddings (Standard Transformers) Sparse Distributed Representations (SDR)
Data Type Float32 / Float16 Binary (0/1)
Dimensions Low (e.g., 768, 4096) High (e.g., 2048, 16384)
Activity 100% Dense ~2% Sparse
Combination Vector Addition (Blurs meaning) Boolean Union (Preserves distinct items)
Noise Tolerance Low (Values change significantly) Extremely High
Biologically Plausible? No Yes

4. Relevance to this Project

In this repository, we replace the standard dense embedding layer of an LLM with an SDR Retina. This allows the Transformer to:

  1. Compute Efficiently: Processing sparse binaries is computationally cheaper than dense floats.
  2. Compress Context: Utilizing the Superposition property to pack multiple tokens into a single input step (e.g., 4x Context Compression).
  3. Resist Noise: Maintaining high accuracy even when inputs are heavily corrupted.

SDR-Transformer: The "Brain in a Jar" Proof of Concept

Replacing Dense Embeddings with Sparse Distributed Representations (SDRs) in Large Language Models.

1. Overview

This project demonstrates that Sparse Distributed Representations (SDRs)—the data format used by the biological neocortex—can successfully replace dense vector embeddings in modern Transformers without sacrificing semantic capability.

By transplanting a generic Qwen 0.5B model with a new "SDR Retina" (Input Layer), we achieved a model that is:

  • Semantically Competent: Capable of general chat and instruction following.
  • Mathematically Robust: Solves arithmetic and logic puzzles (via GSM8k training).
  • Noise Resilient: Maintains accuracy even with 50% input signal corruption.
  • Compression Verified: We have experimentally verified the 4x Lossless Context Compression capability of SDRs (though the current model checkpoint is trained on standard sequential inputs).

2. The Architecture

Standard Transformers use dense lookup tables (Embeddings) to represent tokens. This project replaces that layer with a mathematical projection:

  1. Input: Token Indices are converted into Sparse Binary Vectors (Length 2048, ~2% sparsity).
  2. The Retina: A fixed, random linear projection layer maps these sparse binary patterns into the model's dense hidden space ($d=896$).
  3. The Body: A standard Qwen 2.5 0.5B Transformer (pretrained weights preserved).

Flow: Token ID $\to$ SDR (2048 bits) $\to$ Linear Projection $\to$ Transformer Block

3. Key Results

A. Robustness (The "France" Test)

We injected random noise into the input SDRs (flipping active bits to 0, inactive bits to 1).

  • 0% Noise: Perfect accuracy.
  • 20% Noise: Perfect semantic accuracy (Model stays stable).
  • 50% Noise: Model still retrieves high-probability facts (e.g., "Capital of France is Paris") despite half the input data being destroyed.

B. Logic & Math (The "25+25" Fix)

Initial versions suffered from "Number Blindness" (seeing "25" as a fuzzy mix of "2" and "5").

  • Phase 3 Training (GSM8k): Taught the Retina to distinguish multi-digit numbers.
  • Result: The model now correctly solves 25 + 25 = 50 and performs multi-step Chain-of-Thought reasoning.

C. Superposition (The "Unpacking" Verification)

We utilized the Union Property of SDRs to test packing multiple tokens into a single input vector using rotation-based positional encoding.

  • Verification: Successfully packed 4 tokens ("The quick brown fox") into 1 SDR.
  • Retrieval: Achieved 100% Lossless Retrieval of the original sequence from the single packed vector using mathematical decoding.
  • Status: This property is verified in our experiment scripts. A "Turbo" model trained to read these packed inputs directly is planned for V2.

4. Installation

# 1. Clone the repository
git clone [https://github.com/your-repo/SDR_Transformer_PoC.git](https://github.com/your-repo/SDR_Transformer_PoC.git)
cd SDR_Transformer_PoC

# 2. Install dependencies
pip install torch transformers datasets accelerate

About

SDR-Transformer: The "Brain in a Jar" Proof of Concept Replacing Dense Embeddings with Sparse Distributed Representations (SDRs) in Large Language Models.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages