Before we get started, I would like to pay my respects and express my gratitude to Jeff Hawkins (https://en.wikipedia.org/wiki/Jeff_Hawkins) for his pioneering work in this field.
Sparse Distributed Representations (SDRs) are a data format based on the theoretical operations of the mammalian neocortex. Unlike the dense, continuous vector embeddings used in standard Deep Learning (e.g., Word2Vec, BERT), SDRs use massive, highly sparse binary vectors to encode semantic meaning.
An SDR is a long binary vector (typically 2,048 bits) in which only a small fraction of the bits (~2%) are active at any time.
- Dense Embedding (Standard): [0.23, -0.91, 0.55, ...] (Continuous, Compact)
- SDR (Neuromorphic): [0, 0, 1, 0, 0, 0, 1, 0, ...] (Binary, Vast, Sparse)
The semantic meaning is distributed across the pattern of active bits.
- No single bit represents "Cat".
- The set of bits ${5, 120, 900, ...}$ represents "Cat".
- If a few bits are flipped due to noise, the semantic meaning remains intact because the overall pattern is still recognizable.
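To make the "pattern of active bits" idea concrete, here is a minimal NumPy sketch; the bit indices and counts are illustrative placeholders, not the project's actual encoder:

```python
import numpy as np

N = 2048  # SDR length used throughout this project

def make_sdr(active_bits, n=N):
    """Build a binary SDR from its set of active bit positions."""
    sdr = np.zeros(n, dtype=np.uint8)
    sdr[list(active_bits)] = 1
    return sdr

# "Cat" is not any single bit; it is this whole set of active positions
cat = make_sdr({5, 120, 900, 1024, 1500, 1999})

# Flip one bit off: the overlap with the original pattern stays high,
# so the concept is still recognizable
noisy_cat = cat.copy()
noisy_cat[120] = 0
overlap = int(np.sum(cat & noisy_cat))
print(f"{overlap} of {int(cat.sum())} bits still match")  # 5 of 6
```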
SDRs possess unique mathematical properties that make them ideal for robust AI architectures:
You can bundle multiple concepts into a single vector using a bitwise OR operation. Unlike dense vectors (where averaging mixes concepts into a blur), SDR unions retain the discrete identity of every component.
- Capacity: Because the space is so vast ($2^{2048}$ possible patterns), millions of unique patterns can coexist without colliding.
- Retrieval: You can mathematically query the union ("Is 'Cat' inside this bundle?") by checking whether the 'Cat' bits are active, as in the sketch below.
SDRs are inherently fault-tolerant.
- Signal Loss: You can turn off 50% of the active bits, and the system can still identify the original concept (the "Attractor Basin").
- Noise Injection: Random noise bits have little effect because they are unlikely to form a meaningful competing pattern by chance.
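The sketch below illustrates the signal-loss claim: half of a concept's active bits are switched off, yet a simple overlap comparison against a small (hypothetical) codebook still identifies the original concept:

```python
import numpy as np

N = 2048
rng = np.random.default_rng(0)

def random_sdr(active=40):
    sdr = np.zeros(N, dtype=np.uint8)
    sdr[rng.choice(N, active, replace=False)] = 1
    return sdr

# Small illustrative codebook of known concepts
codebook = {name: random_sdr() for name in ["cat", "dog", "car", "tree"]}

# Signal loss: switch off half of "cat"'s active bits
damaged = codebook["cat"].copy()
on = np.flatnonzero(damaged)
damaged[rng.choice(on, len(on) // 2, replace=False)] = 0

# The damaged pattern still overlaps its original far more than any other concept
best = max(codebook, key=lambda name: int(np.sum(damaged & codebook[name])))
print(best)  # "cat"
```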
| Feature | Dense Embeddings (Standard Transformers) | Sparse Distributed Representations (SDR) |
|---|---|---|
| Data Type | Float32 / Float16 | Binary (0/1) |
| Dimensions | Low (e.g., 768, 4096) | High (e.g., 2048, 16384) |
| Activity | 100% Dense | ~2% Sparse |
| Combination | Vector Addition (Blurs meaning) | Boolean Union (Preserves distinct items) |
| Noise Tolerance | Low (Values change significantly) | Extremely High |
| Biologically Plausible? | No | Yes |
In this repository, we replace the standard dense embedding layer of an LLM with an SDR Retina. This allows the Transformer to:
- Compute Efficiently: Processing sparse binary vectors is computationally cheaper than processing dense floating-point vectors.
- Compress Context: The Superposition property packs multiple tokens into a single input step (e.g., 4x Context Compression).
- Resist Noise: Accuracy stays high even when inputs are heavily corrupted.
Replacing Dense Embeddings with Sparse Distributed Representations (SDRs) in Large Language Models.
This project demonstrates that Sparse Distributed Representations (SDRs)—the data format used by the biological neocortex—can successfully replace dense vector embeddings in modern Transformers without sacrificing semantic capability.
By grafting a new "SDR Retina" (input layer) onto a generic Qwen 0.5B model, we obtained a model that is:
- Semantically Competent: Capable of general chat and instruction following.
- Mathematically Robust: Solves arithmetic and logic puzzles (via GSM8k training).
- Noise Resilient: Maintains accuracy even with 50% input signal corruption.
- Compression Verified: We have experimentally verified the 4x Lossless Context Compression capability of SDRs (though the current model checkpoint is trained on standard sequential inputs).
Standard Transformers use dense lookup tables (Embeddings) to represent tokens. This project replaces that layer with a mathematical projection:
- Input: Token Indices are converted into Sparse Binary Vectors (Length 2048, ~2% sparsity).
- The Retina: A fixed, random linear projection layer maps these sparse binary patterns into the model's dense hidden space ($d=896$).
- The Body: A standard Qwen 2.5 0.5B Transformer (pretrained weights preserved).
Flow: Token ID → SDR (2048 bits) → Linear Projection → Transformer Block
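Below is a rough PyTorch sketch of this flow. The per-token bit assignment, the number of active bits, and the frozen projection are illustrative assumptions; the actual Retina implementation in this repository may differ:

```python
import torch
import torch.nn as nn

class SDRRetina(nn.Module):
    """Minimal sketch of the SDR input layer described above.

    Each token ID is mapped to a fixed random set of ~2% active bits in a
    2048-dimensional binary vector, and a fixed random linear projection
    then maps that pattern into the Transformer's hidden space (d=896).
    """
    def __init__(self, vocab_size, sdr_dim=2048, active_bits=40, hidden_dim=896):
        super().__init__()
        gen = torch.Generator().manual_seed(0)
        # Fixed random SDR codebook: one sparse binary row per token
        codebook = torch.zeros(vocab_size, sdr_dim)
        for tok in range(vocab_size):
            idx = torch.randperm(sdr_dim, generator=gen)[:active_bits]
            codebook[tok, idx] = 1.0
        self.register_buffer("codebook", codebook)
        # Fixed random projection into the model's dense hidden space
        self.proj = nn.Linear(sdr_dim, hidden_dim, bias=False)
        self.proj.weight.requires_grad_(False)

    def forward(self, token_ids):        # (batch, seq) -> (batch, seq, hidden)
        sdrs = self.codebook[token_ids]  # sparse binary vectors
        return self.proj(sdrs)           # dense hidden states for the Transformer body

# Usage sketch (vocab size taken from whichever tokenizer is loaded):
# retina = SDRRetina(vocab_size=tokenizer.vocab_size)
# hidden = retina(torch.tensor([[1, 2, 3]]))
```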
We injected random noise into the input SDRs (flipping active bits to 0, inactive bits to 1).
- 0% Noise: Perfect accuracy.
- 20% Noise: Perfect semantic accuracy (Model stays stable).
- 50% Noise: Model still retrieves high-probability facts (e.g., "Capital of France is Paris") despite half the input data being destroyed.
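For reference, a sketch of the corruption procedure described above; flipping an equal number of active and inactive bits at the given rate is our reading of the experiment, and the exact parameterization may differ:

```python
import numpy as np

def inject_noise(sdr, noise_level, rng=None):
    """Flip a `noise_level` fraction of the active bits to 0
    and an equal number of inactive bits to 1."""
    if rng is None:
        rng = np.random.default_rng(0)
    corrupted = sdr.copy()
    active = np.flatnonzero(corrupted == 1)
    inactive = np.flatnonzero(corrupted == 0)
    n_flip = int(len(active) * noise_level)
    corrupted[rng.choice(active, n_flip, replace=False)] = 0
    corrupted[rng.choice(inactive, n_flip, replace=False)] = 1
    return corrupted
```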
Initial versions suffered from "Number Blindness" (seeing "25" as a fuzzy mix of "2" and "5").
- Phase 3 Training (GSM8k): Taught the Retina to distinguish multi-digit numbers.
- Result: The model now correctly solves 25 + 25 = 50 and performs multi-step Chain-of-Thought reasoning.
We utilized the Union Property of SDRs to test packing multiple tokens into a single input vector using rotation-based positional encoding.
- Verification: Successfully packed 4 tokens ("The quick brown fox") into 1 SDR.
- Retrieval: Achieved 100% Lossless Retrieval of the original sequence from the single packed vector using mathematical decoding.
- Status: This property is verified in our experiment scripts. A "Turbo" model trained to read these packed inputs directly is planned for V2.
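A minimal sketch of this packing experiment, using bit rotation (np.roll) as the positional encoding and overlap-based nearest-neighbor matching for decoding; the token bit patterns here are random placeholders rather than the project's actual codebook:

```python
import numpy as np

N = 2048
rng = np.random.default_rng(0)

def random_sdr(active=40):
    sdr = np.zeros(N, dtype=np.uint8)
    sdr[rng.choice(N, active, replace=False)] = 1
    return sdr

# Hypothetical codebook for the four tokens in the example sentence
vocab = {w: random_sdr() for w in ["The", "quick", "brown", "fox"]}

def pack(tokens):
    """Rotate each token's SDR by its position, then take the union."""
    packed = np.zeros(N, dtype=np.uint8)
    for pos, tok in enumerate(tokens):
        packed |= np.roll(vocab[tok], pos)
    return packed

def unpack(packed, length):
    """For each position, rotate back and pick the best-matching token by bit overlap."""
    out = []
    for pos in range(length):
        unrotated = np.roll(packed, -pos)
        out.append(max(vocab, key=lambda w: int(np.sum(unrotated & vocab[w]))))
    return out

packed = pack(["The", "quick", "brown", "fox"])
print(unpack(packed, 4))  # ['The', 'quick', 'brown', 'fox']
```

Because each position rotates its token's pattern by a different offset, the four patterns rarely collide inside the single 2048-bit vector, which is what makes lossless retrieval possible in this setting.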
# 1. Clone the repository
git clone https://github.com/your-repo/SDR_Transformer_PoC.git
cd SDR_Transformer_PoC
# 2. Install dependencies
pip install torch transformers datasets accelerate