
Steerling


An interpretable causal diffusion language model.

Steerling-8B combines masked diffusion language modeling with concept decomposition, enabling:

  • Generation: Non-autoregressive text generation via confidence-based unmasking
  • Attribution: Decompose predictions into known concept contributions
  • Steering: Intervene on concept activations to control generation
  • Embeddings: Extract hidden, composed, known, or discovered representations

For more information, tutorials, and updates, visit guidelabs.ai. To learn more about the architecture behind Steerling, check out our blog posts on Scaling Interpretable Models with 8B Parameters and Causal Diffusion Language Models.

Quick Start

uv venv
source .venv/bin/activate
uv pip install steerling
import torch
from transformers import AutoModel, AutoTokenizer
from steerling import SteerlingGenerator
from steerling.configs.generation import GenerationConfig

model = AutoModel.from_pretrained("guidelabs/steerling-8b", trust_remote_code=True, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("guidelabs/steerling-8b", trust_remote_code=True)
generator = SteerlingGenerator.from_model(model, tokenizer, device="cuda")

prompt = "The key to understanding neural networks is"
config = GenerationConfig(max_new_tokens=128, steps=128, temperature=0.4)
text = generator.generate(prompt, config)
print(text)
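Under the hood, `generate` commits tokens over multiple denoising steps rather than left to right. A minimal sketch of confidence-based unmasking, assuming a toy model; all names here are illustrative, not the actual Steerling API:

```python
import numpy as np

def unmask_by_confidence(logits_fn, length, steps, mask_id=-1):
    """Iteratively commit the most confident masked positions.
    `logits_fn(tokens)` returns a (length, vocab) array of logits."""
    tokens = np.full(length, mask_id)
    per_step = max(1, length // steps)
    while (tokens == mask_id).any():
        logits = logits_fn(tokens)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        conf = probs.max(-1)                  # model confidence per position
        conf[tokens != mask_id] = -np.inf     # skip already-committed positions
        k = min(per_step, int((tokens == mask_id).sum()))
        pick = np.argsort(conf)[-k:]          # most confident masked slots
        tokens[pick] = probs[pick].argmax(-1)
    return tokens

# Toy "model": logits strongly favor token id == position (hypothetical).
def toy_logits(tokens):
    L, V = len(tokens), 8
    logits = np.zeros((L, V))
    logits[np.arange(L), np.arange(L) % V] = 5.0
    return logits

out = unmask_by_confidence(toy_logits, length=8, steps=4)
print(out.tolist())  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

With `steps` equal to `max_new_tokens` (as in the Quick Start config) one position is committed per step; fewer steps commit several positions at once, trading quality for speed.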

Requirements: Python >= 3.13, GPU with >= 18 GB VRAM (H100, A100, A6000, RTX 4090), CUDA 12.8

Model Details

| Property            | Value                                           |
|---------------------|-------------------------------------------------|
| Parameters          | ~8B                                             |
| Architecture        | CausalDiffusionLM + Interpretable Concept Head  |
| Context Length      | 4096 tokens                                     |
| Vocabulary          | 100,281 (cl100k_base + specials)                |
| Known Concepts      | 33,732                                          |
| Discovered Concepts | 101,196                                         |
| Attention (GQA)     | 32 query heads, 4 KV heads                      |
| Precision           | bfloat16                                        |

Architecture

Steerling uses block-causal attention (bidirectional within 64-token blocks, causal across blocks) with masked diffusion training. At inference, tokens are generated by iteratively unmasking positions in order of model confidence. The interpretable concept heads decompose transformer hidden states h into:

h → known_features + unk_hat + epsilon = composed → lm_head → logits

Embedding decomposition animation

  • known_features: Weighted sum of top-k learned concept embeddings
  • unk_hat: Residual features captured by a factorized discovered concept head
  • epsilon: Small correction term for reconstruction fidelity

Interpretability

Every prediction Steerling makes can be decomposed into three components: known concepts (human-interpretable features), discovered concepts (learned residual features), and epsilon (reconstruction correction). The plot below shows the fraction of each token's logit attributable to each component:

Per-token logit contribution decomposition

See logit_contribution.ipynb for per-token decomposition and chunk_level_concept_attribution.ipynb for chunk-level concept attribution.
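Those per-token fractions follow from the linearity of the decomposition: the lm_head row for the predicted token dotted with each component gives that component's contribution, and the three contributions sum to the full logit. A hedged numpy sketch with toy values (not the notebooks' actual code):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
known, discovered, epsilon = (rng.standard_normal(d) for _ in range(3))
epsilon *= 0.01                      # epsilon is a small correction term
w = rng.standard_normal(d)           # lm_head row for the predicted token

total = w @ (known + discovered + epsilon)
fractions = {name: (w @ part) / total
             for name, part in [("known", known),
                                ("discovered", discovered),
                                ("epsilon", epsilon)]}
print(fractions)                     # the three fractions sum to 1
```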

Installation

# From PyPI
uv pip install steerling

# From source
git clone https://github.com/guidelabs/steerling.git
cd steerling
uv sync --extra dev    # full dev environment
source .venv/bin/activate

Note: PyTorch is installed with CUDA 12.8 support automatically via the PyTorch index configured in pyproject.toml. If you need a different CUDA version, install PyTorch manually before installing steerling.

Evaluation

We provide evaluation scripts based on lm-evaluation-harness.

# Run all benchmarks (HellaSwag, ARC-Challenge, WinoGrande, PIQA, MMLU, GSM8K)
bash scripts/eval_steerling_lm_eval.sh

# Specify a model path
MODEL_PATH=/path/to/local/model bash scripts/eval_steerling_lm_eval.sh

# Run specific tasks
TASKS="hellaswag arc_challenge" bash scripts/eval_steerling_lm_eval.sh

# Or use the Python CLI directly
python scripts/evaluate.py --model guidelabs/steerling-8b --tasks hellaswag arc_challenge

Notebooks

| Notebook | Description |
|----------|-------------|
| generation.ipynb | Text generation: block-by-block unmasking, special tokens, early stopping with `<\|endofchunk\|>` |
| logit_contribution.ipynb | Decompose each predicted token's logit into known concept, discovered concept, and residual contributions |
| chunk_level_concept_attribution.ipynb | Attribute generated text chunks to known concepts using normalized feature contributions |

FAQ

  • Where can I read more about the architecture?
    See our blog posts: Scaling Interpretable Models with 8B Parameters and Causal Diffusion Language Models. A detailed technical report is coming soon.

  • Is there an instruction-tuned model?
    Stay tuned.

  • What dataset was this trained on?
    An augmented version of the Nemotron-CC-HQ dataset for approximately 1.35 trillion tokens.

  • What is block-causal attention?
    Standard causal attention only lets each token attend to previous tokens. Block-causal attention groups tokens into blocks of 64 and allows bidirectional attention within each block, while maintaining causal ordering across blocks. See Causal Diffusion Language Models for more details.
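That rule translates directly into a mask: position i may attend to position j iff j's block index is at most i's block index. A minimal sketch (the real model uses block size 64; a small example is shown for readability):

```python
import numpy as np

def block_causal_mask(seq_len, block_size=64):
    """True where attention is allowed: bidirectional within a block,
    causal across blocks."""
    blocks = np.arange(seq_len) // block_size
    return blocks[None, :] <= blocks[:, None]

m = block_causal_mask(6, block_size=2).astype(int)
print(m)
# Token 0 (block 0) sees tokens 0-1, including the "future" token 1;
# token 4 (block 2) sees all of tokens 0-5.
```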

  • What are "known" and "discovered" concepts?
    The model decomposes its internal representations into two parts:

    • Known concepts (33,732): learned, supervised features corresponding to identifiable patterns a human can understand.
    • Discovered concepts (101,196): capture the signal that known concepts don't explain.
    • Together they reconstruct the full hidden state: hidden ≈ known_features + discovered_features + epsilon.
  • How do I find concept IDs for steering?
    A full walkthrough of concept extraction and steering is coming in the next few weeks.

  • What GPU do I need?
    Steerling-8B in bfloat16 requires approximately 18 GB VRAM. It fits on a single H100, A100 (40GB or 80GB), A6000 (48GB), or RTX 4090 (24GB).

  • Can I fine-tune this model?
    Yes, but fine-tuning code is not included in this release. If there is sufficient interest, we will support it in a future release.

  • What tokenizer does Steerling-8B use?
    OpenAI's cl100k_base tokenizer (via tiktoken) with 4 additional special tokens: <|pad|>, <|bos|>, <|endofchunk|>, and <|mask|>, for a total vocabulary of 100,281 tokens.

  • How do I get training data attributions?
    This release supports concept and feature attributions via the provided notebooks. Training data attribution is not currently supported but will be added in a future release.

License

The Steerling source code is released under the Apache License 2.0.

The model weights are provided for research and evaluation purposes. The weights were trained on datasets with varying license terms, including Nemotron-CC-HQ and Dolmino Mix. Some training data includes synthetic content generated by third-party models with their own license terms. We are currently reviewing the implications of these upstream licenses for downstream use of the model weights. Please check back for updates.

For questions about commercial use of the model weights, contact us at [email protected].