Diffusion-based Mamba Architecture for Non-Autoregressive Text Generation
DIMBA is a research-grade language model that combines the power of diffusion models with Mamba-2 State Space Models (SSM) to enable fast, parallel text generation. Unlike traditional autoregressive models that generate tokens one-by-one, DIMBA generates entire sequences simultaneously through iterative denoising.
🔬 Research Paper: "DIMBA: Revolutionizing Theoretical Ultra-Fast Inference and Advanced Reasoning with Mamba-Based Diffusion" — Faris Allafi (2025)
🌐 Website: dimbalabs.xyz
👤 Author: farisallafi.xyz
- No CUDA dependencies required — runs on CPU, GPU, and Apple Silicon
- Custom `SimpleMamba2` fallback implementation when `mamba-ssm` is unavailable
- Seamlessly switches between high-performance CUDA kernels and pure PyTorch
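Conceptually, the fallback is a guarded import: try the CUDA-optimized kernels, and fall back to the pure-PyTorch path if they are missing. A minimal sketch of that pattern (the flag name here is illustrative, not DIMBA's actual module layout):

```python
try:
    # CUDA-optimized Mamba-2 kernels, if mamba-ssm is installed
    from mamba_ssm import Mamba2
    HAS_FAST_MAMBA = True
except ImportError:
    # No CUDA kernels available: use the pure-PyTorch SimpleMamba2 path,
    # which also runs on CPU and Apple Silicon.
    HAS_FAST_MAMBA = False
```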
- Optional Variational Autoencoder for compressing token embeddings
- Trainable latent spaces with KL-regularization (β-VAE)
- Improves diffusion efficiency and model capacity
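The reparameterization trick and the β-weighted KL term behind these VAE features can be sketched in a few lines of PyTorch (a minimal illustration under standard β-VAE assumptions, not the library's actual `vae.py`):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """z = mu + sigma * eps with eps ~ N(0, I), keeping sampling differentiable."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(mu)

def beta_vae_kl(mu: torch.Tensor, logvar: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Beta-weighted KL(q(z|x) || N(0, I)), averaged over the batch."""
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
    return beta * kl.mean()
```

With `mu = 0` and `logvar = 0` the posterior matches the prior and the KL term vanishes, which is a handy sanity check.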
- First-class Metal Performance Shaders support
- Optimized for M1/M2/M3 Macs without CUDA
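Device selection along these lines (CUDA first, then MPS, then CPU) can be sketched as follows; `pick_device` is a hypothetical helper for illustration, not a DIMBA API:

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple-Silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```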
- `train_interactive.py` — guided wizard for easy configuration
- Automatic hardware detection and optimization recommendations
- One-command training for various GPU tiers (A4000, L40S, etc.)
- Standard diffusion sampling — flexible step counts
- DDIM sampling — faster inference with fewer steps
- Consistency training (CDLM) — up to 14× faster inference
- Top-k, top-p, and temperature-based sampling
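The nucleus (top-p) filtering step listed above can be sketched as follows; this is an illustrative implementation, not DIMBA's actual sampler:

```python
import torch

def top_p_filter(logits: torch.Tensor, top_p: float = 0.95) -> torch.Tensor:
    """Mask logits outside the smallest token set whose cumulative probability reaches top_p."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
    probs = torch.softmax(sorted_logits, dim=-1)
    cumulative = probs.cumsum(dim=-1)
    # Drop a token once the mass *before* it already exceeds top_p,
    # so the highest-probability token is always kept.
    remove = cumulative - probs > top_p
    sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
    # Undo the sort so logits line up with the original vocabulary order.
    out = torch.empty_like(logits)
    out.scatter_(dim=-1, index=sorted_idx, src=sorted_logits)
    return out
```

Sampling then proceeds by applying softmax to the filtered logits and drawing from the result.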
```
┌─────────────────────────────────────────────────────────────┐
│                     DIMBA Architecture                       │
├─────────────────────────────────────────────────────────────┤
│  Input Tokens                                                │
│       ↓                                                      │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐  │
│  │   Token     │───→│   Prompt    │───→│  Conditioning   │  │
│  │ Embeddings  │    │   Encoder   │    │      (C)        │  │
│  └─────────────┘    └─────────────┘    └─────────────────┘  │
│       ↓                                        ↓             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │          Latent Projection (Optional VAE)           │    │
│  │        z = μ + σ·ε (reparameterization trick)       │    │
│  └─────────────────────────────────────────────────────┘    │
│       ↓                                                      │
│  ┌─────────────────────────────────────────────────────┐    │
│  │               Cosine Noise Schedule                 │    │
│  │        ᾱ(t) = cos²((t/T + s)/(1+s)·π/2)             │    │
│  │        x_t = √ᾱ(t)·x₀ + √(1-ᾱ(t))·ε                 │    │
│  └─────────────────────────────────────────────────────┘    │
│       ↓                                                      │
│  ┌─────────────────────────────────────────────────────┐    │
│  │           Mamba-2 Denoiser (T iterations)           │    │
│  │  ┌─────────────────────────────────────────────┐    │    │
│  │  │       Mamba-2 SSM Block × N layers          │    │    │
│  │  │   - Linear-time sequence processing         │    │    │
│  │  │   - Selective state spaces (S6)             │    │    │
│  │  │   - FiLM/Additive conditioning              │    │    │
│  │  └─────────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────┘    │
│       ↓                                                      │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐  │
│  │   Output    │───→│   Latent    │───→│  Token Logits   │  │
│  │ Projection  │    │   Decode    │    │   (Softmax)     │  │
│  └─────────────┘    └─────────────┘    └─────────────────┘  │
│       ↓                                                      │
│  Generated Text                                              │
└─────────────────────────────────────────────────────────────┘
```
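The noise schedule and forward process in the diagram follow directly from those two formulas. A minimal sketch, with `T` and the small offset `s` taken at the values commonly used with this schedule:

```python
import math
import torch

def alpha_bar(t: int, T: int = 1000, s: float = 0.008) -> float:
    """Cosine schedule: alpha_bar(t) = cos^2(((t/T) + s)/(1 + s) * pi/2)."""
    return math.cos(((t / T) + s) / (1 + s) * math.pi / 2) ** 2

def q_sample(x0: torch.Tensor, t: int, T: int = 1000) -> torch.Tensor:
    """Forward noising: x_t = sqrt(alpha_bar(t)) * x0 + sqrt(1 - alpha_bar(t)) * eps."""
    ab = alpha_bar(t, T)
    eps = torch.randn_like(x0)
    return math.sqrt(ab) * x0 + math.sqrt(1 - ab) * eps
```

At `t = 0` almost no noise is added (ᾱ ≈ 1), while at `t = T` the signal is essentially destroyed (ᾱ ≈ 0), and ᾱ decreases smoothly in between.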
| Component | Description |
|---|---|
| Token Embeddings | Learnable embeddings mapping discrete tokens to continuous space |
| Prompt Encoder | Lightweight MLP for conditioning on prefix tokens |
| Noise Schedule | Cosine schedule following Nichol & Dhariwal (2021) |
| Timestep Embeddings | Sinusoidal encodings with MLP projection |
| Mamba-2 Denoiser | Stack of SSM blocks with FiLM/additive conditioning |
| VAE (Optional) | Token-level variational autoencoder for latent diffusion |
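The timestep-embedding row, for example, corresponds to the standard sinusoidal encoding; a sketch without the MLP projection (dimensions here are illustrative, and `dim` is assumed even):

```python
import math
import torch

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal timestep encoding, as in Transformer position embeddings."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to 1/10000.
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```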
```bash
# Clone the repository
git clone https://github.com/devnull37/dimba-lib-exp.git
cd dimba-lib-exp

# Basic installation (CPU + SimpleMamba fallback)
pip install -e .

# With GPU support (full Mamba-2 with CUDA)
pip install -e ".[gpu]"

# Full development setup (includes all extras)
pip install -e ".[all]"
```

```bash
# Launch the interactive training wizard
python scripts/train_interactive.py
```

The wizard will guide you through:
- Hardware detection (CUDA, MPS, or CPU)
- Model size selection
- Dataset configuration
- Training hyperparameters
```bash
# Train on GPU
python scripts/train.py --config config.yaml --gpus 1 --max-epochs 10

# Train on CPU (uses SimpleMamba)
python scripts/train.py --config config.yaml

# Train on Apple Silicon
python scripts/train.py --config config.yaml --mps
```

```python
import torch
from dimba import DIMBA, sample_from_model

# Create a DIMBA model
model = DIMBA(
    vocab_size=50000,
    d_model=512,
    num_diffusion_steps=1000,
    num_denoiser_layers=8,
)

# Generate text
prompt_ids = torch.tensor([[10, 20, 30]])  # Tokenized prompt
generated = sample_from_model(
    model,
    prompt_ids,
    seq_len=100,
    num_steps=50,  # Fewer steps = faster; more steps = better quality
    temperature=1.0,
    top_p=0.95,
)
print(generated)
```

| Platform | Status | Notes |
|---|---|---|
| NVIDIA CUDA | ✅ Full support | Best performance with mamba-ssm>=2.2.0 |
| Apple Silicon (MPS) | ✅ Full support | Native Metal backend for M1/M2/M3 |
| CPU | ✅ Supported | Uses pure PyTorch SimpleMamba2 fallback |
| AMD ROCm | — | Via PyTorch ROCm builds |
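For reference, the fast-sampling option mentioned earlier boils down to a deterministic DDIM update (the η = 0 case), which can be sketched from the schedule quantities ᾱ(t). This is an illustration, not the library's `sampling.py`:

```python
import math
import torch

def ddim_step(x_t: torch.Tensor, eps_hat: torch.Tensor,
              ab_t: float, ab_prev: float) -> torch.Tensor:
    """One deterministic DDIM (eta = 0) update from timestep t toward the previous one.

    ab_t, ab_prev: cumulative schedule values alpha_bar(t) and alpha_bar(t_prev).
    """
    # Predict the clean sample from the model's noise estimate.
    x0_hat = (x_t - math.sqrt(1 - ab_t) * eps_hat) / math.sqrt(ab_t)
    # Re-noise toward the previous timestep along the same noise direction.
    return math.sqrt(ab_prev) * x0_hat + math.sqrt(1 - ab_prev) * eps_hat
```

Because each update is deterministic, DDIM can skip timesteps and still land on a consistent trajectory, which is what allows the lower `num_steps` counts.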
```bash
# RTX A4000 (16GB VRAM) - 500M parameter model
python scripts/train_fineweb_500m_a4000.py

# L40S / A100 - 1.5B parameter model
python scripts/train_fineweb_1b.py

# CDLM (Consistency Training) - up to 14× faster inference
python scripts/train_cdlm.py
```

Pre-train a Variational Autoencoder to compress token embeddings:
```bash
# Basic VAE training
python scripts/train_vae.py \
    --dataset wikitext \
    --dataset-config wikitext-2-raw-v1 \
    --latent-dim 256 \
    --kl-weight 1.0 \
    --epochs 10
```

Use the pre-trained VAE in DIMBA:
```python
model = DIMBA(
    vocab_size=50000,
    d_model=512,
    latent_diffusion=True,
    d_latent=256,
    use_vae_latent=True,
    vae_checkpoint_path='checkpoints/vae/final.ckpt',
)
```

Train with Consistency Models for ultra-fast inference:
```bash
python scripts/train_cdlm.py \
    --config config.yaml \
    --consistency-weight 0.5 \
    --delta-min 50 \
    --delta-max 200
```

- Core diffusion training pipeline
- Mamba-2 denoiser with FiLM conditioning
- Pure PyTorch SimpleMamba2 fallback
- VAE-based latent diffusion
- DDIM sampling for faster inference
- Interactive training wizard
- Multi-GPU training (PyTorch Lightning)
- Apple Silicon (MPS) support
- HuggingFace datasets integration
- BPE tokenization
- EMA (Exponential Moving Average) training
- Checkpointing and resumption
- Consistency model training (CDLM)
- Multi-modal extensions
- Quantization support (INT8, INT4)
- ONNX export
- Flash Attention integration
- Rotary Position Embeddings (RoPE)
- Training cost: Diffusion models require substantial compute for pre-training
- Discrete-continuous gap: Mapping between discrete tokens and continuous embeddings affects rare token handling
- Hyperparameter sensitivity: Performance varies significantly with the number of diffusion steps (T) and architecture depth
- Conditioning robustness: Long-context conditioning requires careful tuning
```
dimba-lib-exp/
├── src/dimba/                  # Core library
│   ├── models/                 # Model implementations
│   │   ├── diffusion.py        # Main DIMBA model
│   │   ├── denoiser.py         # Mamba-2 denoiser
│   │   ├── vae.py              # Token VAE
│   │   ├── embeddings.py       # Embedding layers
│   │   └── simple_mamba.py     # Pure PyTorch Mamba
│   ├── diffusion/              # Diffusion utilities
│   │   ├── schedules.py        # Noise schedules
│   │   └── sampling.py         # Sampling algorithms
│   ├── data/                   # Dataset loaders
│   ├── training/               # Training utilities
│   ├── evaluation/             # Metrics (BLEU, ROUGE, etc.)
│   └── tokenizers/             # Tokenization
├── scripts/                    # Training & utility scripts
│   ├── train_interactive.py    # Interactive wizard ⭐
│   ├── train.py                # Generic training
│   ├── train_vae.py            # VAE pre-training
│   ├── train_cdlm.py           # Consistency training
│   ├── generate.py             # Text generation
│   ├── evaluate.py             # Evaluation
│   └── setup/                  # Installation scripts
├── configs/                    # Configuration files
├── tests/                      # Unit tests
├── notebooks/                  # Jupyter notebooks
├── paper/                      # Research paper
└── docs/                       # Documentation
```
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Install development dependencies: `pip install -e ".[dev]"`
- Make your changes
- Run tests: `pytest`
- Format code: `black src/ && isort src/`
- Submit a Pull Request

```bash
pip install -e ".[all]"
pre-commit install  # Optional: for automated formatting
```

If you use DIMBA in your research, please cite:
```bibtex
@article{allafi2025dimba,
  title={DIMBA: Revolutionizing Theoretical Ultra-Fast Inference and Advanced Reasoning with Mamba-Based Diffusion},
  author={Allafi, Faris},
  year={2025}
}
```

This project is licensed under the MIT License — see the LICENSE file for details.
- 🌐 Website: dimbalabs.xyz
- 👤 Author: farisallafi.xyz
- 📄 Paper: Available in the `paper/` directory
- 💻 Repository: github.com/devnull37/dimba-lib-exp
- 🐛 Issues: GitHub Issues
- Mamba — State Space Models by Tri Dao and Albert Gu
- Diffusion Models — Inspired by works from OpenAI, Google Research, and the broader diffusion community
- PyTorch Lightning — For the excellent training framework
- HuggingFace — For datasets and transformers infrastructure
Built with ❤️ by Faris Allafi