Stratum is an ML system for efficiently executing large-scale agentic pipeline search. It integrates with MLE agents by representing batches of agent-generated pipelines as lazily evaluated DAGs, applying logical and runtime optimizations, and executing them across heterogeneous backends, including a Rust-based runtime. Stratum builds on skrub's operator abstraction and is under active development.
- Provide seamless and unrestricted support for arbitrary ML libraries without operator porting.
- Enable lazy evaluation and provide operator semantics that allow logical rewrites and cost-based optimizations.
- Implement a runtime with efficient operator kernels (in Rust), scheduling across CPUs, GPUs, and distributed backends, plus runtime optimizations such as buffer pools, reuse of intermediates, and inter- and intra-operator parallelization.
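To make the ideas of lazy evaluation and reuse of intermediates concrete, here is a minimal toy sketch in plain Python. It is not Stratum's implementation or API: the `Node` class and `evaluate` function are hypothetical names invented for illustration. It assumes that two operators with the same name and the same inputs compute the same result, so a prefix shared by several pipelines in a batch is executed only once.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Node:
    """A lazily evaluated operator (hypothetical; nothing runs on construction)."""
    op: str                                # operator name, part of the node's identity
    fn: Callable = field(compare=False)    # the computation, ignored for equality/hash
    inputs: Tuple["Node", ...] = ()        # upstream nodes this operator depends on


def evaluate(node: Node, cache: Dict[Node, object], runs: Dict[str, int]):
    """Evaluate a node on demand, reusing cached results of equal sub-DAGs."""
    if node in cache:
        return cache[node]
    args = [evaluate(parent, cache, runs) for parent in node.inputs]
    runs[node.op] = runs.get(node.op, 0) + 1   # count actual executions per operator
    cache[node] = node.fn(*args)
    return cache[node]


def shared_prefix() -> Node:
    # Each pipeline builds its own copy of the prefix, as independent agents might.
    load = Node("load", lambda: [3, 1, 2])
    return Node("double", lambda xs: [2 * x for x in xs], (load,))


# Two pipelines in one batch that differ only in their final operator.
pipeline_a = Node("sum", sum, (shared_prefix(),))
pipeline_b = Node("max", max, (shared_prefix(),))

cache: Dict[Node, object] = {}
runs: Dict[str, int] = {}
results = [evaluate(p, cache, runs) for p in (pipeline_a, pipeline_b)]

print(results)  # [12, 6]
print(runs)     # {'load': 1, 'double': 1, 'sum': 1, 'max': 1}
```

Because the two copies of the prefix compare equal structurally, `load` and `double` run once even though both pipelines depend on them. A real system would also need operator parameters in the node identity and a cost model to decide what is worth caching.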
For now, you need to build stratum from source.
Requirements:
- Python 3.10+
- skrub
- Rust toolchain (nightly not required; stable is fine)
- maturin (pip install maturin)
From the repository root, install the extension in editable (development) mode:
maturin develop --release

For more details (including building wheels), see the Developer Instructions section below.
To use Stratum, agent prompts or pipelines need only minor changes: prompts should be modified so that agents generate code following skrub DataOps syntax.
Stratum can also significantly speed up human-written skrub code.
The following flags enable different features of Stratum. These flags can be set via environment variables or directly in code:
import stratum
stratum.set_config(
    rust_backend=True,
    scheduler=True,
    stats=True,
    debug_timing=False,
)

import stratum as skrub  # drop-in replacement
from sklearn.preprocessing import OneHotEncoder
from skrub.datasets import fetch_employee_salaries
from skrub import TableVectorizer, StringEncoder


def main():
    # Load dataset
    dataset = fetch_employee_salaries()
    employees, salaries = dataset.X, dataset.y
    employees = employees.dropna()

    # Enable Stratum features (this is stratum's config, set through the alias)
    skrub.set_config(rust_backend=True, debug_timing=True, scheduler=True, stats=True)

    # Encode high-cardinality columns as strings and low-cardinality ones as one-hot
    vectorizer = TableVectorizer(high_cardinality=StringEncoder(), low_cardinality=OneHotEncoder())
    employees_enc = vectorizer.fit_transform(employees)
    print(f"Encoded data shape: {employees_enc.shape}")


if __name__ == "__main__":
    main()

stratum/
├─ pyproject.toml # Project metadata + Python/Rust build config (maturin)
├─ README.md
├─ LICENSE
├─ _rust/ # Rust crate (PyO3 extension)
│ ├─ Cargo.toml
│ └─ src/lib.rs # Defines #[pymodule] fn _rust_backend_native(...)
└─ stratum/ # Python package
  ├─ __init__.py # Façade over skrub + automatic patching
  ├─ _config.py # set_config/get_config + runtime/env sync
  ├─ _api.py # High-level grid search / evaluate helpers
  ├─ _rust_backend.py # Python <-> Rust shim (re-exports native fns)
  ├─ adapters/ # Public API (dispatch to Rust or fall back to skrub)
  │ ├─ string_encoder.py # RustyStringEncoder
  │ └─ one_hot_encoder.py # RustyOneHotEncoder
  ├─ logical_optimizer/ # DAG representation + logical rewrites
  ├─ runtime/ # Schedulers and runtime execution
  ├─ patching/ # Hooks that patch upstream skrub
  └─ tests/ # Test suite

maturin develop # Debug mode
maturin develop --release # Optimized dev build

The following commands produce redistributable .whl files under dist/.
# Linux / macOS
maturin build --release -o dist --interpreter python3.10 --compatibility linux
# Windows
maturin build --release -o dist

Then install with:

pip install ./dist/stratum-*.whl

Apache License 2.0. See LICENSE for details.