About • Quick Start • Architecture • Repo • Deployment • Docs
Matyan (մատյան, book of records in Armenian) is a self-hosted ML experiment tracking stack forked from Aim. The backend is fully reimplemented on FoundationDB, Kafka, and Cloud Storage (S3, GCS, Azure) for horizontal scalability — while the original Aim React UI and Python client SDK API are preserved unchanged.
Matyan logs your training runs and any ML metadata, provides a beautiful UI to compare and observe them, and ships an SDK to query them programmatically.
- Log metadata across your ML pipeline 💾
- Visualize & compare metadata via UI 📊
- Scale to thousands of runs ⚡
- Production-ready deployment 🚀
```shell
./dev/compose-cluster.sh up -d
```

This starts FoundationDB, Kafka, and S3 (RustFS) locally via Docker Compose. (GCS and Azure backends are supported in production.)
Then start the backend, frontier, and UI from their package directories (see each component README for uv run commands).
```shell
python3 -m pip install matyan-client
```

Or with uv:

```shell
uv add matyan-client
```

```python
from matyan_client import Run

run = Run()
run["hparams"] = {
    "learning_rate": 0.001,
    "batch_size": 32,
}

for i in range(100):
    run.track(i * 0.01, name="loss", step=i, context={"subset": "train"})
    run.track(1 - i * 0.01, name="acc", step=i, context={"subset": "train"})

run.close()
```

The same Run API works as in Aim — see Supported types for images, audio, distributions, figures, and text.
Query runs programmatically via SDK
```python
from matyan_client import Repo

repo = Repo("http://localhost:53800")
query = "metric.name == 'loss'"

for run_metrics_collection in repo.query_metrics(query).iter_runs():
    for metric in run_metrics_collection:
        params = metric.run[...]
        steps, values = metric.values.sparse_numpy()
```

Deploy on Kubernetes

```shell
helm install matyan deploy/helm/matyan -f deploy/helm/matyan/values-production.yaml
```

See deploy/helm/matyan/README.md for all values and production notes.
Read the full documentation at matyan-core/deployment 📖
```mermaid
flowchart TB
    subgraph clients["Training Clients"]
        C["matyan-client"]
    end
    subgraph ui["UI"]
        U["matyan-ui"]
    end
    subgraph ingestion["Ingestion path"]
        F["matyan-frontier<br/>(Ingestion Gateway)"]
        K["Kafka<br/>data-ingestion"]
        IW["Ingestion Workers"]
        STR["Cloud Storage<br/>(S3 / GCS / Azure)"]
    end
    subgraph control["Control path"]
        B["matyan-backend<br/>(REST API)"]
        KC["Kafka<br/>control-events"]
        CW["Control Workers"]
    end
    subgraph storage["Storage"]
        FDB["FoundationDB"]
    end
    C -->|"WebSocket (metrics, hparams)"| F
    C -->|"PUT blob"| STR
    F -->|"blob ref"| K
    K --> IW
    IW --> FDB
    U --> B
    B --> FDB
    B --> KC
    KC --> CW
    CW -->|"cleanup"| STR
```
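The ingestion path above decouples writes from the API: the frontier publishes tracked points to Kafka, and ingestion workers consume them and persist to FoundationDB. A minimal conceptual sketch of that flow, using an in-memory queue in place of Kafka and a dict in place of FoundationDB (all names and payload fields here are illustrative, not the real Matyan code):

```python
from queue import Queue

# Stand-ins for the real infrastructure (illustrative only):
# a Queue plays the Kafka "data-ingestion" topic, a dict plays FoundationDB.
kafka_topic: Queue = Queue()
fdb: dict = {}

def frontier_receive(run_id: str, metric: str, step: int, value: float) -> None:
    """Frontier accepts a tracked point over WebSocket and publishes it
    to Kafka without touching storage (the write path stays decoupled)."""
    kafka_topic.put({"run": run_id, "metric": metric, "step": step, "value": value})

def ingestion_worker_drain() -> None:
    """An ingestion worker consumes events and persists them to the
    key-value store under a (run, metric, step) key."""
    while not kafka_topic.empty():
        ev = kafka_topic.get()
        fdb[(ev["run"], ev["metric"], ev["step"])] = ev["value"]

# A training client tracks two points; they become visible only after
# a worker drains the topic -- hence "eventual" consistency for ingestion.
frontier_receive("run-1", "loss", 0, 1.0)
frontier_receive("run-1", "loss", 1, 0.5)
print(len(fdb))          # → 0 (nothing persisted yet)
ingestion_worker_drain()
print(fdb[("run-1", "loss", 1)])  # → 0.5
```

This is also why the consistency table below marks client ingestion as "Eventual" while UI reads through the backend are immediate: the backend reads FoundationDB directly, but tracked data arrives there only after a worker has drained the topic.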
| Concern | Entry point | Consistency |
|---|---|---|
| UI reads | matyan-backend | Immediate |
| UI control ops (delete, rename) | matyan-backend | Immediate for user, async for cleanup |
| Client metrics / hparams ingestion | frontier (WebSocket) | Eventual |
| Client blob upload (images, audio) | frontier (presigned URL) | Eventual |
| Path | Purpose |
|---|---|
| `extra/matyan-backend/` | REST API, FDB storage, ingestion/control Kafka workers, CLI. README |
| `extra/matyan-frontier/` | Ingestion gateway: WebSocket + presigned URLs (S3, GCS, Azure SAS); publishes to Kafka. README |
| `extra/matyan-ui/` | React frontend (from Aim) + Python wrapper for serving. README |
| `extra/matyan-client/` | Python client SDK (Aim-compatible API); connects to frontier and backend. |
| `extra/matyan-api-models/` | Shared Pydantic models (WS, Kafka, REST). README |
| `deploy/helm/matyan/` | Helm chart for Kubernetes. README |
| `dev/docker-compose.yml` | Local dev: FDB, Kafka, S3 (RustFS), optional app services. |
| `docs/` | MkDocs source for the documentation site. |
The fastest way to get everything running is Docker Compose. A single script starts all infrastructure dependencies and the Matyan services:
```shell
./dev/compose-cluster.sh up -d
```

This brings up:
| Service | Port | Purpose |
|---|---|---|
| FoundationDB | — | Primary storage (internal) |
| Apache Kafka | 9092 | Ingestion + control event bus |
| RustFS (S3-compatible) | 9000 / 9001 | Blob artifact storage + console |
| `matyan-backend` | 53800 | REST API |
| `matyan-frontier` | 53801 | WebSocket ingestion gateway |
| `matyan-ui` | 8000 | React UI |
Point your browser to http://localhost:8000 once all services are healthy. Use http://localhost:9001 for the RustFS console (credentials: rustfsadmin / rustfsadmin).
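Before opening the UI, you can quickly check that each service is listening on its expected port. A small standard-library sketch (ports come from the table above; the host and timeout are assumptions for a local Compose setup):

```python
import socket

# Ports from the local Compose stack (see the service table above).
SERVICES = {
    "matyan-backend": 53800,
    "matyan-frontier": 53801,
    "matyan-ui": 8000,
    "Kafka": 9092,
    "RustFS (S3)": 9000,
}

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in SERVICES.items():
    status = "up" if port_open("localhost", port) else "down"
    print(f"{name:16} :{port}  {status}")
```

A TCP connect only proves something is listening, not that the service is healthy; treat it as a first-pass check before hitting the UI.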
To seed demo data into a running stack:
```shell
cd extra/matyan-backend
uv run python scripts/seed_data.py seed
```

Matyan ships a Helm chart covering all application services and their infrastructure dependencies (FoundationDB via the fdb-operator, Kafka, RustFS).
Prerequisites: a Kubernetes cluster (1.25+) with a default or named StorageClass.
Generate a Fernet key (required for encrypted blob URIs):
```shell
uvx --from cryptography python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
```

Install the chart:

```shell
helm install matyan deploy/helm/matyan \
  -f deploy/helm/matyan/values-dev.yaml \
  --set ui.hostBase=https://matyan.example.com \
  --set backend.hostBase=https://matyan.example.com \
  --set blobUriSecret.value=<your-fernet-key> \
  --set fdb-cluster.processes.general.volumeClaimTemplate.storageClassName=<your-storage-class>
```

Scaling: all application services (matyan-backend, matyan-frontier, ingestion workers, control workers) are stateless. Scale any of them independently by adjusting replicaCount in the values file — FoundationDB and Kafka handle coordination automatically.
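As an illustration, scaling the frontier and ingestion workers more aggressively than the backend might look like the following values override. The exact key paths are an assumption — verify them against the chart's values reference before use:

```yaml
# values-scale.yaml -- illustrative override; confirm key names against
# deploy/helm/matyan/README.md before applying.
frontier:
  replicaCount: 4        # absorb WebSocket bursts from many training jobs
ingestionWorkers:
  replicaCount: 8        # drain the Kafka data-ingestion topic faster
backend:
  replicaCount: 2        # serve UI reads
```

Applied with something like `helm upgrade matyan deploy/helm/matyan -f values-scale.yaml`.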
See deploy/helm/matyan/README.md for the full values reference and production configuration notes (TLS, resource limits, external Kafka/S3, multi-node FDB).
TensorBoard vs Matyan
Training run comparison
- Tracked parameters are first-class citizens in Matyan. You can search, group, and aggregate by params — deeply exploring all tracked data (metrics, hyperparameters, images, audio) in the UI.
- With TensorBoard, users are forced to encode parameters into the run name to search and compare them. TensorBoard has no grouping, aggregation, or subplot features.
Scalability
- Matyan is built on FoundationDB and Kafka to handle 10,000s of training runs at both the storage and UI layers.
- TensorBoard becomes slow and hard to use when a few hundred training runs are queried or compared.
MLflow vs Matyan
MLflow is an end-to-end ML lifecycle tool. Matyan is focused on training tracking and observability.
Run comparison
- Matyan treats tracked parameters as first-class citizens. Users can query runs, metrics, images, and filter using params with full grouping, aggregation, and subplotting.
- MLflow has basic search by config but lacks grouping, aggregation, and rich comparison features.
UI scalability
- Matyan's UI handles thousands of metrics with thousands of steps smoothly.
- MLflow's UI slows noticeably with a few hundred runs.
Deployment
- Both are self-hosted and open-source.
- Matyan adds a Kafka-based ingestion pipeline and FoundationDB for high-throughput, horizontally scalable deployments.
Weights and Biases vs Matyan
Hosted vs self-hosted
- Weights and Biases is a hosted, closed-source MLOps platform. Your experiment data lives on their servers.
- Matyan is fully self-hosted and open-source — your data stays in your own infrastructure (FoundationDB + S3/GCS/Azure).
Cost
- W&B charges per seat / usage at scale.
- Matyan is free; you only pay for your own compute and storage.
Aim vs Matyan
Matyan is a fork of Aim. The UI and Python SDK API surface are almost identical, so only minor code changes are needed to switch.
Storage backend
- Aim uses an embedded RocksDB store (custom Cython extensions) on a single node. Storage is tied to the machine running `aim up`.
- Matyan replaces RocksDB with FoundationDB — a distributed, ACID-compliant key-value store designed for horizontal scaling. All runs share a single logical key space across a cluster.
Ingestion pipeline
- Aim writes tracking data synchronously in the same process as the server.
- Matyan routes tracking data through Kafka → ingestion workers → FoundationDB, decoupling the write path from the API. The frontier service can handle bursts from many concurrent training jobs without backpressure on the API.
Deployment model
- Aim is a single `aim up` process — simple to start, harder to scale.
- Matyan is a set of stateless, horizontally scalable microservices (backend API, frontier, ingestion workers, control workers) deployable on Kubernetes via Helm.
When to use Aim
Aim is a great choice for individual researchers running experiments on a single machine where simplicity matters more than scale.
When to use Matyan
Matyan is the right choice when you need to scale to many concurrent training jobs, many users, or large run counts — while keeping the familiar Aim UI and SDK.
- Matyan Backend — REST API, FDB storage, workers, config, deployment.
- Matyan Frontier — Ingestion gateway, WebSocket, presigned URLs (S3/GCS/Azure).
- Matyan UI — Frontend build, serve, and environment variables.
- Matyan API Models — Shared Pydantic models.
- Helm Chart — Kubernetes deployment and configuration.
Questions, feedback, or collaboration? Reach out at [email protected].
Apache 2.0 — see LICENSE.
Matyan is a fork of Aim by AimStack, used under the Apache 2.0 license.