
Matyan

A scalable, self-hosted ML experiment tracker

Aim-compatible UI and SDK · FoundationDB · Kafka · Cloud Storage (S3, GCS, Azure)





ℹ️ About

Matyan (մատյան, book of records in Armenian) is a self-hosted ML experiment tracking stack forked from Aim. The backend is fully reimplemented on FoundationDB, Kafka, and Cloud Storage (S3, GCS, Azure) for horizontal scalability — while the original Aim React UI and Python client SDK API are preserved unchanged.

Matyan logs your training runs and any ML metadata, enables a beautiful UI to compare and observe them, and provides an SDK to query them programmatically.

Log Metadata Across Your ML Pipeline 💾

  • Metrics, hyperparameters, images, audio, text, distributions
  • Structured and terminal run logs
  • Aim-compatible SDK — no code changes needed

Visualize & Compare Metadata via UI 📊

  • Metadata visualization via explorers (metrics, images, audio, …)
  • Grouping, aggregation, and subplots
  • Querying using MatyanQL (Python expressions)

Scale to Thousands of Runs ⚡

  • FoundationDB backend — handles 10,000s of runs
  • Kafka-based ingestion pipeline with consumer workers
  • Secondary indexes (Tier 1 + Tier 2 hparam) for fast queries

Production-Ready Deployment 🚀

  • Helm chart for Kubernetes with all components
  • Stateless, horizontally scalable services
  • S3, GCS, or Azure Blob Storage for large artifact blobs
Matyan demo


🏁 Quick Start

1. Start the infrastructure

./dev/compose-cluster.sh up -d

This starts FoundationDB, Kafka, and S3-compatible storage (RustFS) locally via Docker Compose; GCS and Azure backends are supported in production. Then start the backend, frontier, and UI from their package directories (see each component README for the uv run commands).

2. Install the client

python3 -m pip install matyan-client

Or with uv:

uv add matyan-client

3. Log a training run

from matyan_client import Run

# Create a new run
run = Run()

# Log hyperparameters as structured run params
run["hparams"] = {
    "learning_rate": 0.001,
    "batch_size": 32,
}

# Track metric values per step; `context` distinguishes e.g. train vs. val
for i in range(100):
    run.track(i * 0.01, name="loss", step=i, context={"subset": "train"})
    run.track(1 - i * 0.01, name="acc", step=i, context={"subset": "train"})

# Flush buffered data and finalize the run
run.close()

The same Run API works as in Aim — see Supported types for images, audio, distributions, figures, and text.

Query runs programmatically via SDK

from matyan_client import Repo

# Connect to the backend REST API
repo = Repo("http://localhost:53800")

# MatyanQL query: a Python expression evaluated against each metric
query = "metric.name == 'loss'"

for run_metrics_collection in repo.query_metrics(query).iter_runs():
    for metric in run_metrics_collection:
        # All tracked params of the owning run
        params = metric.run[...]
        # Steps and values as NumPy arrays
        steps, values = metric.values.sparse_numpy()
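Since MatyanQL queries are plain Python expressions, their filtering behavior can be illustrated without a running server. The sketch below is an illustrative stand-in, not the real server-side evaluator: `matches` and the `SimpleNamespace` objects are hypothetical, standing in for matyan-client's actual metric/run types.

```python
from types import SimpleNamespace

def matches(expr: str, metric) -> bool:
    # The query is a Python expression evaluated with `metric`
    # (and its owning `run`) in scope; builtins are disabled.
    return bool(eval(expr, {"__builtins__": {}}, {"metric": metric, "run": metric.run}))

run = SimpleNamespace(hparams={"learning_rate": 0.001, "batch_size": 32})
metrics = [SimpleNamespace(name="loss", run=run),
           SimpleNamespace(name="acc", run=run)]

selected = [m.name for m in metrics if matches("metric.name == 'loss'", m)]
print(selected)  # ['loss']
```

The same expression style extends to run params, e.g. `"metric.name == 'loss' and run.hparams['batch_size'] == 32"`.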
Deploy on Kubernetes

helm install matyan deploy/helm/matyan -f deploy/helm/matyan/values-production.yaml

See deploy/helm/matyan/README.md for all values and production notes.

Read the full documentation at matyan-core/deployment 📖


🏗 Architecture

flowchart TB
    subgraph clients["Training Clients"]
        C["matyan-client"]
    end

    subgraph ui["UI"]
        U["matyan-ui"]
    end

    subgraph ingestion["Ingestion path"]
        F["matyan-frontier<br/>(Ingestion Gateway)"]
        K["Kafka<br/>data-ingestion"]
        IW["Ingestion Workers"]
        STR["Cloud Storage<br/>(S3 / GCS / Azure)"]
    end

    subgraph control["Control path"]
        B["matyan-backend<br/>(REST API)"]
        KC["Kafka<br/>control-events"]
        CW["Control Workers"]
    end

    subgraph storage["Storage"]
        FDB["FoundationDB"]
    end

    C -->|"WebSocket (metrics, hparams)"| F
    C -->|"PUT blob"| STR
    F -->|"blob ref"| K
    K --> IW
    IW --> FDB
    U --> B
    B --> FDB
    B --> KC
    KC --> CW
    CW -->|"cleanup"| STR
| Concern | Entry point | Consistency |
|---|---|---|
| UI reads | matyan-backend | Immediate |
| UI control ops (delete, rename) | matyan-backend | Immediate for user, async for cleanup |
| Client metrics / hparams ingestion | frontier (WebSocket) | Eventual |
| Client blob upload (images, audio) | frontier (presigned URL) | Eventual |
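Because client-side ingestion is eventually consistent, tracked data may not be queryable the instant `run.close()` returns. Scripts that read back what they just wrote can poll until the data lands. A minimal generic helper (not part of the matyan-client API):

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(interval)
```

For example, after logging a run one might call `wait_until(lambda: list(repo.query_metrics("metric.name == 'loss'").iter_runs()))` before reading metrics back.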

📁 Repo Layout

| Path | Purpose |
|---|---|
| extra/matyan-backend/ | REST API, FDB storage, ingestion/control Kafka workers, CLI. README |
| extra/matyan-frontier/ | Ingestion gateway: WebSocket + presigned URLs (S3, GCS, Azure SAS); publishes to Kafka. README |
| extra/matyan-ui/ | React frontend (from Aim) + Python wrapper for serving. README |
| extra/matyan-client/ | Python client SDK (Aim-compatible API); connects to frontier and backend. |
| extra/matyan-api-models/ | Shared Pydantic models (WS, Kafka, REST). README |
| deploy/helm/matyan/ | Helm chart for Kubernetes. README |
| dev/docker-compose.yml | Local dev: FDB, Kafka, S3 (RustFS), optional app services. |
| docs/ | MkDocs source for the documentation site. |

🚢 Deployment

Local development

The fastest way to get everything running is Docker Compose. A single script starts all infrastructure dependencies and the Matyan services:

./dev/compose-cluster.sh up -d

This brings up:

| Service | Port | Purpose |
|---|---|---|
| FoundationDB | (internal) | Primary storage |
| Apache Kafka | 9092 | Ingestion + control event bus |
| RustFS (S3-compatible) | 9000 / 9001 | Blob artifact storage + console |
| matyan-backend | 53800 | REST API |
| matyan-frontier | 53801 | WebSocket ingestion gateway |
| matyan-ui | 8000 | React UI |

Point your browser to http://localhost:8000 once all services are healthy. Use http://localhost:9001 for the RustFS console (credentials: rustfsadmin / rustfsadmin).
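A quick way to confirm the stack is up is to check that each service's TCP port accepts connections. This generic sketch uses only the ports listed above; it is not a Matyan health check, just a connectivity probe:

```python
import socket

SERVICES = {"matyan-backend": 53800, "matyan-frontier": 53801, "matyan-ui": 8000}

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in SERVICES.items():
    state = "up" if port_open("localhost", port) else "down"
    print(f"{name:16s} :{port}  {state}")
```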

To seed demo data into a running stack:

cd extra/matyan-backend
uv run python scripts/seed_data.py seed

Kubernetes

Matyan ships a Helm chart covering all application services and their infrastructure dependencies (FoundationDB via the fdb-operator, Kafka, RustFS).

Prerequisites: a Kubernetes cluster (1.25+) with a default or named StorageClass.

Generate a Fernet key (required for encrypted blob URIs):

uvx --from cryptography python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
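The same key can be generated from any Python environment with the `cryptography` package installed (this is what the uvx one-liner above runs):

```python
from cryptography.fernet import Fernet

# A Fernet key is 32 random bytes, urlsafe-base64 encoded (44 characters).
key = Fernet.generate_key().decode()
print(key)
```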

Install the chart:

helm install matyan deploy/helm/matyan \
  -f deploy/helm/matyan/values-dev.yaml \
  --set ui.hostBase=https://matyan.example.com \
  --set backend.hostBase=https://matyan.example.com \
  --set blobUriSecret.value=<your-fernet-key> \
  --set fdb-cluster.processes.general.volumeClaimTemplate.storageClassName=<your-storage-class>
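The same overrides can live in a custom values file instead of repeated --set flags. A minimal sketch; the replicaCount placement is an assumption, so check the chart's README for the authoritative values reference:

```yaml
# my-values.yaml -- hypothetical overrides layered on top of values-dev.yaml
ui:
  hostBase: https://matyan.example.com
backend:
  hostBase: https://matyan.example.com
  replicaCount: 2            # assumed key; see the chart README
blobUriSecret:
  value: <your-fernet-key>   # Fernet key for encrypted blob URIs
fdb-cluster:
  processes:
    general:
      volumeClaimTemplate:
        storageClassName: <your-storage-class>
```

Then install with `helm install matyan deploy/helm/matyan -f deploy/helm/matyan/values-dev.yaml -f my-values.yaml`.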

Scaling: all application services (matyan-backend, matyan-frontier, ingestion workers, control workers) are stateless. Scale any of them independently by adjusting replicaCount in the values file — FoundationDB and Kafka handle coordination automatically.

See deploy/helm/matyan/README.md for the full values reference and production configuration notes (TLS, resource limits, external Kafka/S3, multi-node FDB).


🆚 Comparisons to familiar tools

TensorBoard vs Matyan

Training run comparison

  • Tracked parameters are first-class citizens in Matyan. You can search, group, and aggregate by params — deeply exploring all tracked data (metrics, hyperparameters, images, audio) in the UI.
  • With TensorBoard, users are forced to encode parameters into the run name to search and compare them. TensorBoard has no grouping, aggregation, or subplot features.

Scalability

  • Matyan is built on FoundationDB and Kafka to handle 10,000s of training runs at both the storage and UI layers.
  • TensorBoard becomes slow and hard to use when a few hundred training runs are queried or compared.

MLflow vs Matyan

MLflow is an end-to-end ML lifecycle tool. Matyan is focused on training tracking and observability.

Run comparison

  • Matyan treats tracked parameters as first-class citizens. Users can query runs, metrics, images, and filter using params with full grouping, aggregation, and subplotting.
  • MLflow has basic search by config but lacks grouping, aggregation, and rich comparison features.

UI scalability

  • Matyan's UI handles thousands of metrics with thousands of steps smoothly.
  • MLflow's UI slows noticeably with a few hundred runs.

Deployment

  • Both are self-hosted and open-source.
  • Matyan adds a Kafka-based ingestion pipeline and FoundationDB for high-throughput, horizontally scalable deployments.

Weights and Biases vs Matyan

Hosted vs self-hosted

  • Weights and Biases is a hosted, closed-source MLOps platform. Your experiment data lives on their servers.
  • Matyan is fully self-hosted and open-source — your data stays in your own infrastructure (FoundationDB + S3/GCS/Azure).

Cost

  • W&B charges per seat / usage at scale.
  • Matyan is free; you only pay for your own compute and storage.

Aim vs Matyan

Matyan is a fork of Aim. The UI and Python SDK API surface are almost identical, so switching typically requires few or no code changes.

Storage backend

  • Aim uses an embedded RocksDB store (custom Cython extensions) on a single node. Storage is tied to the machine running aim up.
  • Matyan replaces RocksDB with FoundationDB — a distributed, ACID-compliant key-value store designed for horizontal scaling. All runs share a single logical key space across a cluster.

Ingestion pipeline

  • Aim writes tracking data synchronously in the same process as the server.
  • Matyan routes tracking data through Kafka → ingestion workers → FoundationDB, decoupling the write path from the API. The frontier service can handle bursts from many concurrent training jobs without backpressure on the API.

Deployment model

  • Aim is a single aim up process — simple to start, harder to scale.
  • Matyan is a set of stateless, horizontally scalable microservices (backend API, frontier, ingestion workers, control workers) deployable on Kubernetes via Helm.

When to use Aim

Aim is a great choice for individual researchers running experiments on a single machine where simplicity matters more than scale.

When to use Matyan

Matyan is the right choice when you need to scale to many concurrent training jobs, many users, or large run counts — while keeping the familiar Aim UI and SDK.


📦 Component READMEs


📬 Contact

Questions, feedback, or collaboration? Reach out at [email protected].


⚖️ License

Apache 2.0 — see LICENSE.

Matyan is a fork of Aim by AimStack, used under the Apache 2.0 license.
