
# sparkrun

Launch, manage, and stop LLM inference workloads on one or more NVIDIA DGX Spark systems — no Slurm, no Kubernetes, no fuss.

## Get Running in 3 Steps

1. **Install sparkrun**

   One command installs sparkrun, sets up a managed environment, configures tab completion, and starts the setup wizard:

   `uvx sparkrun setup`

2. **Configure your cluster(s)**

   Set up your DGX Spark, or several Sparks, with our best practices by following the wizard.

3. **Run inference**

   Pick a recipe and launch it. Your model is serving; Ctrl+C safely detaches from the logs.

   `sparkrun run qwen3-1.7b-vllm`

## One Command Launch

Pick a recipe and run it. sparkrun handles container orchestration, model distribution, and networking automatically.

## 🔗 Multi-Node Tensor Parallel

Scale across DGX Sparks with `--tp`. Each Spark contributes one GPU; sparkrun handles InfiniBand/RDMA and NCCL configuration.
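As a usage sketch, assuming `--tp` sets the tensor-parallel degree and that, since each Spark contributes one GPU, its value equals the number of participating Sparks (the `2` here is illustrative):

```shell
# Illustrative two-Spark launch: recipe name from the quick-start above,
# --tp value assumed to match the number of Sparks (one GPU each).
sparkrun run qwen3-1.7b-vllm --tp 2
```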

## 📋 Recipe System

YAML configs capture model, container, runtime, and defaults. Override anything at launch time; there are no config files to hunt down.
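A recipe of roughly this shape could capture those four pieces. The field names below are illustrative assumptions, not sparkrun's actual schema:

```yaml
# Hypothetical recipe sketch -- field names are assumptions, not sparkrun's real schema.
name: qwen3-1.7b-vllm          # recipe name used with `sparkrun run`
model: Qwen/Qwen3-1.7B         # HuggingFace model ID (illustrative)
runtime: vllm                  # one of the supported engines: vllm, sglang, llama.cpp
container: <container image>   # placeholder; the real image reference is recipe-specific
defaults:
  tp: 1                        # tensor-parallel degree; overridable at launch time
```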

## 📦 Recipe Registries

Share and collaborate on recipes via git registries. Add community or private registries and search across all of them.

## 📊 VRAM Estimation

Auto-detects model architecture from HuggingFace. Know whether your config fits on a single DGX Spark, or how many you need, before launching.

## 🔄 Multiple Runtimes

First-class support for vLLM, SGLang, and llama.cpp. Same CLI, same recipe format, different engines under the hood.

## 🤖 Claude Code Plugin

AI-assisted inference management. Claude learns your cluster and helps run, monitor, and stop workloads conversationally.

## ⌨️ CLI Tab Completion

Rich shell completions for Bash, Zsh, and Fish. Tab-complete commands, recipe names, cluster names, and options instantly.