Start describing what you want. Let CommandGraph chart the course.
CommandGraph is an infrastructure tool that runs shell commands in the right order, skips what's already done, and recovers from crashes. You write a plain-text file that reads like English (or let an agent write it on your behalf). The engine builds a dependency graph, parallelizes what it can, and executes over SSH or locally.
One Python file with zero dependencies. No agents on your servers. No daemon. No database.
--- Deploy my app ---
target "web" ssh [email protected]:
[install nginx] as root:
skip if $ command -v nginx
run $ apt-get install -y nginx
[write site config] as root:
first [install nginx]
content > /etc/nginx/sites-available/myapp:
server {
listen 80;
server_name myapp.example.com;
root /var/www/myapp;
index index.html;
location / { try_files $uri $uri/ =404; }
}
validate $ nginx -t
[enable site] as root:
first [write site config]
skip if $ test -L /etc/nginx/sites-enabled/myapp
run $ ln -sf /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/myapp
[deploy code]:
first [install nginx]
skip if $ test -f /var/www/myapp/index.html
run $ mkdir -p /var/www/myapp && echo "<h1>Hello World</h1>" > /var/www/myapp/index.html
[start nginx] as root:
first [enable site], [deploy code]
skip if $ systemctl is-active nginx
run $ systemctl reload-or-restart nginx
verify "site is live":
first [start nginx]
run $ curl -sf http://localhost/
retry 3x wait 2s
That's a complete, runnable deployment. [brackets] name your steps. first declares what must happen before. skip if makes it idempotent. content > writes config files with automatic validate and rollback. verify is your smoke test.
cgr serve FILE launches a browser-based IDE with a live DAG visualization and execution panel. The left pane is an editor; the right pane shows the dependency graph updating in real time as you edit. Run apply, stream step output, inspect state and history, and view collected report data -- all from the browser.
Point it at any .cgr file on your machine:
# opens http://localhost:8080 with live editing + graph
cgr serve mysetup.cgr
# Copy one file. That's it.
curl -O https://raw.githubusercontent.com/commandgraph/cgr/main/cgr.py
chmod +x cgr.py
sudo mv cgr.py /usr/local/bin/cgr
# Or just run it directly
python3 cgr.py apply mysetup.cgr
No pip install. No virtualenv. No dependencies. Python 3.9+ only.
If you want to see CommandGraph work before touching a server, use the container demo suite in testing/. It runs entirely locally, uses local files and disposable stub services instead of real infrastructure, and gives a new user a fast way to watch the engine plan, execute, fail, resume, and detect drift.
# 1. Clone the repo and enter it
git clone <repository-url>
cd commandgraph
# 2. Run the local demos
cd testing
./run-demos.sh list # see the 10 demos
./run-demos.sh 1 # quick first demo: plan -> apply -> idempotent re-run
./run-demos.sh 3 # crash recovery and resume
./run-demos.sh # run the full suite
What this gives you:
- No SSH targets, cloud accounts, or real services required
- A disposable container image with cgr, example graphs, and the template repo preloaded
- Narrated demos for validation, execution, templates, crash recovery, parallelism, race, drift detection, HTTP/reporting, CLI tooling, and state isolation
If you want to explore interactively:
cd testing
./run-demos.sh shell
Inside that shell, cgr is already on PATH, examples live in /opt/cgr/examples, and the repo is at /opt/cgr/repo.
Tools like Ansible and Terraform are good and widely used. CommandGraph is not intended to replace or replicate them; it sits a level above, orchestrating them together.
How does CommandGraph compare to, say, Ansible?
| Scenario | Ansible | CommandGraph |
|---|---|---|
| Set up a new server | Write a playbook, install Ansible, configure inventory, install collections | Write a .cgr file, copy one Python file, run it |
| See what will happen | --check (unreliable for shell/command) | cgr plan shows exact execution waves |
| Run steps in parallel | Set forks globally, serial per play, hope for the best | Automatic. Independent steps run concurrently. Or use parallel, each, race for explicit control |
| Resume after failure | Re-run entire playbook, skip with tags or --start-at-task | cgr apply resumes from exactly where it stopped |
| Detect drift | Write a separate check playbook | cgr state test re-runs checks, reports what changed |
| Deploy to air-gapped server | Install Ansible + deps on control node | scp cgr.py and go |
| Understand the dependency graph | Read the YAML top to bottom, hope the ordering is right | cgr visualize generates an interactive HTML DAG |
The deeper difference: Ansible executes tasks in the order you wrote them. CommandGraph builds a dependency graph and figures out the order for you. You declare what depends on what and the engine maximizes parallelism automatically.
Already have Ansible playbooks? Run them as steps inside a CommandGraph. This lets you sequence playbooks alongside shell commands, API calls, and other tools -- with crash recovery, dependency ordering, and parallel execution that Ansible alone can't express:
--- Provision and configure with Ansible ---
set env = "staging"
target "control" local:
[provision infra]:
run $ terraform apply -auto-approve -var="env=${env}"
timeout 10m
[run base playbook]:
first [provision infra]
skip if $ ansible -i inventory/${env} all -m ping | grep -q SUCCESS
run $ ansible-playbook -i inventory/${env} playbooks/base.yml
timeout 15m, retry 1x wait 30s
[run app playbook]:
first [run base playbook]
run $ ansible-playbook -i inventory/${env} playbooks/app.yml --tags deploy
timeout 10m
[smoke test]:
first [run app playbook]
get "https://${env}.example.com/health"
expect 200
retry 5x wait 10s
verify "fleet is healthy":
first [smoke test]
run $ ansible -i inventory/${env} all -m shell -a 'systemctl is-active myapp'
Terraform provisions, Ansible configures, CommandGraph orchestrates -- with resume from any failure point. You can also use Ansible inventory files directly with inventory "hosts.ini" (see Ansible inventory compatibility).
The engine reads your file, builds a dependency graph, and groups independent steps into parallel waves:
Steps with no dependency between them run in the same wave simultaneously. Steps that depend on others wait for their prerequisites. You didn't have to think about this -- the engine figured it out from your first declarations.
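The wave grouping described above can be sketched as a breadth-first layering of the dependency graph. This is a minimal illustrative sketch in plain Python, not CommandGraph's actual implementation:

```python
def waves(deps):
    """Group steps into parallel waves. `deps` maps each step name to
    the set of steps it declares with `first`. A step lands in the first
    wave after all of its prerequisites have completed."""
    remaining = {step: set(pre) for step, pre in deps.items()}
    done, result = set(), []
    while remaining:
        # Every step whose prerequisites are all done is ready now
        ready = [s for s, pre in remaining.items() if pre <= done]
        if not ready:
            raise ValueError("dependency cycle detected")
        result.append(sorted(ready))
        done.update(ready)
        for s in ready:
            del remaining[s]
    return result

# The nginx example from earlier in this README:
graph = {
    "install nginx": set(),
    "write site config": {"install nginx"},
    "deploy code": {"install nginx"},
    "enable site": {"write site config"},
    "start nginx": {"enable site", "deploy code"},
}
print(waves(graph))
# → [['install nginx'], ['deploy code', 'write site config'],
#    ['enable site'], ['start nginx']]
```

Reordering the input dictionary does not change the result, which is why the order of steps in a .cgr file does not matter.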
Every completed step is written to a .state file atomically. Crash mid-run, fix the problem, run again. Completed steps skip from state without even SSHing to the server.
Need isolated journals for concurrent parameterized runs? Use cgr apply FILE --run-id canary to salt the default state path, or cgr apply FILE --state /path/to/run.state to pin an explicit journal.
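The append-only journal behavior can be sketched in a few lines. This is an illustration of the fsync-per-record idea, not CommandGraph's actual state format:

```python
import os, tempfile

def journal_append(path, line):
    # Append one completed-step record and fsync, so a crash loses
    # at most the record currently being written.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()
        os.fsync(f.fileno())

def load_done(path):
    # Rebuild the set of completed steps by replaying the journal.
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {ln.strip() for ln in f if ln.strip()}

state = os.path.join(tempfile.mkdtemp(), "deploy.state")
journal_append(state, "install nginx")
journal_append(state, "write site config")
print(load_done(state) == {"install nginx", "write site config"})  # prints True
```

On the next run, anything in the replayed set is skipped from state without re-executing.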
When a step fails, the engine automatically shows the command and stderr. No re-running with -v to figure out what went wrong.
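That failure report amounts to capturing stderr at execution time and printing it alongside the command. A rough sketch, with names that are illustrative rather than CommandGraph internals:

```python
import subprocess

def run_step(name, cmd):
    """Run a shell command; on failure, surface the command and its
    stderr immediately instead of burying them in a log."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if proc.returncode != 0:
        print(f"step [{name}] failed (exit {proc.returncode})")
        print(f"  command: {cmd}")
        print(f"  stderr:  {proc.stderr.strip()}")
    return proc.returncode

run_step("bad step", "ls /nonexistent-path-for-demo")
```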
CommandGraph ships with two reporting layers. You can collect stdout from specific steps for audit-style output, and you can also ask apply to write a machine-readable run summary for CI or archival.
Mark any step with collect "key" and its stdout is saved after execution:
--- Audit a host ---
target "web-1" ssh [email protected]:
[hostname]:
run $ hostname
collect "hostname"
[kernel]:
run $ uname -r
collect "kernel"
[disk]:
run $ df -h /
collect "disk_usage"
After cgr apply audit.cgr, view or export the collected data:
cgr report audit.cgr
cgr report audit.cgr --format json
cgr report audit.cgr --format csv -o audit.csv
cgr report audit.cgr --keys hostname,kernel
For multi-node graphs, cgr report turns collected keys into columns, which makes fleet audits and inventory snapshots easy to export.
If you want a run-level execution summary instead, cgr apply FILE --report run.json writes JSON containing wall-clock timing, per-step statuses, provenance, dedup information, and any collected outputs.
Point a target at an SSH host and every command runs remotely. State stays on your machine. No agent or runtime needed on the server -- just SSH access.
Multiple targets in one file run in parallel. Steps with as root are automatically wrapped in sudo on the remote side.
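Conceptually, each remote step becomes a single ssh invocation, with `as root` adding a sudo prefix on the remote side. A sketch of that wrapping; the exact flags CommandGraph passes are assumptions here, not documented behavior:

```python
import shlex

def remote_command(host, cmd, as_root=False):
    """Build the argv for running one step on an SSH target.
    Steps marked `as root` are wrapped in sudo remotely."""
    remote = f"sudo -n -- sh -c {shlex.quote(cmd)}" if as_root else cmd
    # BatchMode avoids hanging on a password prompt (an assumption,
    # not necessarily CommandGraph's invocation)
    return ["ssh", "-o", "BatchMode=yes", host, remote]

print(remote_command("[email protected]", "apt-get install -y nginx", as_root=True))
```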
Four constructs for explicit concurrency, all composable with everything else:
[build everything]:
parallel 2 at a time:
[compile frontend]: run $ npm run build
[compile backend]: run $ cargo build --release
[build docs]: run $ mkdocs build
[download package]:
race into pkg.tar.gz:
[us mirror]: run $ curl -sf https://us.example.com/pkg.tar.gz -o ${_race_out}
[eu mirror]: run $ curl -sf https://eu.example.com/pkg.tar.gz -o ${_race_out}
Each branch writes to its own temp file. The winner is atomically renamed. No clobbering.
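The temp-file-plus-atomic-rename pattern looks like this. A sketch of the mechanism, not the engine's code:

```python
import os, tempfile

workdir = tempfile.mkdtemp()
final = os.path.join(workdir, "pkg.tar.gz")

# Each racing branch writes into its own private temp file...
fd, branch_tmp = tempfile.mkstemp(prefix="pkg.", dir=workdir)
with os.fdopen(fd, "w") as f:
    f.write("payload from the fastest mirror")

# ...and the first branch to finish renames its file into place.
# os.replace is a single atomic step on POSIX, so a slower branch
# can never leave a half-written pkg.tar.gz behind.
os.replace(branch_tmp, final)
print(open(final).read())  # prints: payload from the fastest mirror
```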
set servers = "web-1,web-2,web-3,web-4"
[deploy to fleet]:
each server in ${servers}, 3 at a time:
[deploy to ${server}]:
run $ ssh ${server} '/opt/activate.sh'
[rolling deploy]:
stage "production":
phase "canary" 1 from ${servers}:
[deploy ${server}]: run $ activate.sh
verify "healthy": run $ curl -sf http://${server}/health
retry 10x wait 3s
phase "rest" remaining from ${servers}:
each server, 4 at a time:
[deploy ${server}]: run $ activate.sh
The canary deploys to 1 server. Its verify must pass before the rest begin. If unhealthy, the rollout stops.
Write configs, edit lines, manage INI/JSON files -- all with built-in validation:
[write nginx config]:
content > /etc/nginx/sites-available/myapp:
server {
listen 80;
server_name example.com;
}
validate $ nginx -t
[harden sshd]:
line "PermitRootLogin no" in /etc/ssh/sshd_config, replacing "^#?PermitRootLogin"
line "PasswordAuthentication no" in /etc/ssh/sshd_config, replacing "^#?PasswordAuthentication"
validate $ sshd -t
[tune postgres]:
ini /etc/postgresql/14/main/postgresql.conf:
shared_buffers = "256MB"
max_connections = "200"
Writes are atomic. validate runs after the write; if it fails, the file is reverted.
Inline content > and block in bodies preserve literal # characters, so config comments stay intact.
Call APIs directly -- no curl piping, no shell escaping:
[register host]:
post "${api_host}/hosts"
auth bearer "${api_token}"
body json '{"hostname": "web-1", "status": "active"}'
expect 200..299
collect "registration"
Supports get, post, put, patch, delete. Auth tokens are automatically redacted from output. On SSH targets, requests execute via curl.
44 standard templates across 21 categories -- packages, containers, TLS, firewalls, databases, monitoring, backups, and more. Here's a production-grade nginx + certbot deployment that uses five of them:
--- Full-stack Nginx + TLS deployment ---
using apt/install_package, firewall/allow_port, systemd/enable_service, tls/certbot, nginx/vhost
set domain = "app.example.com"
set ssh_user = "deploy"
set ssh_host = "10.0.1.5"
target "web-1" ssh ${ssh_user}@${ssh_host}:
[install web packages] from apt/install_package:
name = "nginx curl"
[open http] from firewall/allow_port:
port = "80"
[open https] from firewall/allow_port:
port = "443"
[get tls cert] from tls/certbot:
domain = "${domain}"
email = "[email protected]"
[configure vhost] from nginx/vhost:
domain = "${domain}"
port = "443"
doc_root = "/var/www/${domain}"
[deploy app files] as root:
first [install web packages], [configure vhost]
skip if $ test -f /var/www/${domain}/index.html
run $ echo '<h1>${domain} is live</h1>' > /var/www/${domain}/index.html
[write ssl params] as root:
skip if $ test -f /etc/nginx/snippets/ssl-params.conf
content > /etc/nginx/snippets/ssl-params.conf:
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers HIGH:!aNULL:!MD5;
validate $ nginx -t
[start nginx] as root, if fails stop:
first [deploy app files], [get tls cert], [open https], [open http], [write ssl params]
skip if $ systemctl is-active nginx | grep -q active
run $ systemctl reload-or-restart nginx
[enable on boot] from systemd/enable_service:
service = "nginx"
verify "HTTPS 200 on ${domain}":
first [start nginx], [enable on boot]
run $ curl -sfk -o /dev/null -w '%{http_code}' https://${domain}/ | grep -q 200
retry 3x wait 2s
Templates are .cgr files in the repo/ directory. Each one declares its parameters, version, and description. Write your own by dropping a file in the right category. No Galaxy. No collections. Just files.
Categories include: apt, dnf, nginx, tls, firewall, systemd, service, docker, k8s, user, ssh, security, file, backup, db, monitoring, webhook, cron, and pkg.
[install packages (apt)]:
when os_family == "debian"
run $ apt-get install -y nginx
[install packages (yum)]:
when os_family == "redhat"
run $ yum install -y nginx
[detect pigz]:
run $ command -v pigz
on success: set compressor = "pigz"
on failure: set compressor = "gzip"
if fails ignore
[compress]:
first [detect pigz]
run $ ${compressor} archive.tar
Override anything at runtime: cgr apply --set os_family=redhat --set version=2.5.0
cgr secrets create vault.enc # create encrypted vault
cgr secrets edit vault.enc # edit in $EDITOR
secrets "vault.enc"
target "db" ssh [email protected]:
[configure db]:
run $ echo "${db_password}" | psql -c "ALTER USER app PASSWORD '$(cat)'"
Secrets are decrypted at runtime, never written to disk, and auto-redacted from all output.
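Auto-redaction can be as simple as substituting every known secret value before anything reaches the terminal or a report. An illustrative sketch:

```python
def redact(text, secrets):
    """Replace any secret value that appears in output with a
    placeholder, so passwords and tokens never reach logs."""
    for value in secrets.values():
        if value:
            text = text.replace(value, "****")
    return text

log = "connecting with password hunter2 to db-1"
print(redact(log, {"db_password": "hunter2"}))
# → connecting with password **** to db-1
```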
Already have inventory files? Use them directly:
inventory "hosts.ini"
each name, addr in ${webservers}:
target "${name}" ssh ${addr}:
[deploy to ${name}]:
run $ /opt/deploy.sh ${version}
The release artifact is a single cgr.py. For maintainers, regenerate after changing ide.html or visualize_template.py:
python3 build_cgr.py
python3 build_cgr.py --check

| Command | What it does |
|---|---|
| cgr plan FILE | Show execution order and parallel waves |
| cgr apply FILE | Execute the graph (--dry-run, --parallel N, --tags, --run-id, --state) |
| cgr validate FILE | Check syntax and dependencies |
| cgr check FILE | Run checks to detect drift |
| cgr visualize FILE | Generate interactive HTML DAG visualization |
| cgr serve FILE | Web IDE with live graph and execution |
| cgr explain FILE STEP | Show the dependency chain for a step |
| cgr why FILE STEP | Show what depends on a step |
| cgr state show FILE | Show done/failed/pending state |
| cgr state test FILE | Re-run checks, detect drift |
| cgr state reset FILE | Wipe state, start fresh |
| cgr diff FILE FILE2 | Structural graph comparison |
| cgr ping FILE | Verify SSH connectivity to all targets |
| cgr report FILE | View collected outputs (table, JSON, CSV) |
| cgr lint FILE | Best-practice linter |
| cgr fmt FILE | Auto-formatter |
| cgr convert FILE | Convert between .cg and .cgr formats |
| cgr secrets CMD FILE | Manage encrypted secrets |
| cgr init | Scaffold a new .cgr file |
| cgr doctor | Check environment for common issues |
# State says config is deployed. Someone deleted it on the server.
cgr state test deploy.cgr
write_config: DRIFTED -- check now fails (was: success)
# Fix it:
cgr apply deploy.cgr # only the drifted step re-runs
python3 -m py_compile cgr.py # syntax check
python3 -m pytest test_commandgraph.py -q # test suite
cd testing/ && ./run-demos.sh # 10 local container demos
cd testing-ssh/ && ./run-ssh-demos.sh # 5 SSH demos

| Document | For whom | What's in it |
|---|---|---|
| QUICKSTART.md | New users | Zero to running in 5 minutes |
| TUTORIAL.md | Beginners | 9 guided lessons, ~1 hour |
| COOKBOOK.md | Operators | 10 real-world recipes |
| MANUAL.md | Reference | Complete syntax for .cgr and .cg |
| COMMANDGRAPH_SPEC.md | Code generators | Formal PEG grammar |
| AGENTS.md | Contributors | Architecture and internals |
Files are the interface. A .cgr file is a complete, portable, version-controllable description of your infrastructure. No web UI required, no database, no daemon.
Idempotent by default. Every step has a skip if check. Run it 10 times, get the same result.
Crash-safe. State is append-only with fsync after each write. A power failure loses at most one line.
Zero dependencies. One Python file, stdlib only. Copy it to an air-gapped server and it works.
Human-readable. The syntax reads like English: "First install nginx. Skip if already installed. Run apt-get install." No YAML indentation wars. No JSON escaping. No Jinja2 templating bugs.
Graphs, not lists. You declare dependencies. The engine computes execution order and maximizes parallelism. Reorder your file however you want -- the result is the same.
