Start describing what you want. Let CommandGraph chart the course.
CommandGraph is an infrastructure tool that runs shell commands in the right order, skips what's already done, and recovers from crashes. You write a plain-text file that reads like English (or let an agent write it on your behalf). The engine builds a dependency graph, parallelizes what it can, and executes over SSH or locally.
One Python file with zero dependencies. No agents on your servers. No daemon. No database.
--- Deploy my app ---
target "web" ssh [email protected]:
[install nginx] as root:
skip if $ command -v nginx
run $ apt-get install -y nginx
[write site config] as root:
first [install nginx]
content > /etc/nginx/sites-available/myapp:
server {
listen 80;
server_name myapp.example.com;
root /var/www/myapp;
index index.html;
location / { try_files $uri $uri/ =404; }
}
validate $ nginx -t
[enable site] as root:
first [write site config]
skip if $ test -L /etc/nginx/sites-enabled/myapp
run $ ln -sf /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/myapp
[deploy code]:
first [install nginx]
skip if $ test -f /var/www/myapp/index.html
run $ mkdir -p /var/www/myapp && echo "<h1>Hello World</h1>" > /var/www/myapp/index.html
[start nginx] as root:
first [enable site], [deploy code]
skip if $ systemctl is-active nginx
run $ systemctl reload-or-restart nginx
verify "site is live":
first [start nginx]
run $ curl -sf http://localhost/
retry 3x wait 2s
That's a complete, runnable deployment. [brackets] name your steps. first declares what must happen before. skip if makes it idempotent. content > writes config files with automatic validate and rollback. verify is your smoke test.
cgr serve FILE launches a browser-based IDE with a live DAG visualization and execution panel. The left pane is an editor; the right pane shows the dependency graph updating in real time as you edit. Run apply, stream step output, inspect state and history, and view collected report data -- all from the browser.
Point it at any .cgr file on your machine:
# opens http://localhost:8080 with live editing + graph
cgr serve mysetup.cgr
# Copy one file. That's it.
curl -O https://raw.githubusercontent.com/commandgraph/cgr/main/cgr.py
chmod +x cgr.py
sudo mv cgr.py /usr/local/bin/cgr
# Or just run it directly
python3 cgr.py apply mysetup.cgr
No pip install. No virtualenv. No dependencies. Python 3.9+ only.
If you want to see CommandGraph work before touching a server, use the container demo suite in testing/. It runs entirely locally, uses local files and disposable stub services instead of real infrastructure, and gives a new user a fast way to watch the engine plan, execute, fail, resume, and detect drift.
# 1. Clone the repo and enter it
git clone <repository-url>
cd commandgraph
# 2. Run the local demos
cd testing
./run-demos.sh list # see the 10 demos
./run-demos.sh 1 # quick first demo: plan -> apply -> idempotent re-run
./run-demos.sh 3 # crash recovery and resume
./run-demos.sh # run the full suite
What this gives you:
- No SSH targets, cloud accounts, or real services required
- A disposable container image with cgr, example graphs, and the template repo preloaded
- Narrated demos for validation, execution, templates, crash recovery, parallelism, race, drift detection, HTTP/reporting, CLI tooling, and state isolation
If you want to explore interactively:
cd testing
./run-demos.sh shell
Inside that shell, cgr is already on PATH, examples live in /opt/cgr/examples, and the repo is at /opt/cgr/repo.
Tools like Ansible and Terraform are good and widely used. CommandGraph is not intended to replace or replicate them; it sits a level above, orchestrating them together.
How does CommandGraph compare to, say, Ansible?
| Scenario | Ansible | CommandGraph |
|---|---|---|
| Set up a new server | Write a playbook, install Ansible, configure inventory, install collections | Write a .cgr file, copy one Python file, run it |
| See what will happen | --check (unreliable for shell/command) | cgr plan shows exact execution waves |
| Run steps in parallel | Set forks globally, serial per play, hope for the best | Automatic. Independent steps run concurrently. Or use parallel, each, race for explicit control |
| Resume after failure | Re-run entire playbook, skip with tags or --start-at-task | cgr apply resumes from exactly where it stopped |
| Detect drift | Write a separate check playbook | cgr state test re-runs checks, reports what changed |
| Deploy to air-gapped server | Install Ansible + deps on control node | scp cgr.py and go |
| Understand the dependency graph | Read the YAML top to bottom, hope the ordering is right | cgr visualize generates an interactive HTML DAG |
The deeper difference: Ansible executes tasks in the order you wrote them. CommandGraph builds a dependency graph and figures out the order for you. You declare what depends on what and the engine maximizes parallelism automatically.
Already have Ansible playbooks? Run them as steps inside a CommandGraph. This lets you sequence playbooks alongside shell commands, API calls, and other tools -- with crash recovery, dependency ordering, and parallel execution that Ansible alone can't express:
--- Provision and configure with Ansible ---
set env = "staging"
target "control" local:
[provision infra]:
run $ terraform apply -auto-approve -var="env=${env}"
timeout 10m
[run base playbook]:
first [provision infra]
skip if $ ansible -i inventory/${env} all -m ping | grep -q SUCCESS
run $ ansible-playbook -i inventory/${env} playbooks/base.yml
timeout 15m, retry 1x wait 30s
[run app playbook]:
first [run base playbook]
run $ ansible-playbook -i inventory/${env} playbooks/app.yml --tags deploy
timeout 10m
[smoke test]:
first [run app playbook]
get "https://${env}.example.com/health"
expect 200
retry 5x wait 10s
verify "fleet is healthy":
first [smoke test]
run $ ansible -i inventory/${env} all -m shell -a 'systemctl is-active myapp'
Terraform provisions, Ansible configures, CommandGraph orchestrates -- with resume from any failure point. You can also use Ansible inventory files directly with inventory "hosts.ini" (see Ansible inventory compatibility).
The engine reads your file, builds a dependency graph, and groups independent steps into parallel waves:
Steps with no dependency between them run in the same wave simultaneously. Steps that depend on others wait for their prerequisites. You didn't have to think about this -- the engine figured it out from your first declarations.
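The wave grouping described above can be sketched as a breadth-first layering of the dependency graph. This is a minimal illustrative sketch in plain Python, not CommandGraph's actual implementation:

```python
def waves(deps):
    """Group steps into parallel waves. `deps` maps each step name to
    the set of steps it declares with `first`. A step lands in the first
    wave after all of its prerequisites have completed."""
    remaining = {step: set(pre) for step, pre in deps.items()}
    done, result = set(), []
    while remaining:
        # Every step whose prerequisites are all done is ready now
        ready = [s for s, pre in remaining.items() if pre <= done]
        if not ready:
            raise ValueError("dependency cycle detected")
        result.append(sorted(ready))
        done.update(ready)
        for s in ready:
            del remaining[s]
    return result

# The nginx example from earlier in this README:
graph = {
    "install nginx": set(),
    "write site config": {"install nginx"},
    "deploy code": {"install nginx"},
    "enable site": {"write site config"},
    "start nginx": {"enable site", "deploy code"},
}
print(waves(graph))
# → [['install nginx'], ['deploy code', 'write site config'],
#    ['enable site'], ['start nginx']]
```

Reordering the input dictionary does not change the result, which is why the order of steps in a .cgr file does not matter.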
Every completed step is written to a .state file atomically. Crash mid-run, fix the problem, run again. Completed steps skip from state without even SSHing to the server.
Need isolated journals for concurrent parameterized runs? Use cgr apply FILE --run-id canary to salt the default state path, or cgr apply FILE --state /path/to/run.state to pin an explicit journal.
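The append-only journal behavior can be sketched in a few lines. This is an illustration of the fsync-per-record idea, not CommandGraph's actual state format:

```python
import os, tempfile

def journal_append(path, line):
    # Append one completed-step record and fsync, so a crash loses
    # at most the record currently being written.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()
        os.fsync(f.fileno())

def load_done(path):
    # Rebuild the set of completed steps by replaying the journal.
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {ln.strip() for ln in f if ln.strip()}

state = os.path.join(tempfile.mkdtemp(), "deploy.state")
journal_append(state, "install nginx")
journal_append(state, "write site config")
print(load_done(state) == {"install nginx", "write site config"})  # prints True
```

On the next run, anything in the replayed set is skipped from state without re-executing.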
When a step fails, the engine automatically shows the command and stderr. No re-running with -v to figure out what went wrong.
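That failure report amounts to capturing stderr at execution time and printing it alongside the command. A rough sketch, with names that are illustrative rather than CommandGraph internals:

```python
import subprocess

def run_step(name, cmd):
    """Run a shell command; on failure, surface the command and its
    stderr immediately instead of burying them in a log."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if proc.returncode != 0:
        print(f"step [{name}] failed (exit {proc.returncode})")
        print(f"  command: {cmd}")
        print(f"  stderr:  {proc.stderr.strip()}")
    return proc.returncode

run_step("bad step", "ls /nonexistent-path-for-demo")
```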
CommandGraph ships with two reporting layers. You can collect stdout from specific steps for audit-style output, and you can also ask apply to write a machine-readable run summary for CI or archival.
Mark any step with collect "key" and its stdout is saved after execution:
--- Audit a host ---
target "web-1" ssh [email protected]:
[hostname]:
run $ hostname
collect "hostname"
[kernel]:
run $ uname -r
collect "kernel"
[disk]:
run $ df -h /
collect "disk_usage"
After cgr apply audit.cgr, view or export the collected data:
cgr report audit.cgr
cgr report audit.cgr --format json
cgr report audit.cgr --format csv -o audit.csv
cgr report audit.cgr --keys hostname,kernel
For multi-node graphs, cgr report turns collected keys into columns, which makes fleet audits and inventory snapshots easy to export.
If you want a run-level execution summary instead, cgr apply FILE --report run.json writes JSON containing wall-clock timing, per-step statuses, provenance, dedup information, and any collected outputs.
Point a target at an SSH host and every command runs remotely. State stays on your machine. No agent or runtime needed on the server -- just SSH access.
Multiple targets in one file run in parallel. Steps with as root are automatically wrapped in sudo on the remote side.
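Conceptually, each remote step becomes a single ssh invocation, with `as root` adding a sudo prefix on the remote side. A sketch of that wrapping; the exact flags CommandGraph passes are assumptions here, not documented behavior:

```python
import shlex

def remote_command(host, cmd, as_root=False):
    """Build the argv for running one step on an SSH target.
    Steps marked `as root` are wrapped in sudo remotely."""
    remote = f"sudo -n -- sh -c {shlex.quote(cmd)}" if as_root else cmd
    # BatchMode avoids hanging on a password prompt (an assumption,
    # not necessarily CommandGraph's invocation)
    return ["ssh", "-o", "BatchMode=yes", host, remote]

print(remote_command("[email protected]", "apt-get install -y nginx", as_root=True))
```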
Four constructs for explicit concurrency, all composable with everything else:
[build everything]:
parallel 2 at a time:
[compile frontend]: run $ npm run build
[compile backend]: run $ cargo build --release
[build docs]: run $ mkdocs build
[download package]:
race into pkg.tar.gz:
[us mirror]: run $ curl -sf https://us.example.com/pkg.tar.gz -o ${_race_out}
[eu mirror]: run $ curl -sf https://eu.example.com/pkg.tar.gz -o ${_race_out}
Each branch writes to its own temp file. The winner is atomically renamed. No clobbering.
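The temp-file-plus-atomic-rename pattern looks like this. A sketch of the mechanism, not the engine's code:

```python
import os, tempfile

workdir = tempfile.mkdtemp()
final = os.path.join(workdir, "pkg.tar.gz")

# Each racing branch writes into its own private temp file...
fd, branch_tmp = tempfile.mkstemp(prefix="pkg.", dir=workdir)
with os.fdopen(fd, "w") as f:
    f.write("payload from the fastest mirror")

# ...and the first branch to finish renames its file into place.
# os.replace is a single atomic step on POSIX, so a slower branch
# can never leave a half-written pkg.tar.gz behind.
os.replace(branch_tmp, final)
print(open(final).read())  # prints: payload from the fastest mirror
```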
set servers = "web-1,web-2,web-3,web-4"
[deploy to fleet]:
each server in ${servers}, 3 at a time:
[deploy to ${server}]:
run $ ssh ${server} '/opt/activate.sh'
[rolling deploy]:
stage "production":
phase "canary" 1 from ${servers}:
[deploy ${server}]: run $ activate.sh
verify "healthy": run $ curl -sf http://${server}/health
retry 10x wait 3s
phase "rest" remaining from ${servers}:
each server, 4 at a time:
[deploy ${server}]: run $ activate.sh
The canary deploys to 1 server. Its verify must pass before the rest begin. If unhealthy, the rollout stops.
Write configs, edit lines, manage INI/JSON files -- all with built-in validation:
[write nginx config]:
content > /etc/nginx/sites-available/myapp:
server {
listen 80;
server_name example.com;
}
validate $ nginx -t
[harden sshd]:
line "PermitRootLogin no" in /etc/ssh/sshd_config, replacing "^#?PermitRootLogin"
line "PasswordAuthentication no" in /etc/ssh/sshd_config, replacing "^#?PasswordAuthentication"
validate $ sshd -t
[tune postgres]:
ini /etc/postgresql/14/main/postgresql.conf:
shared_buffers = "256MB"
max_connections = "200"
Writes are atomic. validate runs after the write; if it fails, the file is reverted.
Inline content > and block in bodies preserve literal # characters, so config comments stay intact.
Call APIs directly -- no curl piping, no shell escaping:
[register host]:
post "${api_host}/hosts"
auth bearer "${api_token}"
body json '{"hostname": "web-1", "status": "active"}'
expect 200..299
collect "registration"
Supports get, post, put, patch, delete. Auth tokens are automatically redacted from output. On SSH targets, requests execute via curl.
44 standard templates across 21 categories -- packages, containers, TLS, firewalls, databases, monitoring, backups, and more. Here's a production-grade nginx + certbot deployment that uses five of them:
--- Full-stack Nginx + TLS deployment ---
using apt/install_package, firewall/allow_port, systemd/enable_service, tls/certbot, nginx/vhost
set domain = "app.example.com"
set ssh_user = "deploy"
set ssh_host = "10.0.1.5"
target "web-1" ssh ${ssh_user}@${ssh_host}:
[install web packages] from apt/install_package:
name = "nginx curl"
[open http] from firewall/allow_port:
port = "80"
[open https] from firewall/allow_port:
port = "443"
[get tls cert] from tls/certbot:
domain = "${domain}"
email = "[email protected]"
[configure vhost] from nginx/vhost:
domain = "${domain}"
port = "443"
doc_root = "/var/www/${domain}"
[deploy app files] as root:
first [install web packages], [configure vhost]
skip if $ test -f /var/www/${domain}/index.html
run $ echo '<h1>${domain} is live</h1>' > /var/www/${domain}/index.html
[write ssl params] as root:
skip if $ test -f /etc/nginx/snippets/ssl-params.conf
content > /etc/nginx/snippets/ssl-params.conf:
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
ssl_ciphers HIGH:!aNULL:!MD5;
validate $ nginx -t
[start nginx] as root, if fails stop:
first [deploy app files], [get tls cert], [open https], [open http], [write ssl params]
skip if $ systemctl is-active nginx | grep -q active
run $ systemctl reload-or-restart nginx
[enable on boot] from systemd/enable_service:
service = "nginx"
verify "HTTPS 200 on ${domain}":
first [start nginx], [enable on boot]
run $ curl -sfk -o /dev/null -w '%{http_code}' https://${domain}/ | grep -q 200
retry 3x wait 2s
Templates are .cgr files in the repo/ directory. Each one declares its parameters, version, and description. Write your own by dropping a file in the right category. No Galaxy. No collections. Just files.
Categories include: apt, dnf, nginx, tls, firewall, systemd, service, docker, k8s, user, ssh, security, file, backup, db, monitoring, webhook, cron, and pkg.
[install packages (apt)]:
when os_family == "debian"
run $ apt-get install -y nginx
[install packages (yum)]:
when os_family == "redhat"
run $ yum install -y nginx
[detect pigz]:
run $ command -v pigz
on success: set compressor = "pigz"
on failure: set compressor = "gzip"
if fails ignore
[compress]:
first [detect pigz]
run $ ${compressor} archive.tar
Override anything at runtime: cgr apply --set os_family=redhat --set version=2.5.0
cgr secrets create vault.enc # create encrypted vault
cgr secrets edit vault.enc # edit in $EDITOR
secrets "vault.enc"
target "db" ssh [email protected]:
[configure db]:
run $ echo "${db_password}" | psql -c "ALTER USER app PASSWORD '$(cat)'"
Secrets are decrypted at runtime, never written to disk, and auto-redacted from all output.
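Auto-redaction can be as simple as substituting every known secret value before anything reaches the terminal or a report. An illustrative sketch:

```python
def redact(text, secrets):
    """Replace any secret value that appears in output with a
    placeholder, so passwords and tokens never reach logs."""
    for value in secrets.values():
        if value:
            text = text.replace(value, "****")
    return text

log = "connecting with password hunter2 to db-1"
print(redact(log, {"db_password": "hunter2"}))
# → connecting with password **** to db-1
```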
Already have inventory files? Use them directly:
inventory "hosts.ini"
each name, addr in ${webservers}:
target "${name}" ssh ${addr}:
[deploy to ${name}]:
run $ /opt/deploy.sh ${version}
The release artifact is a single cgr.py. For maintainers, regenerate after changing ide.html or visualize_template.py:
python3 build_cgr.py
python3 build_cgr.py --check

| Command | What it does |
|---|---|
| cgr plan FILE | Show execution order and parallel waves |
| cgr apply FILE | Execute the graph (--dry-run, --parallel N, --tags, --run-id, --state) |
| cgr validate FILE | Check syntax and dependencies |
| cgr check FILE | Run checks to detect drift |
| cgr visualize FILE | Generate interactive HTML DAG visualization |
| cgr serve FILE | Web IDE with live graph and execution |
| cgr explain FILE STEP | Show the dependency chain for a step |
| cgr why FILE STEP | Show what depends on a step |
| cgr state show FILE | Show done/failed/pending state |
| cgr state test FILE | Re-run checks, detect drift |
| cgr state reset FILE | Wipe state, start fresh |
| cgr diff FILE FILE2 | Structural graph comparison |
| cgr ping FILE | Verify SSH connectivity to all targets |
| cgr report FILE | View collected outputs (table, JSON, CSV) |
| cgr lint FILE | Best-practice linter |
| cgr fmt FILE | Auto-formatter |
| cgr convert FILE | Convert between .cg and .cgr formats |
| cgr secrets CMD FILE | Manage encrypted secrets |
| cgr init | Scaffold a new .cgr file |
| cgr doctor | Check environment for common issues |
# State says config is deployed. Someone deleted it on the server.
cgr state test deploy.cgr
write_config: DRIFTED -- check now fails (was: success)
# Fix it:
cgr apply deploy.cgr # only the drifted step re-runs
python3 -m py_compile cgr.py # syntax check
python3 -m pytest test_commandgraph.py -q # test suite
cd testing/ && ./run-demos.sh # 10 local container demos
cd testing-ssh/ && ./run-ssh-demos.sh # 5 SSH demos

| Document | For whom | What's in it |
|---|---|---|
| QUICKSTART.md | New users | Zero to running in 5 minutes |
| TUTORIAL.md | Beginners | 9 guided lessons, ~1 hour |
| COOKBOOK.md | Operators | 10 real-world recipes |
| MANUAL.md | Reference | Complete syntax for .cgr and .cg |
| COMMANDGRAPH_SPEC.md | Code generators | Formal PEG grammar |
| AGENTS.md | Contributors | Architecture and internals |
Files are the interface. A .cgr file is a complete, portable, version-controllable description of your infrastructure. No web UI required, no database, no daemon.
Idempotent by default. Every step has a skip if check. Run it 10 times, get the same result.
Crash-safe. State is append-only with fsync after each write. A power failure loses at most one line.
Zero dependencies. One Python file, stdlib only. Copy it to an air-gapped server and it works.
Human-readable. The syntax reads like English: "First install nginx. Skip if already installed. Run apt-get install." No YAML indentation wars. No JSON escaping. No Jinja2 templating bugs.
Graphs, not lists. You declare dependencies. The engine computes execution order and maximizes parallelism. Reorder your file however you want -- the result is the same.
