
Operations

Production deployment and monitoring.

Running as a Service

systemd

Create /etc/systemd/system/dagu.service:

ini
[Unit]
Description=Dagu Workflow Engine
Documentation=https://docs.dagu.sh/
After=network.target
Wants=network-online.target

[Service]
Type=simple
User=dagu
Group=dagu
WorkingDirectory=/opt/dagu

# Main process
ExecStart=/usr/local/bin/dagu start-all

# Graceful shutdown
ExecStop=/bin/kill -TERM $MAINPID
TimeoutStopSec=30
KillMode=mixed
KillSignal=SIGTERM

# Restart policy
Restart=always
RestartSec=10
StartLimitInterval=60
StartLimitBurst=3

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/dagu/data /opt/dagu/logs

# Resource limits
LimitNOFILE=65536
LimitNPROC=4096

# Environment
EnvironmentFile=-/etc/dagu/environment
Environment="DAGU_HOME=/opt/dagu"

[Install]
WantedBy=multi-user.target

Create /etc/dagu/environment:

bash
DAGU_HOST=0.0.0.0
DAGU_PORT=8080
DAGU_TZ=America/New_York
DAGU_LOG_FORMAT=json

Setup:

bash
# Create user and directories
sudo useradd -r -s /bin/false dagu
sudo mkdir -p /opt/dagu/{dags,data,logs}
sudo chown -R dagu:dagu /opt/dagu

# Enable and start
sudo systemctl enable dagu
sudo systemctl start dagu

# Check status
sudo systemctl status dagu
sudo journalctl -u dagu -f
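
Once the service is running, you can verify that the API responds by hitting the health endpoint (assuming the port of 8080 set in /etc/dagu/environment):

bash
# Should return HTTP 200 when the server is healthy
curl -f http://localhost:8080/api/v1/health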

Docker Compose

compose.yml:

yaml
services:
  dagu:
    image: ghcr.io/dagucloud/dagu:latest
    container_name: dagu
    restart: unless-stopped
    
    # Health check
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/v1/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    
    # Port mapping
    ports:
      - "8080:8080"
    
    # Environment variables
    environment:
      # Server configuration
      - DAGU_PORT=8080
      - DAGU_HOST=0.0.0.0
      - DAGU_TZ=America/New_York
      
      # Logging
      - DAGU_LOG_FORMAT=json
      
      # Authentication (optional)
      # - DAGU_AUTH_BASIC_USERNAME=admin
      # - DAGU_AUTH_BASIC_PASSWORD=your-secure-password
      
      # User/Group IDs (optional)
      # - PUID=1000
      # - PGID=1000
      
      # Docker-in-Docker support (optional)
      # - DOCKER_GID=999
    
    # Volume mounts
    volumes:
      - dagu:/var/lib/dagu
      
      # Docker socket for Docker executor (optional)
      # - /var/run/docker.sock:/var/run/docker.sock
    
    # Logging configuration
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"

volumes:
  dagu:
bash
# Start
docker compose up -d

# Logs
docker compose logs -f

# Stop
docker compose down

With authentication (.env file):

bash
DAGU_AUTH_BASIC_USERNAME=admin
DAGU_AUTH_BASIC_PASSWORD=secure-password
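
Note that Compose uses a project-level .env file for variable interpolation in compose.yml, not as container environment, so one way to pass these credentials into the container is env_file (a sketch; alternatively reference them as ${DAGU_AUTH_BASIC_USERNAME}-style interpolations under environment:):

yaml
services:
  dagu:
    env_file:
      - .env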

Resource Monitoring

Dagu provides built-in resource monitoring that tracks CPU, memory, disk, and load average. The data is displayed on the System Status page of the web UI.

Configuration:

yaml
# config.yaml
monitoring:
  retention: "24h"    # How long to keep history (default: 24h)
  interval: "5s"      # Collection frequency (default: 5s)
bash
# Or via environment variables
export DAGU_MONITORING_RETENTION=12h
export DAGU_MONITORING_INTERVAL=10s

Metrics collected:

  • CPU Usage - Overall CPU utilization percentage
  • Memory Usage - RAM utilization percentage
  • Disk Usage - Disk space utilization for the data directory
  • Load Average - 1-minute system load average

API Endpoint:

bash
# Get resource history (last hour by default)
curl http://localhost:8080/api/v1/services/resources/history

# Get last 30 minutes
curl "http://localhost:8080/api/v1/services/resources/history?duration=30m"

Memory Usage: Resource history is stored in memory. With default settings (5s interval, 24h retention), memory usage is approximately 1.1MB for all 4 metrics.
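
As a rough sanity check on that figure: 24h at one sample every 5s is 17,280 samples per metric, or 69,120 samples across the 4 metrics; at roughly 16 bytes per sample (a timestamp plus a value, an assumption for illustration) that comes to about 1.1MB.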

Prometheus Metrics

Metrics available at /api/v1/metrics:

System:

  • dagu_info - Build information
  • dagu_uptime_seconds - Uptime
  • dagu_scheduler_running - Scheduler status

DAGs:

  • dagu_dags_total - Total DAGs
  • dagu_dag_runs_currently_running - Running DAGs
  • dagu_dag_runs_queued_total - Queued DAGs
  • dagu_dag_runs_total - DAG runs by status (24h)

Standard:

  • Go runtime metrics
  • Process metrics
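
To collect these with Prometheus, point a scrape job at the metrics path above (a minimal sketch; the target address and interval are assumptions to adapt to your deployment):

yaml
# prometheus.yml
scrape_configs:
  - job_name: "dagu"
    metrics_path: /api/v1/metrics
    scrape_interval: 30s
    static_configs:
      - targets: ["localhost:8080"]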

Logging

yaml
# config.yaml
log_format: json    # text or json
debug: true       # Debug mode
paths:
  log_dir: /var/log/dagu
bash
# Or via environment
export DAGU_LOG_FORMAT=json
export DAGU_DEBUG=true
export DAGU_LOG_DIR=/var/log/dagu

JSON log example:

json
{
  "time": "2024-03-15T12:00:00Z",
  "level": "INFO",
  "msg": "DAG execution started",
  "dag": "data-pipeline",
  "run_id": "20240315_120000_abc123"
}
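
When running under systemd with JSON logging, the structured fields can be filtered straight from the journal (a sketch; assumes jq is installed and the unit name used above):

bash
# Show only error-level entries from the Dagu journal
journalctl -u dagu -o cat | jq 'select(.level == "ERROR")'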

Log Cleanup

Automatic cleanup based on hist_retention_days:

yaml
# Per-DAG
hist_retention_days: 7  # Keep 7 days

# Or global in base.yaml
hist_retention_days: 30  # Default

Special values:

  • 0 - Delete after each run
  • -1 - Keep forever

Deletes:

  • Execution logs
  • Step output (.out, .err)
  • Status files (.jsonl)
  • Sub DAG logs

Viewing Run History

Before cleaning up logs, review execution history with dagu history:

bash
# Preview what cleanup would affect
dagu history my-workflow --limit 100

# Check run status before deletion
dagu history my-workflow --from 2025-01-01 --to 2025-12-31

The history command helps you:

  • Identify which runs to keep/delete
  • Verify cleanup results
  • Export run metadata before cleanup: dagu history --format json

See history CLI reference and cleanup command.

Alerting

Email

yaml
# base.yaml
smtp:
  host: "smtp.gmail.com"
  port: "587"
  username: "[email protected]"
  password: "${SMTP_PASSWORD}"

error_mail:
  from: "[email protected]"
  to: "[email protected]"
  prefix: "[ERROR]"
  attach_logs: true

mail_on:
  failure: true
  success: false

Per-step notification:

yaml
steps:
  - id: critical_task
    command: echo "Processing"
    mail_on_error: true

Webhooks

Slack:

yaml
handler_on:
  failure:
    type: http
    with:
      url: "${SLACK_WEBHOOK_URL}"
      method: POST
      body: |
        {
          "text": "Workflow Failed: ${DAG_NAME}",
          "blocks": [{
            "type": "section",
            "text": {
              "type": "mrkdwn",
              "text": "*Run ID:* ${DAG_RUN_ID}"
            }
          }]
        }

PagerDuty:

yaml
handler_on:
  failure:
    type: http
    with:
      url: https://events.pagerduty.com/v2/enqueue
      body: |
        {
          "routing_key": "${PAGERDUTY_KEY}",
          "event_action": "trigger",
          "payload": {
            "summary": "Failed: ${DAG_NAME}",
            "severity": "error"
          }
        }

Security

Environment Variable Filtering

Dagu filters the process environment before building each step's execution environment and before starting sub-DAG executions.

System environment variables are still available for ${VAR} expansion when Dagu parses the DAG.

Built-in forwarded variables:

  • Unix and macOS exact names: PATH, HOME, USER, SHELL, TMPDIR, TERM, EDITOR, VISUAL, LANG, LC_ALL, LC_CTYPE, TZ, LD_LIBRARY_PATH, XDG_CONFIG_HOME, XDG_DATA_HOME, XDG_CACHE_HOME, DOCKER_HOST, DOCKER_TLS_VERIFY, DOCKER_CERT_PATH, DOCKER_API_VERSION
  • Windows exact names: USERPROFILE, SYSTEMROOT, WINDIR, SYSTEMDRIVE, COMSPEC, PATHEXT, TEMP, TMP, PATH, PSMODULEPATH, HOME, DOCKER_HOST, DOCKER_TLS_VERIFY, DOCKER_CERT_PATH, DOCKER_API_VERSION
  • Prefixes on all platforms: DAGU_, DAG_, LC_, KUBERNETES_

Dagu also sets runtime variables with the DAG_ prefix for each step execution:

  • DAG_NAME, DAG_RUN_ID, DAG_RUN_STEP_NAME
  • DAG_RUN_LOG_FILE, DAG_RUN_STEP_STDOUT_FILE, DAG_RUN_STEP_STDERR_FILE
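
These can be referenced like any other variable inside a step, for example to tag output with run context (a minimal sketch):

yaml
steps:
  - id: announce
    command: echo "run ${DAG_RUN_ID} of ${DAG_NAME} logging to ${DAG_RUN_LOG_FILE}"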

You can extend the forwarded set in Dagu configuration:

yaml
env_passthrough:
  - SSL_CERT_FILE
  - HTTP_PROXY
  - HTTPS_PROXY
  - NO_PROXY

env_passthrough_prefixes:
  - AWS_

Or with environment variables:

bash
export DAGU_ENV_PASSTHROUGH=SSL_CERT_FILE,HTTP_PROXY,HTTPS_PROXY,NO_PROXY
export DAGU_ENV_PASSTHROUGH_PREFIXES=AWS_

These settings only forward matching variables that already exist in the Dagu process environment. On Unix, matching is case-sensitive. On Windows, matching is case-insensitive.

If a variable is not forwarded automatically, you can still make it available to the step by defining it explicitly in the workflow.

Use a dotenv file or env: to make non-sensitive step environment values explicit. For credentials and other secrets, prefer the secrets: block so values are resolved at runtime and masked in logs:

yaml
# workflow.yaml
dotenv: .env.runtime  # Load non-sensitive defaults from a file

secrets:
  - name: AWS_ACCESS_KEY_ID
    provider: env
    key: PROD_AWS_ACCESS_KEY_ID
  - name: AWS_SECRET_ACCESS_KEY
    provider: env
    key: PROD_AWS_SECRET_ACCESS_KEY
  - name: DATABASE_PASSWORD
    provider: file
    key: /run/secrets/db-password

steps:
  - id: deploy
    command: aws s3 sync ./build s3://my-bucket
bash
# .env.runtime (safe to keep local; do not store secrets here)
AWS_REGION=ap-northeast-1
S3_BUCKET=my-bucket

If you need to copy a non-sensitive process variable into workflow env:, do it explicitly:

yaml
# workflow.yaml
env:
  - AWS_REGION: ${AWS_REGION}
  - S3_BUCKET: ${S3_BUCKET}

steps:
  - id: deploy
    command: aws s3 sync ./build s3://${S3_BUCKET}

See Secrets for provider details and masking behavior.

Process Isolation

ini
# systemd security features
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/dagu/data /opt/dagu/logs

