A Python service that monitors Docker containers in real time and automatically restarts them based on customizable rules, including any dependent containers. Ideal for environments where high availability matters and zombie containers are not welcome at the party.
- Monitors Docker events in real time.
- Automatically restarts containers that are unhealthy or have unexpectedly exited.
- Supports a restart policy configurable via environment variables.
- Handles container dependencies using labels (com.monitor.depends.on).
- Detailed, timezone-aware logging.
- Supports container exclusion from restart policies.
- Supports real-time notifications through Apprise.
The service listens to Docker daemon events. When it detects that a container is in an unhealthy state or has exited with a non-excluded code, it restarts it. If the container has dependencies (defined through labels), it restarts those too, in the correct order, using topological sorting.
Example: [db] --> [backend] --> [frontend]
If db goes down, the service will restart db, then backend, and finally frontend.
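The ordering described above can be illustrated with a short sketch. It is not the project's actual code: the function name `restart_order` and the shape of the `depends_on` mapping are assumptions made for this example, and it relies on `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def restart_order(failed, depends_on):
    """Return `failed` followed by every container that transitively depends on it.

    `depends_on` maps a container name to the names it depends on
    (hypothetical structure, chosen for this example).
    """
    # Invert the mapping: for each container, who depends on it?
    dependents = {}
    for name, parents in depends_on.items():
        for parent in parents:
            dependents.setdefault(parent, set()).add(name)

    # Collect the failed container plus all of its transitive dependents.
    affected, stack = {failed}, [failed]
    while stack:
        for child in dependents.get(stack.pop(), ()):
            if child not in affected:
                affected.add(child)
                stack.append(child)

    # Sort only the affected containers; parents always come before children.
    graph = {
        name: {p for p in depends_on.get(name, ()) if p in affected}
        for name in affected
    }
    return list(TopologicalSorter(graph).static_order())


deps = {"backend": ["db"], "frontend": ["backend"]}
print(restart_order("db", deps))  # ['db', 'backend', 'frontend']
```

`TopologicalSorter` raises `graphlib.CycleError` if the labels describe a circular dependency, which a real implementation would need to handle.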
Configuration is handled through a .env file in the project root.
Here’s an example:
# Restart policy in JSON format
# excludedContainers: one or more container names, e.g. ["container1", "container2"]
# codesToExclude: one or more exit codes, e.g. [0, 1]
RESTART_POLICY='{
  "excludedContainers": ["container_name"],
  "statuses": {
    "exited": {
      "codesToExclude": [0]
    }
  }
}'
ENABLE_DASHBOARD=True #-> Possible values [True | False]
LOGS_AMOUNT=10 #-> This will display the last n logs on the dashboard to clearly indicate the issue that triggered the restart policy
DASHBOARD_ADDRESS=0.0.0.0 #-> Possible values [0.0.0.0 | 127.0.0.1]
DASHBOARD_PORT=8000 #-> Possible values [ Any free port ]
ADMIN_PASSWORD=
ENABLE_NOTIFICATIONS=True #-> Possible values [True | False]
NOTIFICATION_URLS='["url1", "url2"]' #-> Check https://github.com/caronc/apprise/wiki#notification-services
NOTIFICATION_TITLE="" #-> Edit the notification title as you wish
NOTIFICATION_BODY="" #-> Edit the notification body as you wish
###############
# LOGGING #
###############
# --- Log Level ---
# Set the verbosity of logs. Options: "error", "warn", "info", "debug"
# Default: info
LOG_LEVEL=info
# --- Log Timezone ---
# Adjust the timezone used for logging
# e.g. Europe/Rome, America/New_York
LOG_TIMEZONE=UTC
Defines which containers to ignore and which states should trigger a restart.
- excludedContainers: list of containers that should never be restarted.
- statuses:
  - exited → restart if the container exited with a non-excluded code.
    - codesToExclude: a list of exit codes that should not trigger a restart. Check codes here
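As a rough illustration of how such a policy could be evaluated, here is a minimal sketch. The function name `should_restart` is hypothetical, and two behaviours are assumptions rather than documented facts: that an unhealthy container is always restarted unless excluded, and that a missing `codesToExclude` means no exit code is excluded.

```python
import json


def should_restart(policy_json, name, status, exit_code=None):
    """Decide whether a container should be restarted under the given policy."""
    policy = json.loads(policy_json)

    # Excluded containers are never restarted, whatever their state.
    if name in policy.get("excludedContainers", []):
        return False

    # Assumption: an unhealthy container always qualifies for a restart.
    if status == "unhealthy":
        return True

    if status == "exited":
        excluded = (
            policy.get("statuses", {}).get("exited", {}).get("codesToExclude", [])
        )
        return exit_code not in excluded

    return False


policy = '{"excludedContainers": ["pihole"], "statuses": {"exited": {"codesToExclude": [0]}}}'
print(should_restart(policy, "db", "exited", exit_code=1))
```

Keeping the decision in one pure function makes it easy to test without a running Docker daemon.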
Controls log verbosity.
Supported values: error, warn, info, debug.
Default: info.
Sets the timezone used in logs.
Must be a valid pytz timezone.
Examples: UTC, Europe/Rome, America/New_York.
Default: UTC
Check the valid timezones here
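For readers curious how timezone-aware log timestamps can be produced, the following is a minimal sketch and not the service's actual logger. The class name `TzFormatter` is made up for this example; it accepts any `tzinfo` object, so a `pytz.timezone("Europe/Rome")` instance could be passed in place of the fixed offset used here.

```python
import logging
from datetime import datetime, timedelta, timezone


class TzFormatter(logging.Formatter):
    """Formatter that renders %(asctime)s in a chosen timezone."""

    def __init__(self, fmt=None, datefmt=None, tz=timezone.utc):
        super().__init__(fmt=fmt, datefmt=datefmt)
        self.tz = tz

    def formatTime(self, record, datefmt=None):
        # record.created is a POSIX timestamp; convert it into the target zone.
        moment = datetime.fromtimestamp(record.created, tz=self.tz)
        if datefmt:
            return moment.strftime(datefmt)
        return moment.isoformat(timespec="seconds")


handler = logging.StreamHandler()
# A fixed +01:00 offset stands in for a named zone in this example.
handler.setFormatter(
    TzFormatter("%(asctime)s [%(levelname)s] %(message)s",
                tz=timezone(timedelta(hours=1)))
)
logger = logging.getLogger("surgeon-example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("container db restarted")
```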
Enables or disables the web dashboard.
Default: False
Number of log entries to retain when a container is restarted.
Default: 10
Address interface for the dashboard:
- 127.0.0.1 -> local only
- 0.0.0.0 -> accessible on LAN
Default: 0.0.0.0
Port on which the dashboard is served.
Default: 8000
Password for accessing the dashboard. Three formats are supported:
- Plain text
  - ADMIN_PASSWORD=r4nd0mP4ssW0rD
- Bcrypt
  - ADMIN_PASSWORD=$2a$12$9s8F...
- Argon2
  - ADMIN_PASSWORD=$argon2id$v=19$m=65536,t=3,p=4$...
The system automatically detects whether the value is plain text, bcrypt, or Argon2.
If you want a strong random password (plain text), you can generate one with: openssl rand -hex 32
Note that this produces a plain password, not a hash.
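One plausible way to tell the three formats apart is by the well-known prefixes of bcrypt and Argon2 hash strings. The sketch below is an assumption about how detection could work, not a description of the project's code; `detect_password_format` is a hypothetical name. It also shows a Python equivalent of the `openssl` command using the standard `secrets` module.

```python
import secrets

# Standard prefixes used by the encoded hash formats.
ARGON2_PREFIXES = ("$argon2id$", "$argon2i$", "$argon2d$")
BCRYPT_PREFIXES = ("$2a$", "$2b$", "$2y$")


def detect_password_format(value):
    """Classify an ADMIN_PASSWORD value as 'argon2', 'bcrypt', or 'plain'."""
    if value.startswith(ARGON2_PREFIXES):
        return "argon2"
    if value.startswith(BCRYPT_PREFIXES):
        return "bcrypt"
    return "plain"


# 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`.
random_password = secrets.token_hex(32)
print(detect_password_format(random_password))
```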
Enables or disables real-time notifications.
Supported values: True | False
Default: False
See the Notifications section for more details.
A JSON-formatted list of notification endpoints, as documented in the Apprise URL specification
Expected Syntax: '["url1", "url2"]'
The title template for notifications.
Supports placeholders and emoji.
Default: '⚠️ {container_name} crashed'
Supported placeholders:
- {container_name}
- {logs}
- {exit_code}
- {n_logs}
The body template for notifications.
Supports placeholders, multiline text (\n), and Markdown formatting.
Icons/emoji may not be rendered, depending on the provider.
Default: '`exit code`: `{exit_code}`\nLast {n_logs} logs of `{container_name}`: {logs}'
Supported placeholders:
- {container_name}
- {logs}
- {exit_code}
- {n_logs}
- The user submits their password to /auth/login
- The server validates it in this order:
  - argon2 verification
  - bcrypt checkpw
  - direct comparison (plain text)
- If valid, a JWT is created and stored in an HttpOnly cookie
- Protected routes require this cookie to be present and valid
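The password-validation step of this flow could look roughly like the sketch below. It covers only the ordered checks, not the JWT or cookie handling. The function name `check_password` and the pluggable `hash_verifiers` parameter are inventions for this example, and the guard that skips the plain comparison for hash-looking values is an extra precaution added here, not something the flow above states.

```python
import hmac

HASH_PREFIXES = ("$argon2", "$2a$", "$2b$", "$2y$")


def check_password(candidate, stored, hash_verifiers=()):
    """Try each hash verifier in order, then fall back to a plain comparison.

    Each verifier is a callable taking (stored, candidate). With the real
    libraries installed these could be, for example:
      lambda s, c: argon2.PasswordHasher().verify(s, c)
      lambda s, c: bcrypt.checkpw(c.encode(), s.encode())
    """
    for verify in hash_verifiers:
        try:
            if verify(stored, candidate):
                return True
        except Exception:
            # Wrong hash family or a mismatch reported by raising: try the next one.
            continue

    # Precaution (assumption of this sketch): never compare a stored hash as
    # plain text, otherwise submitting the hash itself would be accepted.
    if stored.startswith(HASH_PREFIXES):
        return False

    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate.encode(), stored.encode())


print(check_password("r4nd0mP4ssW0rD", "r4nd0mP4ssW0rD"))
```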
You can define container dependencies using the label com.monitor.depends.on.
When a parent container is restarted, its dependent containers will be restarted too, in the correct order.
Example docker-compose.yml:
services:
  db:
    image: postgres
    container_name: db
  backend:
    image: my-backend
    container_name: backend
    labels:
      - "com.monitor.depends.on=db"
  frontend:
    image: my-frontend
    container_name: frontend
    labels:
      - "com.monitor.depends.on=backend"
  docker-surgeon:
    image: docker-surgeon-image
    container_name: docker-surgeon
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    env_file:
      - path/to/.env
In this setup:
If db crashes → db, backend, and frontend will be restarted in order.
If backend crashes → backend and frontend will be restarted.
If frontend crashes → only frontend will be restarted.
Multiple dependencies can be specified for a container by separating them with commas: com.monitor.depends.on=backend,frontend,db
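Reading the label could be sketched as follows. The helper names `parse_dependencies` and `build_depends_on` are hypothetical, and trimming whitespace around names is an assumption; the source only states that values are comma-separated.

```python
LABEL = "com.monitor.depends.on"


def parse_dependencies(labels):
    """Return the container names listed in the dependency label, if any."""
    raw = labels.get(LABEL, "")
    # Split on commas and drop empty entries such as a trailing comma.
    return [name.strip() for name in raw.split(",") if name.strip()]


def build_depends_on(containers):
    """Map each container name to the names it depends on.

    `containers` maps a container name to its label dictionary.
    """
    return {name: parse_dependencies(labels) for name, labels in containers.items()}


containers = {
    "db": {},
    "backend": {LABEL: "db"},
    "frontend": {LABEL: "backend,db"},
}
print(build_depends_on(containers))
```

With the Docker SDK for Python, the label dictionary of a container is available as `container.labels`.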
# /your/path/data holds persistent data (recommended if the dashboard is enabled)
docker run -d \
  --name docker-surgeon \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /your/path/data:/app/app/data \
  -v "$(pwd)/.env:/app/.env" \
  krystall0/docker-surgeon:latest
You can also override environment variables directly:
# /your/path/data holds persistent data (recommended if the dashboard is enabled)
docker run -d \
  --name docker-surgeon \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /your/path/data:/app/app/data \
  -e LOG_LEVEL=info \
  -e LOG_TIMEZONE=Europe/Rome \
  -e RESTART_POLICY='{"excludedContainers":["pihole"],"statuses":{"exited":{"codesToExclude":[0]}}}' \
  krystall0/docker-surgeon:latest
version: "3.8"
services:
  docker-surgeon:
    image: krystall0/docker-surgeon:latest
    container_name: docker-surgeon
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /your/path/data:/app/app/data # persistent data (recommended if dashboard is enabled)
    env_file:
      - /path/to/.env
  db:
    image: postgres
    container_name: db
  backend:
    image: my-backend
    container_name: backend
    labels:
      - "com.monitor.depends.on=db"
  frontend:
    image: my-frontend
    container_name: frontend
    labels:
      - "com.monitor.depends.on=backend"
Docker Surgeon includes a built-in web dashboard that helps you inspect:
- Recent container crashes
- Logs grouped by container
- Crash statistics over time
- Interactive charts
- Date-based filtering
- Full log viewer with multiline formatting
To access the dashboard:
http://<your-ip>:<your-port>
(Requires authentication — see Authentication Flow)
Docker Surgeon can send real-time notifications whenever a container crashes. Notifications are handled through Apprise, supporting 70+ services including:
- Discord
- Telegram
- Slack
- Matrix
- Webhooks
- Gotify / Pushover / Pushbullet
And many others…
See Apprise for more details
Add these variables to your .env:
ENABLE_NOTIFICATIONS=True
NOTIFICATION_URLS='["discord://<webhook_id>/<webhook_token>"]'
NOTIFICATION_TITLE="⚠️ {container_name} crashed"
NOTIFICATION_BODY="`exit code`: `{exit_code}`\nLast {n_logs} logs:\n{logs}"
Docker Surgeon supports placeholder variables inside NOTIFICATION_TITLE and NOTIFICATION_BODY.
Available placeholders:
- {container_name} → name of the crashed container
- {exit_code} → container exit code
- {logs} → last N logs (ANSI colors removed)
- {n_logs} → number of logs configured in LOGS_AMOUNT
Example notification body:
exit code: {exit_code}
Container {container_name} crashed.
Last {n_logs} logs:
{logs}
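Placeholder substitution of this kind can be sketched with Python's `str.format`. This is an illustration under assumptions, not the project's implementation: the names `strip_ansi` and `render` are hypothetical, and the regular expression only removes ANSI color (SGR) sequences.

```python
import re

# Matches ANSI color sequences such as "\x1b[31m" or "\x1b[0m".
ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Remove ANSI color codes from a log line."""
    return ANSI_COLOR.sub("", text)


def render(template, container_name, exit_code, logs, n_logs):
    """Fill the supported placeholders in a title or body template."""
    return template.format(
        container_name=container_name,
        exit_code=exit_code,
        logs="\n".join(strip_ansi(line) for line in logs),
        n_logs=n_logs,
    )


body = "exit code: {exit_code}\nContainer {container_name} crashed.\nLast {n_logs} logs:\n{logs}"
print(render(body, "db", 1, ["\x1b[31mFATAL\x1b[0m: out of memory"], 10))
```

With `str.format`, a template containing an unknown placeholder raises `KeyError`, so a real implementation may want to validate templates at startup.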
- Do not expose the dashboard over the internet without HTTPS and reverse proxy protections
- Always use a strong admin password (preferably hashed)
