Server monitoring and data-collection daemon
Monitoring is a Python API with a DSL feel for writing monitoring daemons.
Monitoring works well for the following tasks:
- getting notified when incidents happen (email, XMPP, ZeroMQ...)
- taking automatic actions (restart, rm, git pull...)
- collecting system statistics for further processing, e.g. graphs
- tying into existing or third-party Python code
- playing along nicely with an existing deployment/configuration ecosystem (fabric/cuisine)

Its main features are:
- a monitoring DSL: declarative programming to define a monitoring strategy
- a wide spectrum: from data collection and incident reporting to taking automatic actions
- a small, easy-to-read, single-file API
- written in Python
- Revised BSD License
Install with:

python setup.py install

or

easy_install monitoring

Create a monitoring script, for example my_monitor.py:

from monitoring import *

Monitor(
    Service(
        name="my-service",
        monitor=(
            HTTP(
                GET="http://localhost:8080/health",
                freq=Time.s(30),
                fail=[Log("Service is down!")],
            ),
        ),
    )
).run()

Run it with:

python my_monitor.py

Or use the monitoring command:

monitoring my_monitor.py

If you have the repository cloned locally and want to run scripts without installing the library, you can fetch the main CLI on the fly:

curl -s https://raw.githubusercontent.com/sebastien/monitoring/main/src/sh/monitoring.sh | bash -s examples/system-health.py

This downloads the monitoring.sh launcher, pipes it to bash, and runs your local script (here examples/system-health.py) with the downloaded monitoring library.
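The HTTP rule in the quick-start script (my_monitor.py) polls its GET URL every freq and runs its fail actions when the check does not succeed. As a rough, standalone illustration of one such check, the sketch below uses only the standard library. It is not how the monitoring library implements HTTP; the names http_ok and run_check are hypothetical, and it assumes that any non-2xx answer, timeout or connection error counts as a failure.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical probe: True if a GET on `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (raised for 4xx/5xx) and socket timeouts
        # all derive from OSError, so any of them counts as a failure.
        return False


def run_check(probe: Callable[[], bool], fail: Iterable[Callable[[], None]]) -> bool:
    """Hypothetical single tick: run `probe`; on failure call every `fail` action."""
    ok = probe()
    if not ok:
        for action in fail:
            action()
    return ok
```

For example, run_check(lambda: http_ok("http://localhost:8080/health"), [lambda: print("Service is down!")]) performs one tick of the quick-start rule; the library itself repeats the check on the schedule given by freq. Keeping the probe separate from the actions makes the failure handling easy to test without a network.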
See the examples/ directory for more usage examples:
- system-health.py: Monitor system metrics such as CPU, memory and disk usage
- http-latency.py: Monitor HTTP response times
- http-ping-restart.py: Keep HTTP services up by restarting them on failure
- service-tmux.py: Run services in tmux sessions (imports TmuxService from monitoring)
Core concepts:
- Monitor: The main monitoring engine that runs services
- Service: A collection of rules and actions
- Rule: Defines what to monitor (e.g. HTTP checks, system health)
- Action: Defines what to do on success/failure (e.g. log, email, restart)
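To make the relationship between these four concepts concrete, here is a toy scheduler built on stated assumptions: a rule pairs a boolean check with a frequency (in seconds here) and success/fail action lists, a service groups rules, and the monitor runs each rule once its next run time has arrived. MiniRule, MiniService and MiniMonitor are hypothetical names; the real Monitor may schedule and run rules differently.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

Action = Callable[[], None]


@dataclass
class MiniRule:
    """Hypothetical rule: what to check, how often (seconds), what to do."""
    check: Callable[[], bool]
    freq: float
    success: List[Action] = field(default_factory=list)
    fail: List[Action] = field(default_factory=list)
    next_run: float = 0.0

    def run(self, now: float) -> None:
        # Pick the success or fail actions depending on the check result.
        actions = self.success if self.check() else self.fail
        for action in actions:
            action()
        self.next_run = now + self.freq


@dataclass
class MiniService:
    """Hypothetical service: a named collection of rules."""
    name: str
    rules: List[MiniRule]


class MiniMonitor:
    """Hypothetical engine: runs every rule whose next_run time has come."""

    def __init__(self, *services: MiniService) -> None:
        self.services = services

    def tick(self, now: float) -> int:
        """Run all due rules at time `now`; return how many ran."""
        ran = 0
        for service in self.services:
            for rule in service.rules:
                if now >= rule.next_run:
                    rule.run(now)
                    ran += 1
        return ran

    def run(self, period: float = 0.1) -> None:
        """Loop forever, polling for due rules every `period` seconds."""
        while True:
            self.tick(time.monotonic())
            time.sleep(period)
```

Separating tick() from the blocking run() loop keeps the scheduling logic testable with explicit timestamps.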
Rules:
- HTTP: Check HTTP endpoints
- SystemHealth: Monitor CPU, memory, disk usage
- SystemInfo: Collect system statistics
- Bandwidth: Measure network bandwidth
- ProcessInfo: Monitor process statistics
- Delta: Track changes over time
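Delta is described only as tracking changes over time. A common way to do that, assumed here, is to keep the previous sample of a counter and report its rate of change per second. The DeltaTracker class below is a hypothetical sketch of that idea, not the library's Delta rule.

```python
from typing import Optional


class DeltaTracker:
    """Hypothetical helper: rate of change between successive counter samples."""

    def __init__(self) -> None:
        self.last_value: Optional[float] = None
        self.last_time: Optional[float] = None

    def update(self, value: float, now: float) -> Optional[float]:
        """Record `value` at time `now` (seconds); return change per second.

        Returns None for the first sample (nothing to compare against yet)
        or when no time has elapsed since the previous sample.
        """
        rate = None
        if self.last_value is not None and now > self.last_time:
            rate = (value - self.last_value) / (now - self.last_time)
        self.last_value, self.last_time = value, now
        return rate
```

Fed with a cumulative byte counter, such as the per-interface totals in /proc/net/dev on Linux, this yields bytes per second.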
Actions:
- Log: Log messages to files or stdout
- Email: Send email notifications
- XMPP: Send XMPP messages
- Run: Execute shell commands
- Restart: Restart processes
- Incident: Trigger actions after multiple failures
- TmuxRun: Execute commands in tmux windows
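Incident is described only as triggering actions after multiple failures. One plausible reading, assumed here, is a threshold of N failures within a sliding time window. The FailureThreshold class below sketches that idea; it is not the library's Incident implementation, and its count and window parameters are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class FailureThreshold:
    """Hypothetical incident trigger: fire `actions` once `count` failures
    have occurred within the last `window` seconds."""

    def __init__(self, count: int, window: float, actions: List[Callable[[], None]]) -> None:
        self.count = count
        self.window = window
        self.actions = actions
        self.failures: Deque[float] = deque()

    def record_failure(self, now: float) -> bool:
        """Register a failure at `now`; return True if the actions fired."""
        self.failures.append(now)
        # Forget failures that fell out of the sliding window.
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) >= self.count:
            for action in self.actions:
                action()
            self.failures.clear()  # start counting afresh after an incident
            return True
        return False
```

Requiring several failures in a window filters out one-off glitches (a single slow HTTP answer) while still reacting quickly to a real outage.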
Services:
- DaemonService: Base class for implementing services with start/stop/status directives
- TmuxService: Manages long-running processes in tmux sessions
- WebService: Manages web applications with tmux and HTTP health checks
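DaemonService is described as a base class exposing start, stop and status directives. The sketch below shows one way such a base class could be structured around a plain subprocess; MiniDaemonService, its command() hook and the SleepService subclass are hypothetical, and unlike TmuxService it does not use tmux.

```python
import subprocess
import sys
from typing import List, Optional


class MiniDaemonService:
    """Hypothetical base class: subclasses provide command(); the base class
    implements the start/stop/status directives around a subprocess."""

    def __init__(self) -> None:
        self.process: Optional[subprocess.Popen] = None

    def command(self) -> List[str]:
        raise NotImplementedError

    def start(self) -> None:
        if self.status() != "running":
            self.process = subprocess.Popen(self.command())

    def stop(self, timeout: float = 5.0) -> None:
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()
            try:
                self.process.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self.process.kill()  # escalate if SIGTERM was ignored
                self.process.wait()
        self.process = None

    def status(self) -> str:
        # poll() returns None while the child process is still alive.
        if self.process is not None and self.process.poll() is None:
            return "running"
        return "stopped"

    def run(self, directive: str) -> Optional[str]:
        """Dispatch a directive name: "start", "stop" or "status"."""
        if directive not in ("start", "stop", "status"):
            raise ValueError(f"unknown directive: {directive}")
        return getattr(self, directive)()


class SleepService(MiniDaemonService):
    """Example subclass: a long-running Python process that just sleeps."""

    def command(self) -> List[str]:
        return [sys.executable, "-c", "import time; time.sleep(60)"]
```

Dispatching on a directive string maps naturally onto a command line such as `my_service.py start`.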
Utilities:
- Time: Time unit conversions (ms, s, m, h, d, w)
- Size: Size unit conversions (B, KB, MB, GB)
- Process: Process management utilities
- System: System information utilities
- Tmux: Tmux session management
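Unit helpers such as Time and Size let rules state intervals and thresholds in readable units, as in freq=Time.s(30) in the quick-start script. The sketch below assumes Time normalizes everything to milliseconds and Size to bytes with binary (1024) multiples; neither assumption is confirmed here, and TimeUnits and SizeUnits are hypothetical stand-ins, not the library's classes.

```python
class TimeUnits:
    """Hypothetical stand-in for Time: every helper returns milliseconds."""

    @staticmethod
    def ms(n: float) -> float:
        return n

    @staticmethod
    def s(n: float) -> float:
        return n * 1000

    @staticmethod
    def m(n: float) -> float:
        return TimeUnits.s(n) * 60

    @staticmethod
    def h(n: float) -> float:
        return TimeUnits.m(n) * 60

    @staticmethod
    def d(n: float) -> float:
        return TimeUnits.h(n) * 24

    @staticmethod
    def w(n: float) -> float:
        return TimeUnits.d(n) * 7


class SizeUnits:
    """Hypothetical stand-in for Size: every helper returns bytes (1 KB = 1024 B)."""

    @staticmethod
    def B(n: float) -> float:
        return n

    @staticmethod
    def KB(n: float) -> float:
        return n * 1024

    @staticmethod
    def MB(n: float) -> float:
        return SizeUnits.KB(n) * 1024

    @staticmethod
    def GB(n: float) -> float:
        return SizeUnits.MB(n) * 1024
```

Normalizing to a single base unit means rules can compare and schedule values without caring which helper produced them.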
Read the presentation on Monitoring (previously named Watchdog): http://ur1.ca/45ku5
License: Revised BSD License
Sébastien Pierre