# Ubiquiti airOS Network Watchdog
A lightweight network-health watchdog for **Ubiquiti airOS (XM/XW/TI) radios** designed to automatically recover devices that lose upstream connectivity.
This project deploys a shell watchdog script to multiple radios using **Python + SSH/SCP** and runs it persistently using the airOS startup mechanism.
The watchdog complements the built-in hardware watchdog (`/bin/watchdog`) by monitoring **network reachability instead of kernel health**.
---
# Problem This Solves
airOS radios include a hardware watchdog that protects against:
- kernel lockups
- system hangs
- driver crashes
However, it **does not detect network failures**, such as:
- upstream gateway unreachable
- wireless link stuck
- routing failures
- bridge lockups
- airMAX modulation collapse
In these scenarios radios remain online but **stop passing traffic**.
This watchdog monitors network reachability and performs controlled recovery.
---
# Features
## Network Health Monitoring
The script dynamically determines the **default gateway** and periodically tests reachability.
Example command used:

    route -n | awk '$1=="0.0.0.0"{print $2; exit}'

The gateway is then tested with ping.
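The detection and ping steps could be wrapped in two small functions, roughly as follows. This is a sketch: the names `parse_default_gateway` and `gateway_reachable` and the ping count of 3 are assumptions for illustration, not taken from `watchdog.sh`.

```shell
#!/bin/sh
# Sketch only: function names and the ping count are illustrative assumptions.

# Read `route -n` output on stdin and print the gateway of the default route.
parse_default_gateway() {
    awk '$1 == "0.0.0.0" { print $2; exit }'
}

# Return 0 if the gateway answers ping, non-zero otherwise.
# Assumes the radio's ping supports -c (packet count).
gateway_reachable() {
    gw="$1"
    [ -n "$gw" ] || return 1          # no default route counts as a failure
    ping -c 3 "$gw" >/dev/null 2>&1
}

# Usage on a radio:
#   GW=$(route -n | parse_default_gateway)
#   gateway_reachable "$GW" || echo "gateway unreachable"
```

Treating a missing default route as a failure keeps the check meaningful when the routing table itself is the problem.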
---
## Boot Stabilization Delay
The watchdog waits **10 minutes after boot** before enforcing checks.
This prevents false triggers during:
- firmware upgrades
- radio association
- routing initialization
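One way to implement the delay is to compare system uptime against the grace period. The sketch below assumes uptime is read from `/proc/uptime`; the function names `grace_remaining` and `read_uptime` are hypothetical and not taken from `watchdog.sh`.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# Print how many seconds of the boot grace period remain (0 if it has passed).
# $1 = uptime in whole seconds, $2 = grace period in seconds.
grace_remaining() {
    uptime_s="$1"
    grace_s="$2"
    if [ "$uptime_s" -ge "$grace_s" ]; then
        echo 0
    else
        echo $((grace_s - uptime_s))
    fi
}

# Print whole seconds of uptime, read from /proc/uptime (Linux-specific).
read_uptime() {
    read -r up _ < /proc/uptime
    echo "${up%%.*}"
}

# Usage on a radio (600 seconds = 10 minutes):
#   sleep "$(grace_remaining "$(read_uptime)" 600)"
```

Basing the wait on uptime rather than a fixed `sleep 600` means a watchdog that is restarted long after boot would not wait a second time.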
---
## Controlled Failure Detection
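Example configuration:

    CHECK_INTERVAL = 60 seconds
    MAXFAIL = 5

Result:

    5 consecutive failures → recovery action

With a 60-second interval, this corresponds to roughly five minutes of continuous outage before recovery starts.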
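The counting logic can be sketched as a single function that is called once per check. The name `record_check` and the `fails` variable are assumptions for illustration; only `MAXFAIL` comes from the configuration above.

```shell
#!/bin/sh
# Sketch only: record_check and fails are illustrative names.
MAXFAIL=5
fails=0

# Update the consecutive-failure counter from one check result.
# $1 = 0 when the ping succeeded, non-zero otherwise.
# Returns 0 when the threshold is reached and recovery should start.
record_check() {
    if [ "$1" -eq 0 ]; then
        fails=0                       # any success resets the streak
        return 1
    fi
    fails=$((fails + 1))
    [ "$fails" -ge "$MAXFAIL" ]
}

# Usage inside the monitoring loop:
#   ping -c 3 "$GW" >/dev/null 2>&1
#   if record_check $?; then
#       echo "threshold reached"
#   fi
```

Because a single successful ping resets the counter, isolated packet loss never accumulates toward the threshold.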
---
## Automatic Recovery
If the gateway becomes unreachable repeatedly, the radio reboots.
Example log messages:

    net-watchdog: gateway unreachable
    net-watchdog: rebooting after consecutive failures
---
## Reboot Loop Prevention
Persistent reboot tracking prevents endless reboot cycles.
Example behavior, where each "failure" is one episode in which the consecutive-failure threshold is reached:

| Event | Action |
|------|------|
| first failure | reboot |
| second failure | reboot |
| third failure | reboot |
| fourth failure | watchdog disables itself |

A flag file is created:

    /etc/persistent/net-watchdog.disabled

This ensures the radio stops rebooting until investigated.
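The decision between rebooting and disabling could look roughly like this. It is a sketch under assumptions: the function name `next_recovery_action` is hypothetical, the state file is assumed to hold a single integer, and the default paths are the ones documented in this README.

```shell
#!/bin/sh
# Sketch only: next_recovery_action is an illustrative name, and the
# one-integer state file format is an assumption.
STATE_FILE="${STATE_FILE:-/etc/persistent/net-watchdog.state}"
DISABLE_FLAG="${DISABLE_FLAG:-/etc/persistent/net-watchdog.disabled}"
MAX_REBOOTS="${MAX_REBOOTS:-3}"

# Decide what to do when the failure threshold has been reached.
# Prints "reboot" or "disable" and updates the persistent files accordingly.
next_recovery_action() {
    count=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    count=${count:-0}
    if [ "$count" -ge "$MAX_REBOOTS" ]; then
        : > "$DISABLE_FLAG"           # leave a flag for the operator to find
        echo disable
    else
        echo $((count + 1)) > "$STATE_FILE"
        echo reboot
    fi
}

# Usage on a radio (assumes the counter must be saved to flash first):
#   if [ "$(next_recovery_action)" = "reboot" ]; then
#       cfgmtd -w -p /etc/
#       reboot
#   fi
```

With `MAX_REBOOTS=3` this reproduces the table above: three reboots, then the watchdog disables itself on the fourth episode.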
---
## Automatic Recovery Reset
If the device operates normally for ~30 minutes, the reboot counter is reset.
This prevents historical failures from permanently affecting recovery behavior.
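A minimal sketch of the reset, assuming `CLEAR_AFTER_GOOD` counts consecutive successful checks (30 checks at a 60-second interval is about 30 minutes). The function names and the `good` variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: names are illustrative, and CLEAR_AFTER_GOOD is assumed to be
# a number of consecutive successful checks rather than minutes.
STATE_FILE="${STATE_FILE:-/etc/persistent/net-watchdog.state}"
CLEAR_AFTER_GOOD="${CLEAR_AFTER_GOOD:-30}"
good=0

# Call after every successful check. Clears the reboot counter once the
# streak is long enough; returns 0 only when the counter was cleared.
record_good_check() {
    good=$((good + 1))
    if [ "$good" -ge "$CLEAR_AFTER_GOOD" ] && [ -s "$STATE_FILE" ]; then
        rm -f "$STATE_FILE"
        good=0
        return 0
    fi
    return 1
}

# Call after every failed check so the streak starts over.
record_bad_check() {
    good=0
}
```

Returning 0 only when the file was actually removed lets the caller decide whether the change needs to be written to flash.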
---
## Duplicate Process Protection
A lockfile prevents multiple watchdog instances from running at once:

    /var/run/net-watchdog.pid
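A PID-file lock of this kind is commonly written along the following lines. This is a sketch, not the code from `watchdog.sh`: the function name `acquire_lock` is hypothetical, and the check is not atomic, so two instances started at the same instant could both pass it.

```shell
#!/bin/sh
# Sketch only: acquire_lock is an illustrative name.
PIDFILE="${PIDFILE:-/var/run/net-watchdog.pid}"

# Return 0 and record our PID if no live process holds the lock;
# return 1 if the PID in the file still belongs to a running process.
acquire_lock() {
    if [ -f "$PIDFILE" ]; then
        old=$(cat "$PIDFILE" 2>/dev/null || true)
        if [ -n "$old" ] && kill -0 "$old" 2>/dev/null; then
            return 1
        fi
    fi
    echo $$ > "$PIDFILE"              # stale or missing file: take the lock
}

# Usage at the top of the watchdog script:
#   acquire_lock || exit 0
```

Checking the recorded PID with `kill -0` lets a new instance take over a stale lockfile left behind by a crash.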
---
# Project Structure

    ubiquiti-watchdog/
    │
    ├─ watchdog.sh
    ├─ deploy-watchdog.py
    │
    └─ data/
       ├─ credentials.json
       └─ ip_list.csv
---
# Watchdog Operation
The watchdog performs the following loop:

    wait 10 minutes after boot
    loop forever
        determine default gateway
        ping gateway
        track consecutive failures
        if failures >= threshold
            increment reboot counter
            reboot radio
---
# Installation
## Requirements
Python 3 with:

- `paramiko`
- `scp`

Install:

    pip install paramiko scp
---
# Configuration
## credentials.json

    {
      "username": "ubnt",
      "password": "password"
    }
---
## ip_list.csv

    ip
    172.25.1.19
    172.25.1.20
    172.25.1.21
---
# Deployment
Run:

    python deploy-watchdog.py

The script will:

1. connect via SSH
2. upload `watchdog.sh`
3. install it to `/etc/persistent/net-watchdog.sh`
4. make it executable
5. register it in `/etc/persistent/rc.poststart`
6. persist configuration with `cfgmtd -w -p /etc/`
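The registration step has to be safe to repeat, since the deploy script may be run against the same radio more than once. A sketch of an idempotent version, run on the radio itself, is shown here. The function name `register_poststart` is hypothetical, and launching the watchdog in the background with `&` is an assumption about how `rc.poststart` should start a long-running script.

```shell
#!/bin/sh
# Sketch only: register_poststart is an illustrative name, and the
# "script &" launch line is an assumption.

# Append a launch line to the startup script unless it is already present.
# $1 = startup script path, $2 = watchdog script path.
register_poststart() {
    rc="$1"
    line="$2 &"
    [ -f "$rc" ] || printf '#!/bin/sh\n' > "$rc"
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
    chmod +x "$rc"
}

# Usage on a radio:
#   register_poststart /etc/persistent/rc.poststart /etc/persistent/net-watchdog.sh
#   cfgmtd -w -p /etc/
```

Matching the whole line with `grep -x` avoids adding a duplicate entry while leaving any other commands in `rc.poststart` untouched.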
---
# Runtime Behavior
Startup flow:

    radio boots
    rc.poststart runs
    watchdog launches
    waits 10 minutes
    monitoring begins
---
# Logging
Events are logged using syslog:

    logger "net-watchdog: message"

Example:

    net-watchdog: gateway unreachable (fail=3)
    net-watchdog: rebooting after consecutive failures
---
# Persistent State Files
| File | Purpose |
|-----|-----|
| `/etc/persistent/net-watchdog.state` | reboot counter |
| `/etc/persistent/net-watchdog.disabled` | disables watchdog |
| `/var/run/net-watchdog.pid` | process lock |
---
# Compatibility
Tested on:

    airOS 6.x
    XM / XW / TI platforms

Example device:

    NanoStation loco M5
    firmware: XW.ar934x.v6.3.24
---
# Safety Design
The script intentionally includes several safeguards.
### Boot Delay
Prevents false triggers during device startup.
### Failure Threshold
Avoids rebooting for transient packet loss.
### Reboot Limit
Prevents infinite reboot loops.
### Persistent Disable Flag
Stops watchdog if repeated failures occur.
---
# Recommended Settings

    BOOT_GRACE=600
    CHECK_INTERVAL=60
    MAXFAIL=5
    MAX_REBOOTS=3
    CLEAR_AFTER_GOOD=30

These values provide stable behavior for most deployments.
---
# Example Use Cases
- rural wireless links
- WISP infrastructure
- remote bridge radios
- solar powered relays
- hard-to-reach rooftop installations
---
# Limitations
The watchdog only tests **gateway reachability**.
It does not detect:
- partial packet loss
- RF degradation
- asymmetric routing
- layer-2 loop conditions
These require separate monitoring.
---
# License
GPL-compatible.