scoggeshall/ubiquiti-watchdog

# Ubiquiti airOS Network Watchdog

A lightweight network-health watchdog for **Ubiquiti airOS (XM/XW/TI) radios** designed to automatically recover devices that lose upstream connectivity.

This project deploys a shell watchdog script to multiple radios using **Python + SSH/SCP** and runs it persistently using the airOS startup mechanism.

The watchdog complements the built-in hardware watchdog (`/bin/watchdog`) by monitoring **network reachability instead of kernel health**.

---

# Problem This Solves

airOS radios include a hardware watchdog that protects against:

- kernel lockups
- system hangs
- driver crashes

However, it **does not detect network failures**, such as:

- upstream gateway unreachable
- wireless link stuck
- routing failures
- bridge lockups
- airMAX modulation collapse

In these scenarios, radios remain online but **stop passing traffic**.

This watchdog monitors network reachability and performs controlled recovery.

---

# Features

## Network Health Monitoring

The script dynamically determines the **default gateway** and periodically tests reachability.

Example command used:

    route -n | awk '$1=="0.0.0.0"{print $2; exit}'


The gateway is then tested with ping.
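
For readers who want to check the gateway-detection logic off-device, the following is a minimal Python sketch of what the `awk` filter above does: it returns the second column of the first row whose destination is `0.0.0.0`. The function name `default_gateway` and the sample routing table (including the `172.25.1.1` gateway and `br0` interface) are illustrative assumptions, not taken from the watchdog script.

```python
from typing import Optional


def default_gateway(route_output: str) -> Optional[str]:
    """Mirror the awk filter: first row with destination 0.0.0.0, column 2."""
    for line in route_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "0.0.0.0":
            return fields[1]
    return None  # no default route present


# Hypothetical `route -n` output, for illustration only.
SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.25.1.1      0.0.0.0         UG    0      0        0 br0
172.25.1.0      0.0.0.0         255.255.255.0   U     0      0        0 br0
"""

print(default_gateway(SAMPLE))
```

When no default route exists the function returns `None`. How the watchdog itself treats a missing gateway is not specified here.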

---

## Boot Stabilization Delay

The watchdog waits **10 minutes after boot** before enforcing checks.

This prevents false triggers during:

- firmware upgrades
- radio association
- routing initialization

---

## Controlled Failure Detection

Example configuration:

    CHECK_INTERVAL = 60 seconds
    MAXFAIL = 5


Result:

5 consecutive failures → recovery action
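
The threshold rule can be sketched as a small counter. This is a model of the behavior described above, not the script's actual code. It assumes that any successful check resets the counter, which is what "consecutive" implies. `checks_until_recovery` is a hypothetical helper name.

```python
from typing import Iterable, Optional

MAXFAIL = 5  # from the example configuration


def checks_until_recovery(results: Iterable[bool], maxfail: int = MAXFAIL) -> Optional[int]:
    """Return the 1-based index of the check that triggers recovery, or None.

    `results` holds one boolean per check: True means the gateway answered.
    """
    fails = 0
    for index, ok in enumerate(results, start=1):
        fails = 0 if ok else fails + 1  # a success clears the streak
        if fails >= maxfail:
            return index
    return None


# Four failures, one success, then five failures: recovery fires on check 10.
print(checks_until_recovery([False] * 4 + [True] + [False] * 5))
```

A single successful ping in the middle of an outage restarts the count, so brief packet loss does not add up to a reboot.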


---

## Automatic Recovery

If the gateway becomes unreachable repeatedly, the radio reboots.

Example log message:

    net-watchdog: gateway unreachable
    net-watchdog: rebooting after consecutive failures


---

## Reboot Loop Prevention

Persistent reboot tracking prevents endless reboot cycles.

Example behavior:

| Event | Action |
|------|------|
| failure threshold reached, 1st time | reboot |
| failure threshold reached, 2nd time | reboot |
| failure threshold reached, 3rd time | reboot |
| failure threshold reached, 4th time | watchdog disables itself |

A flag file is created:

    /etc/persistent/net-watchdog.disabled


This ensures the radio stops rebooting until investigated.
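
The reboot limit can be modeled as a counter kept in a state file. In the sketch below the file names match the ones this README documents, but the state-file format (a single integer) and the helper name `on_failure_threshold` are assumptions made for illustration. A temporary directory stands in for `/etc/persistent`.

```python
import tempfile
from pathlib import Path

MAX_REBOOTS = 3  # value from Recommended Settings


def on_failure_threshold(state_dir: Path, max_reboots: int = MAX_REBOOTS) -> str:
    """Decide between "reboot" and "disable" and update the state files."""
    state = state_dir / "net-watchdog.state"
    disabled = state_dir / "net-watchdog.disabled"
    # Assumed format: the state file holds the reboot count as one integer.
    count = int(state.read_text().strip() or 0) if state.exists() else 0
    if count >= max_reboots:
        disabled.touch()  # flag file: stop rebooting until investigated
        return "disable"
    state.write_text(f"{count + 1}\n")
    return "reboot"


with tempfile.TemporaryDirectory() as tmp:
    actions = [on_failure_threshold(Path(tmp)) for _ in range(4)]
    flag_created = (Path(tmp) / "net-watchdog.disabled").exists()

print(actions, flag_created)
```

Four threshold events in a row give three reboots followed by a disable, which matches the table above.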

---

## Automatic Recovery Reset

If the device operates normally for about 30 minutes, the reboot counter is reset.


This prevents historical failures from permanently affecting recovery behavior.
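
One way to picture the reset rule is a streak counter of successful checks. The sketch assumes that `CLEAR_AFTER_GOOD=30` counts consecutive good checks (30 checks at 60 seconds each is about 30 minutes). The README does not state the unit, so treat that reading as an assumption. `record_check` is a hypothetical helper.

```python
from typing import Tuple

CLEAR_AFTER_GOOD = 30  # assumed to mean consecutive successful checks


def record_check(good_streak: int, reboot_count: int, ok: bool,
                 clear_after_good: int = CLEAR_AFTER_GOOD) -> Tuple[int, int]:
    """Return the updated (good_streak, reboot_count) after one check."""
    if not ok:
        return 0, reboot_count  # a failure restarts the good streak
    good_streak += 1
    if good_streak >= clear_after_good:
        reboot_count = 0  # stable long enough: forget earlier reboots
    return good_streak, reboot_count


streak, reboots = 0, 2  # two reboots already recorded
for _ in range(CLEAR_AFTER_GOOD):
    streak, reboots = record_check(streak, reboots, ok=True)
print(streak, reboots)
```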

---

## Duplicate Process Protection

A lockfile prevents multiple watchdog instances.

    /var/run/net-watchdog.pid
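
The pidfile pattern can be sketched as follows. This is a generic illustration of the technique and not the watchdog's shell code: a new instance refuses to start only if the pidfile names a different process that is still alive, so a stale file left by a crash does not block startup. The names `pid_alive` and `acquire_pid_lock` are hypothetical, and the liveness check relies on POSIX signal 0.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Check whether a process exists by sending signal 0 (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the check without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_pid_lock(pidfile: Path,
                     is_alive: Callable[[int], bool] = pid_alive) -> bool:
    """Return True if this process holds the lock after the call."""
    try:
        owner = int(pidfile.read_text().strip())
    except (FileNotFoundError, ValueError):
        owner = None  # missing or unreadable pidfile counts as unlocked
    if owner is not None and owner != os.getpid() and is_alive(owner):
        return False  # another live instance owns the lock
    pidfile.write_text(f"{os.getpid()}\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    got_lock = acquire_pid_lock(Path(tmp) / "net-watchdog.pid")
print(got_lock)
```

The check and the write are not atomic, so two instances started at the same instant could both succeed. For a script launched once at boot this is usually acceptable.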


---

# Project Structure

    ubiquiti-watchdog/
    │
    ├─ watchdog.sh
    ├─ deploy-watchdog.py
    │
    └─ data/
       ├─ credentials.json
       └─ ip_list.csv


---

# Watchdog Operation

The watchdog performs the following loop:

    wait 10 minutes after boot

    loop forever
        determine default gateway
        ping gateway
        track consecutive failures

        if failures >= threshold
            increment reboot counter
            reboot radio

---

# Installation

## Requirements

Python 3 with:

- `paramiko`
- `scp`

Install:

    pip install paramiko scp


---

# Configuration

## credentials.json

    {
      "username": "ubnt",
      "password": "password"
    }


---

## ip_list.csv

    ip
    172.25.1.19
    172.25.1.20
    172.25.1.21
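
As a rough illustration of how a deployment script might read these two files, the sketch below loads the credentials and the radio list with the standard library. It assumes `ip_list.csv` has a header row named `ip` followed by one address per line. The helper names `load_credentials` and `load_ips` are hypothetical and may not match `deploy-watchdog.py`.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def load_credentials(path: Path) -> Tuple[str, str]:
    """Read the username and password from credentials.json."""
    data = json.loads(Path(path).read_text())
    return data["username"], data["password"]


def load_ips(path: Path) -> List[str]:
    """Read radio addresses from the `ip` column, skipping empty rows."""
    with open(path, newline="") as handle:
        return [row["ip"].strip() for row in csv.DictReader(handle) if row.get("ip")]


# Recreate the example files from this section in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "credentials.json").write_text(
        '{"username": "ubnt", "password": "password"}'
    )
    (data_dir / "ip_list.csv").write_text("ip\n172.25.1.19\n172.25.1.20\n172.25.1.21\n")
    creds = load_credentials(data_dir / "credentials.json")
    radios = load_ips(data_dir / "ip_list.csv")

print(creds[0], radios)
```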


---

# Deployment

Run:

    python deploy-watchdog.py


The script will:

1. connect via SSH
2. upload `watchdog.sh`
3. install it to `/etc/persistent/net-watchdog.sh`
4. make it executable
5. register it in `/etc/persistent/rc.poststart`
6. persist configuration with `cfgmtd -w -p /etc/`
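
The steps above can be sketched with `paramiko` and `scp` roughly as follows. This is an approximation under stated assumptions, not the contents of `deploy-watchdog.py`. The function names are hypothetical. The registration step is assumed to append a background launch line to `rc.poststart` if it is not already present, and it assumes the radio's shell provides `grep -qF`. `AutoAddPolicy` is used only for convenience; stricter host-key checking may be preferable.

```python
from typing import List

REMOTE_SCRIPT = "/etc/persistent/net-watchdog.sh"
RC_POSTSTART = "/etc/persistent/rc.poststart"


def remote_commands() -> List[str]:
    """Commands run on the radio after the upload (steps 4 to 6)."""
    launch = f"{REMOTE_SCRIPT} &"  # assumed launch line
    return [
        f"chmod +x {REMOTE_SCRIPT}",
        # Append the launch line only once so repeated deployments are harmless.
        f"grep -qF '{REMOTE_SCRIPT}' {RC_POSTSTART} 2>/dev/null "
        f"|| echo '{launch}' >> {RC_POSTSTART}",
        "cfgmtd -w -p /etc/",
    ]


def deploy(host: str, username: str, password: str,
           local_script: str = "watchdog.sh") -> None:
    """Upload the watchdog to one radio and register it (steps 1 to 6)."""
    # Imported here so the command list can be inspected without the packages.
    import paramiko
    from scp import SCPClient

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=username, password=password, timeout=10)
    try:
        with SCPClient(ssh.get_transport()) as scp:
            scp.put(local_script, REMOTE_SCRIPT)
        for command in remote_commands():
            _, stdout, _ = ssh.exec_command(command)
            if stdout.channel.recv_exit_status() != 0:
                raise RuntimeError(f"{host}: command failed: {command}")
    finally:
        ssh.close()


for command in remote_commands():
    print(command)
```

Calling `deploy("172.25.1.19", "ubnt", "password")` would run the full sequence against one radio. The loop at the end only prints the remote commands so they can be reviewed first.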


---

# Runtime Behavior

Startup flow:

    radio boots
    rc.poststart runs
    watchdog launches
    waits 10 minutes
    monitoring begins


---

# Logging

Events are logged using syslog:

    logger "net-watchdog: message"


Example:

    net-watchdog: gateway unreachable (fail=3)
    net-watchdog: rebooting after consecutive failures
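
When reviewing collected syslog output, the failure counter can be pulled out of these messages with a regular expression. The sketch assumes the `(fail=N)` format shown in the example above. `fail_counts` is a hypothetical helper and not part of the project.

```python
import re
from typing import List

FAIL_RE = re.compile(r"net-watchdog: gateway unreachable \(fail=(\d+)\)")


def fail_counts(log_text: str) -> List[int]:
    """Return the fail=N values from every matching log line, in order."""
    return [int(match.group(1)) for match in FAIL_RE.finditer(log_text)]


LOG = (
    "net-watchdog: gateway unreachable (fail=3)\n"
    "net-watchdog: rebooting after consecutive failures\n"
)
print(fail_counts(LOG))
```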


---

# Persistent State Files

| File | Purpose |
|-----|-----|
| `/etc/persistent/net-watchdog.state` | reboot counter |
| `/etc/persistent/net-watchdog.disabled` | disables watchdog |
| `/var/run/net-watchdog.pid` | process lock |

---

# Compatibility

Tested on:

    airOS 6.x
    XM / XW / TI platforms


Example device:

    NanoStation loco M5
    firmware: XW.ar934x.v6.3.24


---

# Safety Design

The script intentionally includes several safeguards.

### Boot Delay

Prevents false triggers during device startup.

### Failure Threshold

Avoids rebooting for transient packet loss.

### Reboot Limit

Prevents infinite reboot loops.

### Persistent Disable Flag

Stops the watchdog once repeated reboots have failed to restore connectivity.

---

# Recommended Settings

    BOOT_GRACE=600
    CHECK_INTERVAL=60
    MAXFAIL=5
    MAX_REBOOTS=3
    CLEAR_AFTER_GOOD=30


These values provide stable behavior for most deployments.
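
To see what these values mean in wall-clock terms, the arithmetic below derives approximate timings. It ignores ping timeouts and boot duration, and it assumes `CLEAR_AFTER_GOOD` counts successful checks rather than minutes, so the results are rough estimates.

```python
BOOT_GRACE = 600       # seconds to wait after boot
CHECK_INTERVAL = 60    # seconds between checks
MAXFAIL = 5            # consecutive failures before recovery
CLEAR_AFTER_GOOD = 30  # assumed: good checks before the reboot counter resets

# Time from the start of an outage to the reboot decision.
detect_seconds = MAXFAIL * CHECK_INTERVAL
# Earliest reboot after a boot when the gateway is down the whole time.
first_reboot_after_boot = BOOT_GRACE + detect_seconds
# Healthy time needed before earlier reboots are forgotten.
clear_seconds = CLEAR_AFTER_GOOD * CHECK_INTERVAL

print(detect_seconds / 60, first_reboot_after_boot / 60, clear_seconds / 60)
```

With the recommended settings this gives about 5 minutes to detect an outage, about 15 minutes between reboots during a persistent outage, and about 30 minutes of healthy operation before the counter clears.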

---

# Example Use Cases

- rural wireless links
- WISP infrastructure
- remote bridge radios
- solar powered relays
- hard-to-reach rooftop installations

---

# Limitations

The watchdog only tests **gateway reachability**.

It does not detect:

- partial packet loss
- RF degradation
- asymmetric routing
- layer-2 loop conditions

These require separate monitoring.

---

# License

GPL-compatible.

About

Automated deployment of a persistent network watchdog for Ubiquiti airOS devices, enabling gateway-based health monitoring and automatic recovery.
