An AI-powered autonomous monitoring agent for the Aviz Networks ONES network management platform. It logs into the ONES dashboard, explores every page using a Claude-guided browser, collects server and Docker container metrics over SSH, analyzes everything with Claude's vision API, and posts a rich daily health report to Slack β automatically.
- How It Works
- Architecture
- Prerequisites
- Installation
- Configuration
- Running the Agent
- Scheduling Daily Reports
- Sample Report Output
- Project Structure
- Contributing
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β 1. Browser Agent (Playwright + Claude Vision) β
β Logs into ONES dashboard, asks Claude to identify nav β
β items, recursively explores every page & sub-page, and β
β captures full-page screenshots (up to 4 levels deep). β
ββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β ~60 screenshots
ββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββ
β 2. SSH Monitor (Paramiko) β
β SSH-connects to the ONES server, escalates to root, β
β collects Docker container stats, host memory/CPU/disk, β
β recent error logs, and pre-flags anomalies by threshold. β
ββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β metrics + anomalies
ββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββ
β 3. Analyzer (Claude API β multimodal) β
β Sends all screenshots + SSH metrics to Claude Sonnet. β
β Distinguishes ONES platform issues from network issues β
β ONES detects. Returns structured status + findings + β
β recommendations. β
ββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββ
β analysis result
ββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββ
β 4. Slack Reporter (Slack SDK) β
β Posts a formatted report block with status, summary, β
β findings, and recommendations. Uploads screenshots and β
β raw SSH output as thread replies. β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
aviz-monitor/
βββ main.py # Orchestrator β runs all 3 steps in sequence
βββ scheduler.py # Daily scheduling daemon
βββ config.py # Central configuration (credentials, thresholds, URLs)
βββ requirements.txt # Python dependencies
βββ setup.sh # One-time environment setup script
β
βββ agents/
β βββ browser_agent.py # AI-guided recursive dashboard exploration + screenshots
β βββ ssh_monitor.py # Server health collection via SSH (Docker, host metrics)
β βββ analyzer.py # Claude multimodal analysis of screenshots + SSH data
β βββ screenshot_agent.py # Simple static screenshot fallback
β
βββ reporter/
β βββ slack_reporter.py # Slack block builder + image uploader + thread poster
β
βββ screenshots/ # Output directory for captured PNG screenshots
| Requirement | Version |
|---|---|
| Python | 3.9+ |
| Playwright Chromium | (installed by setup.sh) |
| ONES dashboard | Network-reachable, with credentials |
| ONES server SSH | Root-accessible via password or key |
| Slack Bot | chat:write, files:upload, channels:read scopes |
| Anthropic API key | Claude API access |
# 1. Clone the repository
git clone https://github.com/AvizNetworks/ONES_ANALYSIS_AGENT.git
cd ONES_ANALYSIS_AGENT/aviz-monitor
# 2. Run the one-time setup (creates venv, installs deps, installs Chromium)
bash setup.sh
# 3. Activate the virtual environment
source venv/bin/activateEdit aviz-monitor/config.py and fill in your values. Alternatively, set secrets as environment variables (recommended for production).
APP_URL = "https://your-ones-instance.example.com"
APP_USERNAME = "admin"
APP_PASSWORD = "your-password"SSH_HOST = "10.20.0.37" # IP of the ONES server
SSH_PORT = 22
SSH_USER = "aviz"
SSH_PASSWORD = "your-ssh-password"
SSH_KEY_PATH = "" # path to private key, or leave empty for password authDOCKER_MEMORY_WARNING_PERCENT = 85 # alert if any container uses >85% of its limit
HOST_MEMORY_WARNING_PERCENT = 90 # alert if host RAM >90%
HOST_DISK_WARNING_PERCENT = 85 # alert if any disk partition >85%SLACK_BOT_TOKEN = "" # or set env var SLACK_BOT_TOKEN
SLACK_CHANNEL = "#your-channel"Required Slack Bot scopes:
chat:write,files:upload,channels:read,groups:read
ANTHROPIC_API_KEY = "" # or set env var ANTHROPIC_API_KEYREPORT_TIME = "10:00" # 24-hour local time for daily reportFor secrets, prefer environment variables:
export SLACK_BOT_TOKEN="xoxb-..."
export ANTHROPIC_API_KEY="sk-ant-..."cd aviz-monitor
source venv/bin/activate
python main.pypython main.py --dry-runScreenshots are still captured and the analysis is printed to stdout.
The scheduler runs main.py every day at the time set in REPORT_TIME.
nohup python scheduler.py > scheduler.log 2>&1 &
echo $! > scheduler.pid
echo "Scheduler started (PID $(cat scheduler.pid))"python scheduler.py --nowkill $(cat scheduler.pid)tail -f scheduler.logBelow is an example Slack report generated for demo.aviznetworks.com.
Overall Status: π΄ CRITICAL
Summary
The ONES platform has several significant self-health issues that require immediate attention. The ones-collector Docker container is consuming an abnormally high 667.87% CPU and has accumulated 29.30 GiB of memory usage, indicating a likely runaway process or data ingestion overload. Additionally, the ServiceNow data connector is in an Inactive state, breaking that ticketing integration, and the Manage > Configuration page returns a "Page doesn't exist" 404 error, indicating a broken UI route. The ONES Appliance Health is self-reported as Critical in the Admin View, confirming platform-level distress.
| # | Severity | Finding |
|---|---|---|
| 1 | π΄ | ones-collector container β extreme CPU consumption: Reporting 667.87% CPU utilization (multi-core aggregate), which is grossly abnormal. This container is responsible for ingesting telemetry from all monitored devices and is clearly in a runaway or overloaded state. |
| 2 | π΄ | ONES Appliance Health self-reported as CRITICAL: The Admin View Overview panel explicitly shows ONES Appliance Health: Critical, confirming the platform itself is in a distressed state. |
| 3 | π΄ | Manage > Configuration page β 404 UI error: Navigating to /manage/configuration renders "The Page you're looking for doesn't exist." This is a broken ONES UI route. |
| 4 | :large_orange_circle: | ServiceNow data connector β Inactive: Under Settings > Data Connectors, the ServiceNow integration shows status Inactive (connected 23 Mar 2026). Slack and Zendesk are Active, but ServiceNow ticket auto-creation/sync is currently broken. |
| 5 | :large_orange_circle: | ONES server CPU at 69.25% and Memory at 69.75%: Both are elevated and approaching the warning threshold (60%). Combined with the ones-collector runaway CPU this indicates overall resource pressure on the host. |
| 6 | :large_orange_circle: | ones-collector memory at 29.30 GiB / 125.88 GiB: Disproportionately high compared to all other containers and trending toward potential memory pressure. |
| 7 | :large_yellow_circle: | ones-gateway container uptime anomaly: Shows ~1 hour uptime while all other containers show "Up 2 weeks." This strongly suggests the ones-gateway restarted recently. |
| 8 | :large_yellow_circle: | kafka-connect CPU at 41.12%: Elevated for a messaging bridge container; warrants monitoring especially given the collector's load. |
| 9 | :large_yellow_circle: | 3 devices not streaming telemetry (Not Streaming: 3/111): ONES is unable to receive telemetry from 3 devices, representing a data pipeline gap. |
| 10 | :large_yellow_circle: | Help > Support and Help > Documentation pages render blank: Both pages load with no content in the main panel β a rendering or content-loading issue. |
| 11 | βΉοΈ | No SSH/server metrics available: Host-level swap and disk data could not be collected via SSH. Disk usage shown in ONES UI is 26.24% (within normal range). |
- 111 total devices monitored across 4 regions (Houston, San Jose, Denver, Nyk); 108/111 actively streaming telemetry
- Critical alerts active across all fabrics in the last 12 hours; 27 activities logged β ONES alert engine is firing and processing normally
- 8/666 links down detected across the fabric topology; 2 devices unreachable, 15 unhealthy, 15 with faulty fans, 14 with faulty PSUs β all detected and reported by ONES correctly
- BGP, LACP, VXLAN, VLAN, QoS, VRRP protocol views all rendering with data; 243 total alerts tracked (221 device, 22 interface) β ONES data pipeline is largely functional for the 108 streaming devices
-
Immediately investigate and restart ones-collector β A CPU utilization of 667.87% is symptomatic of a runaway thread, infinite retry loop, or telemetry storm. Run
docker logs ones-collector --tail 500to identify the cause. Considerdocker restart ones-collectorafter capturing logs. -
Investigate the recent ones-gateway restart β Determine why this container restarted (~1 hour ago). Review logs with
docker logs ones-gateway --tail 200. If crash-looping, identify the root cause (OOM kill, config error, dependency failure). -
Re-activate the ServiceNow data connector β Navigate to Settings > Data Connectors > ServiceNow, verify API credentials/token have not expired (connector was configured 23 Mar 2026), and re-authenticate.
-
Fix the broken Manage > Configuration UI route β The 404 at
/manage/configurationsuggests a missing route definition or a build artifact issue in the ones-ui container. A redeployment of ones-ui (currently v4.1.0) may resolve this. -
Monitor and address server-level resource pressure β With host CPU at 69.25% and memory at 69.75% β both above the 60% warning threshold β review whether the ONES server needs vertical scaling or whether the ones-collector runaway is the primary driver.
-
Investigate the 3 non-streaming devices β Check
ones-collectorlogs for connection errors to these specific device IPs. Restore streaming to close the data pipeline gap. -
Obtain SSH access to collect full host metrics β The absence of SSH-based memory, swap, and disk metrics creates a blind spot. Ensure SSH health collection is functional and re-run the diagnostic.
-
Investigate blank Help pages β Check if the embedded documentation service or iframe source is unreachable. Low priority but affects operator usability.
The agent captures screenshots at up to 4 depth levels and uploads them to the Slack thread. Below are examples from the April 12 run:
| Depth | Label | Description |
|---|---|---|
| d0 | Overview | Top-level ONES dashboard overview |
| d1 | Admin View | Admin sidebar β appliance health, platform status |
| d1 | Monitor | Device monitoring panel |
| d1 | Analyze | Traffic and analytics view |
| d1 | Manage | Configuration management pages |
| d1 | Settings | Data connectors, alert rules |
| d2 | Fabrics | Fabric topology |
| d2 | Thresholds | Configured alert thresholds |
| d2 | Traffic | Interface traffic overview |
| d3 | Devices | Device list |
| d3 | Topology | Network topology map |
| d3 | Data Connectors | ServiceNow / Slack / Zendesk status |
| d4 | Alerts | Active alert detail |
| d4 | Health | Device health drilldown |
ONES_ANALYSIS_AGENT/
βββ aviz-monitor/
βββ main.py # Entry point β orchestrates all 3 steps
βββ scheduler.py # Daily scheduling daemon
βββ config.py # All configuration (fill this in)
βββ requirements.txt # Python dependencies
βββ setup.sh # One-time setup script
βββ agents/
β βββ browser_agent.py # Playwright + Claude vision nav explorer
β βββ ssh_monitor.py # SSH + Docker metrics collector
β βββ analyzer.py # Claude multimodal analysis engine
β βββ screenshot_agent.py # Simple static screenshot fallback
βββ reporter/
β βββ slack_reporter.py # Slack rich block poster + image uploader
βββ screenshots/ # PNG captures (gitignored in production)
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature-name - Make your changes and test with
python main.py --dry-run - Commit with a clear message:
git commit -m "feat: describe your change" - Push and open a Pull Request against
main
Areas open for contribution:
- Support for additional notification channels (Teams, PagerDuty, email)
- Configurable page exploration depth and screenshot limits
- Historical trend tracking (compare today's report vs. last week)
- Docker Compose / Kubernetes deployment manifests
- GitHub Actions workflow for CI dry-run tests
- Web dashboard for browsing past reports and screenshots
- Support for multiple ONES instances in a single run
This project is maintained by Aviz Networks. Please check with the repository owner for licensing terms before forking or distributing.