ONES Analysis Agent

An AI-powered autonomous monitoring agent for the Aviz Networks ONES network management platform. It logs into the ONES dashboard, explores every page using a Claude-guided browser, collects server and Docker container metrics over SSH, analyzes everything with Claude's vision API, and automatically posts a rich daily health report to Slack.


How It Works

┌──────────────────────────────────────────────────────────────┐
│  1. Browser Agent (Playwright + Claude Vision)               │
│     Logs into ONES dashboard, asks Claude to identify nav    │
│     items, recursively explores every page & sub-page, and   │
│     captures full-page screenshots (up to 4 levels deep).    │
└────────────────────────┬─────────────────────────────────────┘
                         │ ~60 screenshots
┌────────────────────────▼─────────────────────────────────────┐
│  2. SSH Monitor (Paramiko)                                   │
│     SSH-connects to the ONES server, escalates to root,      │
│     collects Docker container stats, host memory/CPU/disk,   │
│     recent error logs, and pre-flags anomalies by threshold. │
└────────────────────────┬─────────────────────────────────────┘
                         │ metrics + anomalies
┌────────────────────────▼─────────────────────────────────────┐
│  3. Analyzer (Claude API, multimodal)                        │
│     Sends all screenshots + SSH metrics to Claude Sonnet.    │
│     Distinguishes ONES platform issues from network issues   │
│     ONES detects. Returns structured status + findings +     │
│     recommendations.                                         │
└────────────────────────┬─────────────────────────────────────┘
                         │ analysis result
┌────────────────────────▼─────────────────────────────────────┐
│  4. Slack Reporter (Slack SDK)                               │
│     Posts a formatted report block with status, summary,     │
│     findings, and recommendations. Uploads screenshots and   │
│     raw SSH output as thread replies.                        │
└──────────────────────────────────────────────────────────────┘
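
The four stages above could be wired together roughly as follows. This is a minimal sketch, not the repository's main.py: the function names (`explore_dashboard`, `collect_ssh_metrics`, `analyze`, `post_report`) and the data shapes are hypothetical stand-ins, and the stages are stubbed so the flow runs without a browser, SSH access, or API keys.

```python
from typing import Callable, Dict, List


def run_pipeline(
    explore_dashboard: Callable[[], List[str]],
    collect_ssh_metrics: Callable[[], Dict],
    analyze: Callable[[List[str], Dict], Dict],
    post_report: Callable[[Dict, List[str]], None],
    dry_run: bool = False,
) -> Dict:
    """Run the stages in order; skip the Slack post when dry_run is set."""
    screenshots = explore_dashboard()        # stage 1: list of screenshot paths
    metrics = collect_ssh_metrics()          # stage 2: metrics + pre-flagged anomalies
    result = analyze(screenshots, metrics)   # stage 3: status, findings, recommendations
    if not dry_run:
        post_report(result, screenshots)     # stage 4: Slack report + thread uploads
    return result


# Stub stages with made-up data, only to show the data flow.
posted: List[Dict] = []
result = run_pipeline(
    explore_dashboard=lambda: ["screenshots/d0_overview.png"],
    collect_ssh_metrics=lambda: {"anomalies": []},
    analyze=lambda shots, metrics: {"status": "OK", "findings": [], "screenshots": len(shots)},
    post_report=lambda res, shots: posted.append(res),
    dry_run=True,
)
print(result["status"], "| posted to Slack:", len(posted))
```

Passing the stages in as callables keeps each one replaceable, for example swapping the AI-guided browser agent for the static screenshot fallback.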

Architecture

aviz-monitor/
├── main.py                 # Orchestrator: runs all 3 steps in sequence
├── scheduler.py            # Daily scheduling daemon
├── config.py               # Central configuration (credentials, thresholds, URLs)
├── requirements.txt        # Python dependencies
├── setup.sh                # One-time environment setup script
│
├── agents/
│   ├── browser_agent.py    # AI-guided recursive dashboard exploration + screenshots
│   ├── ssh_monitor.py      # Server health collection via SSH (Docker, host metrics)
│   ├── analyzer.py         # Claude multimodal analysis of screenshots + SSH data
│   └── screenshot_agent.py # Simple static screenshot fallback
│
├── reporter/
│   └── slack_reporter.py   # Slack block builder + image uploader + thread poster
│
└── screenshots/            # Output directory for captured PNG screenshots

Prerequisites

| Requirement | Version / Details |
| --- | --- |
| Python | 3.9+ |
| Playwright | Chromium (installed by setup.sh) |
| ONES dashboard | Network-reachable, with credentials |
| ONES server SSH | Root-accessible via password or key |
| Slack Bot | chat:write, files:upload, channels:read scopes |
| Anthropic API key | Claude API access |

Installation

# 1. Clone the repository
git clone https://github.com/AvizNetworks/ONES_ANALYSIS_AGENT.git
cd ONES_ANALYSIS_AGENT/aviz-monitor

# 2. Run the one-time setup (creates venv, installs deps, installs Chromium)
bash setup.sh

# 3. Activate the virtual environment
source venv/bin/activate

Configuration

Edit aviz-monitor/config.py and fill in your values. Alternatively, set secrets as environment variables (recommended for production).

ONES Dashboard

APP_URL      = "https://your-ones-instance.example.com"
APP_USERNAME = "admin"
APP_PASSWORD = "your-password"

SSH / Server Access

SSH_HOST     = "10.20.0.37"    # IP of the ONES server
SSH_PORT     = 22
SSH_USER     = "aviz"
SSH_PASSWORD = "your-ssh-password"
SSH_KEY_PATH = ""              # path to private key, or leave empty for password auth

Alert Thresholds

DOCKER_MEMORY_WARNING_PERCENT = 85   # alert if any container uses >85% of its limit
HOST_MEMORY_WARNING_PERCENT   = 90   # alert if host RAM >90%
HOST_DISK_WARNING_PERCENT     = 85   # alert if any disk partition >85%
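
As an illustration of how these thresholds might be applied when pre-flagging anomalies, here is a minimal sketch. The function `flag_anomalies`, its input shapes, and the sample numbers are assumptions for illustration; the actual logic in ssh_monitor.py may differ.

```python
from typing import Dict, List

DOCKER_MEMORY_WARNING_PERCENT = 85
HOST_MEMORY_WARNING_PERCENT = 90
HOST_DISK_WARNING_PERCENT = 85


def flag_anomalies(
    container_mem_pct: Dict[str, float],
    host_mem_pct: float,
    disk_pct: Dict[str, float],
) -> List[str]:
    """Return one human-readable flag per metric that exceeds its threshold."""
    anomalies: List[str] = []
    for name, pct in sorted(container_mem_pct.items()):
        if pct > DOCKER_MEMORY_WARNING_PERCENT:
            anomalies.append(f"container {name}: memory {pct:.1f}% of limit")
    if host_mem_pct > HOST_MEMORY_WARNING_PERCENT:
        anomalies.append(f"host: memory {host_mem_pct:.1f}%")
    for mount, pct in sorted(disk_pct.items()):
        if pct > HOST_DISK_WARNING_PERCENT:
            anomalies.append(f"disk {mount}: {pct:.1f}% used")
    return anomalies


# Made-up sample values, not taken from a real run.
flags = flag_anomalies(
    container_mem_pct={"ones-collector": 91.2, "ones-gateway": 12.0},
    host_mem_pct=69.75,
    disk_pct={"/": 26.24, "/var": 88.0},
)
for flag in flags:
    print(flag)
```

The comparisons are strict (`>`), matching the ">85%" and ">90%" wording in the comments above.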

Slack

SLACK_BOT_TOKEN = ""        # or set env var SLACK_BOT_TOKEN
SLACK_CHANNEL   = "#your-channel"

Required Slack Bot scopes: chat:write, files:upload, channels:read, groups:read

Claude API

ANTHROPIC_API_KEY = ""      # or set env var ANTHROPIC_API_KEY

Schedule

REPORT_TIME = "10:00"       # 24-hour local time for daily report

Environment Variables (alternative to config.py)

For secrets, prefer environment variables:

export SLACK_BOT_TOKEN="xoxb-..."
export ANTHROPIC_API_KEY="sk-ant-..."
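
One way config.py could prefer an environment variable and fall back to the value in the file is sketched below. The helper `get_secret` is hypothetical and is not taken from the repository.

```python
import os


def get_secret(name: str, default: str = "") -> str:
    """Return the environment variable if set and non-empty, else the default."""
    value = os.environ.get(name, "").strip()
    return value or default


# The empty-string defaults mirror the placeholders shown in config.py.
SLACK_BOT_TOKEN = get_secret("SLACK_BOT_TOKEN")
ANTHROPIC_API_KEY = get_secret("ANTHROPIC_API_KEY")

missing = [
    name
    for name, value in (
        ("SLACK_BOT_TOKEN", SLACK_BOT_TOKEN),
        ("ANTHROPIC_API_KEY", ANTHROPIC_API_KEY),
    )
    if not value
]
if missing:
    print("Missing secrets:", ", ".join(missing))
```

Failing early on missing secrets avoids running the full exploration only to fail at the analysis or Slack step.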

Running the Agent

One-time run (with Slack post)

cd aviz-monitor
source venv/bin/activate
python main.py

Dry run (no Slack post, for testing)

python main.py --dry-run

Screenshots are still captured and the analysis is printed to stdout.


Scheduling Daily Reports

The scheduler runs main.py every day at the time set in REPORT_TIME.
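
The core of such a daily schedule is computing the next occurrence of REPORT_TIME in local time. The sketch below uses only the standard library and is an assumption about the approach; scheduler.py may instead rely on a scheduling package. The function names are hypothetical.

```python
from datetime import datetime, timedelta

REPORT_TIME = "10:00"  # 24-hour local time, as in config.py


def next_run(now: datetime, report_time: str = REPORT_TIME) -> datetime:
    """Return the next datetime strictly after `now` that matches report_time."""
    hour, minute = (int(part) for part in report_time.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow
    return candidate


def seconds_until_next_run(now: datetime, report_time: str = REPORT_TIME) -> float:
    """How long a scheduler loop would sleep before the next report."""
    return (next_run(now, report_time) - now).total_seconds()


now = datetime(2026, 4, 12, 17, 20)
print(next_run(now))                # 2026-04-13 10:00:00
print(seconds_until_next_run(now))  # 60000.0
```

A daemon loop would sleep for `seconds_until_next_run(datetime.now())`, run main.py, and repeat.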

Start the scheduler

nohup python scheduler.py > scheduler.log 2>&1 &
echo $! > scheduler.pid
echo "Scheduler started (PID $(cat scheduler.pid))"

Run immediately, then keep scheduling

python scheduler.py --now

Stop the scheduler

kill $(cat scheduler.pid)

Check scheduler status

tail -f scheduler.log

Sample Report Output

Below is an example Slack report generated for demo.aviznetworks.com.


🔴 demo.aviznetworks.com Daily Health Report - 🕙 Sunday, April 12 2026 at 05:20 PM


Overall Status: 🔴 CRITICAL

Summary

The ONES platform has several significant self-health issues that require immediate attention. The ones-collector Docker container is consuming an abnormally high 667.87% CPU and has accumulated 29.30 GiB of memory usage, indicating a likely runaway process or data ingestion overload. Additionally, the ServiceNow data connector is in an Inactive state, breaking that ticketing integration, and the Manage > Configuration page returns a "Page doesn't exist" 404 error, indicating a broken UI route. The ONES Appliance Health is self-reported as Critical in the Admin View, confirming platform-level distress.


🚨 ONES Platform Findings

# Severity Finding
1 πŸ”΄ ones-collector container β€” extreme CPU consumption: Reporting 667.87% CPU utilization (multi-core aggregate), which is grossly abnormal. This container is responsible for ingesting telemetry from all monitored devices and is clearly in a runaway or overloaded state.
2 πŸ”΄ ONES Appliance Health self-reported as CRITICAL: The Admin View Overview panel explicitly shows ONES Appliance Health: Critical, confirming the platform itself is in a distressed state.
3 πŸ”΄ Manage > Configuration page β€” 404 UI error: Navigating to /manage/configuration renders "The Page you're looking for doesn't exist." This is a broken ONES UI route.
4 :large_orange_circle: ServiceNow data connector β€” Inactive: Under Settings > Data Connectors, the ServiceNow integration shows status Inactive (connected 23 Mar 2026). Slack and Zendesk are Active, but ServiceNow ticket auto-creation/sync is currently broken.
5 :large_orange_circle: ONES server CPU at 69.25% and Memory at 69.75%: Both are elevated and approaching the warning threshold (60%). Combined with the ones-collector runaway CPU this indicates overall resource pressure on the host.
6 :large_orange_circle: ones-collector memory at 29.30 GiB / 125.88 GiB: Disproportionately high compared to all other containers and trending toward potential memory pressure.
7 :large_yellow_circle: ones-gateway container uptime anomaly: Shows ~1 hour uptime while all other containers show "Up 2 weeks." This strongly suggests the ones-gateway restarted recently.
8 :large_yellow_circle: kafka-connect CPU at 41.12%: Elevated for a messaging bridge container; warrants monitoring especially given the collector's load.
9 :large_yellow_circle: 3 devices not streaming telemetry (Not Streaming: 3/111): ONES is unable to receive telemetry from 3 devices, representing a data pipeline gap.
10 :large_yellow_circle: Help > Support and Help > Documentation pages render blank: Both pages load with no content in the main panel β€” a rendering or content-loading issue.
11 ℹ️ No SSH/server metrics available: Host-level swap and disk data could not be collected via SSH. Disk usage shown in ONES UI is 26.24% (within normal range).

📡 Network Status (observed by ONES)

  • 111 total devices monitored across 4 regions (Houston, San Jose, Denver, Nyk); 108/111 actively streaming telemetry
  • Critical alerts active across all fabrics in the last 12 hours; 27 activities logged - ONES alert engine is firing and processing normally
  • 8/666 links down detected across the fabric topology; 2 devices unreachable, 15 unhealthy, 15 with faulty fans, 14 with faulty PSUs - all detected and reported by ONES correctly
  • BGP, LACP, VXLAN, VLAN, QoS, VRRP protocol views all rendering with data; 243 total alerts tracked (221 device, 22 interface) - ONES data pipeline is largely functional for the 108 streaming devices

💡 Recommended Actions

  1. Immediately investigate and restart ones-collector - A CPU utilization of 667.87% is symptomatic of a runaway thread, infinite retry loop, or telemetry storm. Run docker logs ones-collector --tail 500 to identify the cause. Consider docker restart ones-collector after capturing logs.

  2. Investigate the recent ones-gateway restart - Determine why this container restarted (~1 hour ago). Review logs with docker logs ones-gateway --tail 200. If crash-looping, identify the root cause (OOM kill, config error, dependency failure).

  3. Re-activate the ServiceNow data connector - Navigate to Settings > Data Connectors > ServiceNow, verify API credentials/token have not expired (connector was configured 23 Mar 2026), and re-authenticate.

  4. Fix the broken Manage > Configuration UI route - The 404 at /manage/configuration suggests a missing route definition or a build artifact issue in the ones-ui container. A redeployment of ones-ui (currently v4.1.0) may resolve this.

  5. Monitor and address server-level resource pressure - With host CPU at 69.25% and memory at 69.75% (both above the 60% warning threshold), review whether the ONES server needs vertical scaling or whether the ones-collector runaway is the primary driver.

  6. Investigate the 3 non-streaming devices - Check ones-collector logs for connection errors to these specific device IPs. Restore streaming to close the data pipeline gap.

  7. Obtain SSH access to collect full host metrics - The absence of SSH-based memory, swap, and disk metrics creates a blind spot. Ensure SSH health collection is functional and re-run the diagnostic.

  8. Investigate blank Help pages - Check if the embedded documentation service or iframe source is unreachable. Low priority but affects operator usability.
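
A report like the one above maps naturally onto Slack Block Kit blocks. The following is a minimal sketch of how the status, summary, and findings might be assembled; `build_report_blocks` and its parameters are hypothetical, and the layout in slack_reporter.py may differ. Only the standard `header`, `section`, and `divider` block types are used.

```python
from typing import Dict, List


def build_report_blocks(
    host: str, status: str, summary: str, findings: List[str]
) -> List[Dict]:
    """Assemble a Block Kit payload: header, status, summary, numbered findings."""
    blocks: List[Dict] = [
        {
            "type": "header",
            "text": {"type": "plain_text", "text": f"{host} Daily Health Report"},
        },
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*Overall Status:* {status}"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*Summary*\n{summary}"}},
    ]
    if findings:
        numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(findings, start=1))
        blocks.append({"type": "divider"})
        blocks.append(
            {"type": "section", "text": {"type": "mrkdwn", "text": f"*Findings*\n{numbered}"}}
        )
    return blocks


blocks = build_report_blocks(
    host="demo.aviznetworks.com",
    status="CRITICAL",
    summary="The ONES platform has several significant self-health issues.",
    findings=["ones-collector CPU at 667.87%", "ServiceNow connector Inactive"],
)
print(len(blocks), "blocks")
```

The resulting list can be passed as the `blocks` argument of `WebClient.chat_postMessage` in the Slack SDK, together with a plain-text `text` fallback. Long findings lists may need to be split across several section blocks, since Slack limits the text length of a single block.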


Screenshots (posted as thread replies in Slack)

The agent captures screenshots at up to 4 depth levels and uploads them to the Slack thread. Below are examples from the April 12 run:

| Depth | Label | Description |
| --- | --- | --- |
| d0 | Overview | Top-level ONES dashboard overview |
| d1 | Admin View | Admin sidebar: appliance health, platform status |
| d1 | Monitor | Device monitoring panel |
| d1 | Analyze | Traffic and analytics view |
| d1 | Manage | Configuration management pages |
| d1 | Settings | Data connectors, alert rules |
| d2 | Fabrics | Fabric topology |
| d2 | Thresholds | Configured alert thresholds |
| d2 | Traffic | Interface traffic overview |
| d3 | Devices | Device list |
| d3 | Topology | Network topology map |
| d3 | Data Connectors | ServiceNow / Slack / Zendesk status |
| d4 | Alerts | Active alert detail |
| d4 | Health | Device health drilldown |
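
The depth labels (d0 to d4) come from a depth-limited recursive walk of the navigation. The sketch below shows that traversal pattern on a hard-coded, made-up navigation tree. In the real agent the child links would be identified by Claude from a screenshot and each visit would capture a PNG; the names `explore`, `NAV`, and `MAX_DEPTH` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

MAX_DEPTH = 4  # d0 is the landing page, d4 the deepest level captured

# Hypothetical navigation tree standing in for links discovered at runtime.
NAV: Dict[str, List[str]] = {
    "Overview": ["Admin View", "Monitor"],
    "Admin View": ["Fabrics"],
    "Monitor": ["Fabrics"],
    "Fabrics": ["Devices"],
    "Devices": ["Alerts"],
    "Alerts": ["Too Deep"],
}


def explore(
    page: str,
    nav: Dict[str, List[str]],
    max_depth: int = MAX_DEPTH,
    depth: int = 0,
    visited: Optional[Set[str]] = None,
    shots: Optional[List[Tuple[int, str]]] = None,
) -> List[Tuple[int, str]]:
    """Visit each reachable page once, depth-first, stopping below max_depth."""
    visited = set() if visited is None else visited
    shots = [] if shots is None else shots
    if depth > max_depth or page in visited:
        return shots
    visited.add(page)
    shots.append((depth, page))  # a real agent would take a screenshot here
    for child in nav.get(page, []):
        explore(child, nav, max_depth, depth + 1, visited, shots)
    return shots


shots = explore("Overview", NAV)
for depth, page in shots:
    print(f"d{depth}_{page.lower().replace(' ', '_')}.png")
```

The `visited` set prevents capturing a page twice when it is reachable from several menus, and the depth check bounds the number of screenshots sent for analysis.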

Project Structure

ONES_ANALYSIS_AGENT/
└── aviz-monitor/
    ├── main.py               # Entry point: orchestrates all 3 steps
    ├── scheduler.py          # Daily scheduling daemon
    ├── config.py             # All configuration (fill this in)
    ├── requirements.txt      # Python dependencies
    ├── setup.sh              # One-time setup script
    ├── agents/
    │   ├── browser_agent.py  # Playwright + Claude vision nav explorer
    │   ├── ssh_monitor.py    # SSH + Docker metrics collector
    │   ├── analyzer.py       # Claude multimodal analysis engine
    │   └── screenshot_agent.py  # Simple static screenshot fallback
    ├── reporter/
    │   └── slack_reporter.py # Slack rich block poster + image uploader
    └── screenshots/          # PNG captures (gitignored in production)

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature-name
  3. Make your changes and test with python main.py --dry-run
  4. Commit with a clear message: git commit -m "feat: describe your change"
  5. Push and open a Pull Request against main

Areas open for contribution:

  • Support for additional notification channels (Teams, PagerDuty, email)
  • Configurable page exploration depth and screenshot limits
  • Historical trend tracking (compare today's report vs. last week)
  • Docker Compose / Kubernetes deployment manifests
  • GitHub Actions workflow for CI dry-run tests
  • Web dashboard for browsing past reports and screenshots
  • Support for multiple ONES instances in a single run

License

This project is maintained by Aviz Networks. Please check with the repository owner for licensing terms before forking or distributing.
