autonomously-io/awesome-autonomous-operation

Awesome Autonomous Operation

A curated list of awesome resources for AI-driven autonomous operations, spanning observability, data pipelines, AIOps, telecom automation, incident management, and network monitoring.

Similar Projects & Awesome Lists

Autonomous Operation Platforms

Open Source

  • ONAP (GitHub) - Open Network Automation Platform for real-time, policy-driven orchestration and automation.
    • Full MANO + closed-loop automation. The most complete open source telecom automation platform.
  • Nephio (GitHub) - Cloud native intent automation for telecom.
    • Kubernetes-native. Uses KRM (Kubernetes Resource Model) for intent-based NF deployment.
  • StackStorm - Event-driven automation platform for auto-remediation and ChatOps.
    • If-this-then-that automation with 160+ integrations. Bridges IT and network operations.
  • Rundeck - Operations automation with runbook management.
  • OSM (Open Source MANO) - ETSI-hosted NFV Management and Orchestration stack.

Commercial / Industry Solutions

Observability

  • OpenObserve - Petabyte scale Elasticsearch/Splunk/Datadog alternative for logs, metrics, traces, RUM, errors, and session replay. ⭐ autonomously-io component
  • Keep - Open source AIOps and alert management platform with AI correlation and YAML workflow automation. ⭐ autonomously-io component
  • Prometheus - Monitoring system and time series database.
  • Grafana - Open and composable observability and data visualization platform.
  • Jaeger - Distributed tracing platform.
  • OpenTelemetry - Vendor-agnostic telemetry data collection. The universal observability data plane.
  • SigNoz - Open-source APM and observability platform. OpenTelemetry-native.
  • Graylog - Log management platform.

Data Pipeline

  • Vector - High-performance observability data pipeline. ⭐ autonomously-io component
  • NATS - High-performance cloud-native messaging system. ⭐ autonomously-io component
  • Fluentd - Unified logging layer.
  • Fluent Bit - Lightweight log processor and forwarder.
  • Apache Kafka - Distributed event streaming platform.
  • Logstash - Server-side data processing pipeline.

AIOps

Open Source

  • PyRCA - Python ML library for metric-based root cause analysis (Salesforce).
    • Implements causal discovery methods (such as PC and GES) and Bayesian inference for RCA.
  • LogAI - Open source library for AI-based log analytics (Salesforce).
    • Log summarization, clustering, anomaly detection, and pattern mining.
  • Qdrant - Vector similarity search engine for RAG-based operational knowledge retrieval. ⭐ autonomously-io component

Commercial

  • Moogsoft - AI-powered IT operations platform. Pioneered AIOps category.
  • BigPanda - Event correlation and automation for IT operations. Open Box ML for explainable correlation.
  • Dynatrace - Software intelligence platform with Davis AI engine.
  • Datadog - Unified IT and network monitoring.

Telecom Automation

Open Source 5G Core

Open Source RAN

  • srsRAN Project (GitHub) - O-RAN native 5G CU/DU in portable C++.
  • srsRAN 4G - Open source SDR 4G software suite (eNB, UE, EPC).
  • O-RAN SC - O-RAN Software Community for open RAN automation.

Platforms & Infrastructure

  • Anuket - Common reference infrastructure for telecom.
  • LF Networking - Umbrella for open source networking projects (ONAP, FD.io, Nephio, etc.).
  • Duranta (LFN + OAI) - Joint LFN/OAI initiative for open source RAN.
  • OPNFV - Open platform for NFV. Merged with CNTT in 2021 to form Anuket.
  • XOS / CORD - Central Office Re-architected as a Datacenter.

Self-Healing & Auto-Remediation

  • Kuberhealthy - Kubernetes operator for synthetic monitoring and auto-healing.
  • Chaos Mesh - Cloud-native chaos engineering platform.
  • Litmus - Chaos engineering framework for Kubernetes.
  • Keptn - Cloud-native application life-cycle orchestration with automated quality gates.
  • Crossplane - Cloud-native control plane for infrastructure. Extends Kubernetes to manage any resource.

LLM/AI for Operations

Frameworks & Models

  • TelecomGPT - Framework to build telecom-specific LLMs. The authors report that a fine-tuned Llama3-8B outperforms GPT-4 on some telecom benchmarks.
  • NetLLM (SIGCOMM 2024) - First framework adapting LLMs for networking tasks (viewport prediction, ABR, scheduling).
  • MeshAgent (SIGMETRICS 2026) - LLM agent for reliable network management using constraint-based knowledge extraction.
  • MM-Telco - Multimodal LLM benchmarks for telecom applications.
  • Network Arena - Benchmark arena for evaluating AI agents on network troubleshooting.
  • GSMA Open-Telco LLM Benchmarks (Hugging Face) - Open-source benchmark evaluating AI models on telecom use cases.

Surveys

Incident Management & SRE

Incident Management

  • PagerDuty - Digital operations management with AI-powered incident response.
  • Rootly - Incident management with Slack-native workflows and automated runbooks.
  • incident.io - Incident management built for modern engineering teams.
  • Backstage - CNCF developer portal for service catalog and operational tooling.

SRE Practices

  • Google SRE Books - Foundational texts on Site Reliability Engineering.
    • SLO/SLI/SLA frameworks, error budgets, and toil reduction — directly applicable to network operations.
  • OpenSLO - Open specification for defining SLOs as code. Applicable to both IT services and network KPIs.
  • Sloth - SLO generation for Prometheus.

Infrastructure as Code

  • Kubernetes - Container orchestration. The convergence point for telecom CNFs and IT workloads.
  • Terraform - IaC for multi-cloud and on-premises. Providers exist for network devices and cloud infrastructure.
  • Ansible Network Automation - Agentless automation for network and IT infrastructure.
  • YANG Models Repository - Community repository of YANG data models for model-driven network management.

Network Monitoring & Flow Analysis

Awesome Lists

Infrastructure Monitoring

  • Zabbix - Enterprise-class open source monitoring for networks, servers, and applications. SNMP, IPMI, JMX, agent-based.
  • LibreNMS - Auto-discovering network monitoring with SNMP support, alerting, and device tracking.
  • Nagios - Industry-standard infrastructure monitoring. Extensive plugin ecosystem.
  • Icinga - Scalable monitoring system. Began as a Nagios fork; Icinga 2 is a rewrite with a modern architecture.
  • Cacti - Network graphing solution using SNMP and RRDtool.
  • Observium - Low-maintenance auto-discovering network monitoring with SNMP.
  • Checkmk - Infrastructure and application monitoring with auto-discovery.
  • Netdata - Real-time full-stack monitoring with per-second metrics and zero-config agent. 78k stars.
  • OpenNMS - Enterprise-grade network management platform. SNMP, flow analysis, topology discovery.
  • NetBox - The premier source of truth for network infrastructure. IPAM, DCIM, and network documentation. 20k stars.

Flow Collection & Analysis (sFlow / NetFlow / IPFIX)

  • ntopng - Web-based traffic and security network monitoring. sFlow, NetFlow, IPFIX support. 7.6k stars.
  • Akvorado - Flow collector, enricher, and visualizer. sFlow, NetFlow, IPFIX with ClickHouse backend.
  • goflow2 - High-performance sFlow/IPFIX/NetFlow collector. Successor to Cloudflare's goflow.
  • pmacct - Passive network monitoring tools. NetFlow, IPFIX, sFlow, and BGP collection.
  • Elastiflow - Network flow analytics (NetFlow, sFlow, IPFIX) with the Elastic Stack.
  • FastNetMon - High-speed DDoS detection via sFlow, NetFlow, IPFIX, and port mirroring.
  • vFlow - Enterprise network flow collector (IPFIX, sFlow, NetFlow).
  • nfdump - NetFlow/sFlow/IPFIX processing tools.
  • flow-pipeline - Tools and examples for operating sFlow and NetFlow pipelines (Cloudflare).

Active Measurement & Probing

  • perfSONAR (GitHub, Docs) - Federated network measurement toolkit (OWAMP, TWAMP, iperf3, traceroute).
  • SmokePing (GitHub) - Latency measurement with RRDtool graphing and pattern-based alerting.
  • Vaping - Healthy alternative to SmokePing — pluggable latency and reachability testing.
  • UDPing - Measures latency and packet loss using UDP.

TWAMP / STAMP

Packet Capture & Deep Analysis

  • Arkime - Large-scale full packet capture, indexing, and database system.
  • Zeek - Powerful network analysis framework for security monitoring.
  • Sniffnet - Cross-platform network traffic monitor with real-time visualization. 33k stars.
  • Suricata - Network intrusion detection/prevention engine with protocol analysis.

Related Papers & Standards

Standards & Frameworks

Academic Papers


⭐ = autonomously-io program component

Contributing

Contributions welcome! Please read the contribution guidelines first.
