A curated list of awesome resources for AI-driven autonomous operations — a concept encompassing observability, telecom automation, AIOps, and beyond.
- Similar Projects & Awesome Lists
- Autonomous Operation Platforms
- Observability
- Data Pipeline
- AIOps
- Telecom Automation
- Self-Healing & Auto-Remediation
- LLM/AI for Operations
- Incident Management & SRE
- Network Monitoring & Flow Analysis
- Related Papers & Standards
- Awesome AIOps - Curated list of AIOps resources including papers, tools, and datasets. → Analysis
- Awesome Observability - Curated list of observability tools, frameworks, and resources. → Analysis
- Awesome Site Reliability Engineering - Curated list of SRE resources. → Analysis
- Awesome Network Automation - Curated list of network automation resources. → Analysis
- Awesome Telco - Curated list of telecom resources. → Analysis
- Awesome 5G - Curated list of 5G projects and resources. → Analysis
- Awesome Telco Cloud - Curated list of Telco Cloud ecosystem projects. → Analysis
- Awesome Self-Hosting - Self-hosted services and applications. → Analysis
- Awesome Prometheus - Curated list of Prometheus resources. → Analysis
- Awesome OpenTelemetry - Curated list of OpenTelemetry resources. → Analysis
- Awesome Sysadmin - Curated list of open-source sysadmin resources. Comprehensive monitoring, log management, and metrics sections.
- Awesome SNMP - Curated list of SNMP resources — libraries, CLI tools, GUIs, MIB repositories, and RFCs.
- Awesome Monitoring - Infrastructure, OS, and application monitoring tools.
- Awesome Incident Management - Curated list of incident management resources.
- ONAP (GitHub) - Open Network Automation Platform for real-time, policy-driven orchestration and automation.
  - Full MANO + closed-loop automation. The most complete open source telecom automation platform.
- Nephio (GitHub) - Cloud native intent automation for telecom.
  - Kubernetes-native. Uses KRM (Kubernetes Resource Model) for intent-based NF deployment.
- StackStorm - Event-driven automation platform for auto-remediation and ChatOps.
  - If-this-then-that automation with 160+ integrations. Bridges IT and network operations.
- Rundeck - Operations automation with runbook management.
- OSM (Open Source MANO) - ETSI-hosted NFV Management and Orchestration stack.
- Google Cloud Autonomous Network Operations - Blueprint for CSPs transitioning from manual to zero-touch operations.
  - New Agents for ANO - Autonomous Data Steward and VoLTE Agents with agentic AI.
  - GraphML and Digital Twins - Graph Neural Networks for predictive network management.
- Microsoft Network Operations Agent (NOA) - Multi-agent framework for autonomous network operations.
  - NOA Framework Evolution - Specialized agents for NOC, ticketing, and telemetry.
  - Microsoft NetAI - Internal AI system reported to achieve 60% faster fault detection on the Azure network.
- Nokia AVA Platform - AI and analytics platform for cognitive network operations.
  - Autonomous Operations Vision - Proposes a unified operations layer spanning network and IT.
- Cisco Crosswork - Intent-based, closed-loop network automation suite.
  - Multi-Agentic AI Framework - Specialized AI agents coordinated by a meta-agent.
  - Agentic NOC Operations - Vision for NOCless/Dark NOC operations.
- Ericsson NWDAF - 3GPP-standardized Network Data Analytics Function for 5G core.
- Huawei ADN - Autonomous Driving Network with GenAI, agents, and digital twins.
  - Campus L4 ADN (MWC 2026) - Billed as the industry's first L4 autonomous network deployment.
- HPE Mist AI - AI-native networking for self-driving networks.
  - Mist Agentic AI - Marvis AI agents for cross-domain autonomous remediation.
- NVIDIA Telecom AI - GPU-accelerated AI for telecom.
- OpenObserve - Petabyte-scale alternative to Elasticsearch, Splunk, and Datadog for logs, metrics, traces, RUM, errors, and session replay. ⭐ autonomously-io component
- Keep - Open source AIOps and alert management platform with AI correlation and YAML workflow automation. ⭐ autonomously-io component
- Prometheus - Monitoring system and time series database.
- Grafana - Open and composable observability and data visualization platform.
- Jaeger - Distributed tracing platform.
- OpenTelemetry - Vendor-agnostic telemetry data collection. The universal observability data plane.
- SigNoz - Open-source APM and observability platform. OpenTelemetry-native.
- Graylog - Log management platform.
- Vector - High-performance observability data pipeline. ⭐ autonomously-io component
- NATS - High-performance cloud-native messaging system. ⭐ autonomously-io component
- Fluentd - Unified logging layer.
- Fluent Bit - Lightweight log processor and forwarder.
- Apache Kafka - Distributed event streaming platform.
- Logstash - Server-side data processing pipeline.
- PyRCA - Python ML library for metric-based root cause analysis (Salesforce).
  - Implements causal discovery algorithms such as PC and GES, plus Bayesian inference, for RCA.
- LogAI - Open source library for AI-based log analytics (Salesforce).
  - Log summarization, clustering, anomaly detection, and pattern mining.
- Qdrant - Vector similarity search engine for RAG-based operational knowledge retrieval. ⭐ autonomously-io component
- Moogsoft - AI-powered IT operations platform. Pioneered AIOps category.
- BigPanda - Event correlation and automation for IT operations. Open Box ML for explainable correlation.
- Dynatrace - Software intelligence platform with Davis AI engine.
- Datadog - Unified IT and network monitoring.
- free5GC (GitHub) - Open source 5G core network based on 3GPP R15+ (Go).
- Open5GS (GitHub) - C-language 5G Core and EPC implementation (Release-17).
- OpenAirInterface - Open source 4G/5G RAN and Core from EURECOM.
- SD-Core (Aether) (GitHub) - Disaggregated 4G/5G mobile core for cloud and edge.
- srsRAN Project (GitHub) - O-RAN native 5G CU/DU in portable C++.
- srsRAN 4G - Open source SDR 4G software suite (eNB, UE, EPC).
- O-RAN SC - O-RAN Software Community for open RAN automation.
- Anuket - Common reference infrastructure for telecom.
- LF Networking - Umbrella for open source networking projects (ONAP, FD.io, Nephio, etc.).
- Duranta (LFN + OAI) - Joint LFN/OAI initiative for open source RAN.
- OPNFV - Open Platform for NFV. Merged with CNTT in 2021 to form Anuket.
- XOS / CORD - Central Office Re-architected as a Datacenter.
- Kuberhealthy - Kubernetes operator for synthetic monitoring and auto-healing.
- Chaos Mesh - Cloud-native chaos engineering platform.
- Litmus - Chaos engineering framework for Kubernetes.
- Keptn - Cloud-native application life-cycle orchestration with automated quality gates.
- Crossplane - Cloud-native control plane for infrastructure. Extends Kubernetes to manage any resource.
- TelecomGPT - Framework to build telecom-specific LLMs. The authors report a fine-tuned Llama3-8B outperforming GPT-4 on telecom-specific benchmarks.
- NetLLM (SIGCOMM 2024) - Described by its authors as the first framework to adapt LLMs for networking tasks (viewport prediction, adaptive bitrate streaming, scheduling).
- MeshAgent (SIGMETRICS 2026) - LLM agent for reliable network management using constraint-based knowledge extraction.
- MM-Telco - Multimodal LLM benchmarks for telecom applications.
- Network Arena - Benchmark arena for evaluating AI agents on network troubleshooting.
- GSMA Open-Telco LLM Benchmarks (Hugging Face) - Open-source benchmark evaluating AI models on telecom use cases.
- LLM for Telecom Survey - Comprehensive survey on LLM principles, techniques, and opportunities in telecom.
- LLM-Based Network Management Survey - Survey on LLM-based network management and operations (2025).
- AIOps in the Era of LLMs - Survey of 183 papers (2020-2024) on LLMs in AIOps.
- LLMs for Telecom Standards - RAG-based approach to make 3GPP/ETSI specifications accessible via natural language.
- PagerDuty - Digital operations management with AI-powered incident response.
- Rootly - Incident management with Slack-native workflows and automated runbooks.
- incident.io - Incident management built for modern engineering teams.
- Backstage - CNCF developer portal for service catalog and operational tooling.
- Google SRE Books - Foundational texts on Site Reliability Engineering.
  - SLO/SLI/SLA frameworks, error budgets, and toil reduction, all directly applicable to network operations.
- OpenSLO - Open specification for defining SLOs as code. Applicable to both IT services and network KPIs.
- Sloth - SLO generation for Prometheus.
- Kubernetes - Container orchestration. The convergence point for telecom CNFs and IT workloads.
- Terraform - IaC for multi-cloud and on-premises. Providers exist for network devices and cloud infrastructure.
- Ansible Network Automation - Agentless automation for network and IT infrastructure.
- YANG Models Repository - Community repository of YANG data models for model-driven network management.
- Awesome Networking - Curated list of networking resources including monitoring, automation, and design. → Analysis
- Awesome Sysadmin - Monitoring - Monitoring section of the awesome sysadmin list.
- Zabbix - Enterprise-class open source monitoring for networks, servers, and applications. SNMP, IPMI, JMX, agent-based.
- LibreNMS - Auto-discovering network monitoring with SNMP support, alerting, and device tracking.
- Nagios - Industry-standard infrastructure monitoring. Extensive plugin ecosystem.
- Icinga - Scalable monitoring system that began as a Nagios fork and has since been rewritten with a modern architecture.
- Cacti - Network graphing solution using SNMP and RRDtool.
- Observium - Low-maintenance auto-discovering network monitoring with SNMP.
- Checkmk - Infrastructure and application monitoring with auto-discovery.
- Netdata - Real-time full-stack monitoring with per-second metrics and zero-config agent. 78k stars.
- OpenNMS - Enterprise-grade network management platform. SNMP, flow analysis, topology discovery.
- NetBox - The premier source of truth for network infrastructure. IPAM, DCIM, and network documentation. 20k stars.
- ntopng - Web-based traffic and security network monitoring. sFlow, NetFlow, IPFIX support. 7.6k stars.
- Akvorado - Flow collector, enricher, and visualizer. sFlow, NetFlow, IPFIX with ClickHouse backend.
- goflow2 - High-performance sFlow/IPFIX/NetFlow collector. Successor to Cloudflare's goflow.
- pmacct - Passive network monitoring tools. NetFlow, IPFIX, sFlow, and BGP collection.
- Elastiflow - Network flow analytics (NetFlow, sFlow, IPFIX) with the Elastic Stack.
- FastNetMon - High-speed DDoS detection via sFlow, NetFlow, IPFIX, and port mirroring.
- vFlow - Enterprise network flow collector (IPFIX, sFlow, NetFlow).
- nfdump - NetFlow/sFlow/IPFIX processing tools.
- flow-pipeline - Tools and examples for operating sFlow and NetFlow pipelines (Cloudflare).
- perfSONAR (GitHub, Docs) - Federated network measurement toolkit (OWAMP, TWAMP, iperf3, traceroute).
- SmokePing (GitHub) - Latency measurement with RRDtool graphing and pattern-based alerting.
- Vaping - Healthy alternative to SmokePing — pluggable latency and reachability testing.
- UDPing - Measures latency and packet loss using UDP.
- RFC 5357 - TWAMP - Two-Way Active Measurement Protocol.
- RFC 8762 - STAMP - Simple Two-Way Active Measurement Protocol.
- RFC 8972 - STAMP Extensions - Optional extensions for STAMP performance metrics.
- RFC 9503 - STAMP for SR - STAMP extensions for Segment Routing (SR-MPLS and SRv6).
- TWAMP Overview (Juniper) - Two-Way Active Measurement Protocol implementation overview.
- twampy - Python tools for TWAMP and TWAMP Light (STAMP).
- twamp-rs - TWAMP (RFC 5357) implementation in Rust.
- twamp (Go) - Minimal TWAMP Light (RFC 5357) implementation in Go.
- twamp_exporter - Prometheus exporter for bi-directional network latency measurement via TWAMP.
- vMark - Carrier Ethernet demarcation management system with TWAMP support.
- Arkime - Large-scale full packet capture, indexing, and database system.
- Zeek - Powerful network analysis framework for security monitoring.
- Sniffnet - Cross-platform network traffic monitor with real-time visualization. 33k stars.
- Suricata - Network intrusion detection/prevention engine with protocol analysis.
- TMF Autonomous Networks - TM Forum's autonomous networks initiative and maturity levels (L0-L5).
  - AN Levels (ANL) Model - Classification from L0 (manual) to L5 (fully autonomous).
  - AN Technical Architecture - Technical architecture toolkit.
  - AN Levels Evaluation IG1252 - Evaluation methodology.
- ETSI ZSM - Zero-touch network & Service Management.
  - ZSM Closed-Loop Enablers - OODA loop pattern for network automation.
  - GR ZSM 011 - Intent-Driven Closed-Loop - Intent as key enabler for autonomous management.
  - GR ZSM 011 v2.1.1 (PDF) - Full intent-driven AN specification.
- ETSI ENI - Experiential Networked Intelligence using AI for network management.
- ETSI White Paper No. 40 (PDF) - Autonomous networks supporting tomorrow's ICT. Addresses telecom-IT convergence.
- 3GPP SON - Self-Organizing Networks: self-configuration, self-optimization, self-healing.
- ITU-T Y.3172 - Architectural framework of machine learning in future networks.
- ITU-T Y.3060 - AN Trust - Trust principles (accountability, explainability, safety) in autonomous networks.
- ITU-T Y.3061 - AN Architecture - Architecture framework with Autonomy Engine.
- O-RAN Alliance - Open, intelligent, virtualized, interoperable mobile networks.
- GSMA AI for Networks - Industry-wide AI/ML program for network operations.
- GSMA Agentic AI in Telecom - Multi-agent architectures for autonomous networks.
- AI for 5G/6G Survey (MDPI 2025) - Taxonomy of AI applications, trends, and challenges in 5G/6G.
- ZSM for 5G and Beyond Survey - Comprehensive survey on ZSM architecture and implementation.
- AI-Driven ZSM in 5G (IEEE) - Challenges: data quality, model interpretability, safety constraints.
- ML for O-RAN Automation (MDPI) - ML applications for RIC-based RAN optimization.
- Open-Source 5G Core Evaluation - Performance comparison of Open5GS, free5GC, and OpenAirInterface.
- TelecomGPT (IEEE) - Framework to build telecom-specific LLMs with novel benchmarks.
- Network Standardization (ITU) - Network automation architecture standardization perspective.
⭐ = autonomously-io program component
Contributions welcome! Please read the contribution guidelines first.