Enterprise-Grade AI Infrastructure Observability & Chaos Engineering
NeurOps is a production-grade platform designed to bridge the gap between raw hardware telemetry and intelligent operational response. While this repository includes Redfish Emulation for rapid validation and chaos testing, the core architecture is engineered to scale across enterprise data centers, managing real-world hardware telemetry via standard Redfish APIs.
NeurOps is built for more than simulation. Every component - from the Neurosight Collector to the NeuroTalk AI Agent - is designed to consume real hardware data. To move from the default validation environment to a production workload, simply replace the Redfish URLs in your config.yaml with your actual BMC (Baseboard Management Controller) endpoints.
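To make the swap concrete, here is an illustrative config.yaml fragment. The field names below are assumptions for illustration, not the actual NeurOps schema; only the URL substitution itself is the point.

```yaml
# Illustrative sketch -- field names are assumed, not the real NeurOps schema.
redfish_endpoints:
  # Default validation environment (local Redfish emulators):
  # - http://localhost:8000/redfish/v1
  # Production: point at your real BMCs instead:
  - https://bmc-rack01-node01.example.com/redfish/v1
  - https://bmc-rack01-node02.example.com/redfish/v1
```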
The Auto-Healing API endpoints exposed by the Chaos Proxy currently perform simulated recovery actions. However, they are designed as standardized hooks for production workloads: engineers can attach their own automation logic (e.g., Ansible playbooks, Jenkins jobs, or direct Redfish Reset commands) to these endpoints, enabling the AI to trigger real remediation actions in a production data center.
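As one sketch of such a hook, the snippet below issues the standard Redfish `ComputerSystem.Reset` action against a BMC using only the Python standard library. The helper/function names are our own, and real BMCs will additionally require authentication and TLS configuration; only the Redfish action path and payload follow the DMTF standard.

```python
import json
import urllib.request

def build_redfish_reset(bmc_base: str, system_id: str = "1",
                        reset_type: str = "ForceRestart"):
    """Build the standard Redfish ComputerSystem.Reset request.

    Returns (url, body) so the request can be inspected, logged, or
    dispatched by whatever automation you attach to the Auto-Healing hook.
    """
    url = f"{bmc_base}/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    body = json.dumps({"ResetType": reset_type}).encode()
    return url, body

def remediate(bmc_base: str) -> int:
    """Example remediation action: issue a real reset via the BMC.

    Real deployments also need credentials (e.g. HTTP Basic or a
    Redfish session token) -- omitted here for brevity.
    """
    url, body = build_redfish_reset(bmc_base)
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```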
- ⚡ Agentic AI Assistant: A specialized Google ADK Agent that understands infrastructure, analyzes BigQuery telemetry, and identifies root causes in natural language.
- 🌀 Chaos Management: A dynamic proxy layer for real-time fault injection (thermal spikes, resource leaks, disk failures).
- 📡 Redfish Emulation: Scalable simulation of data center hardware using standard RESTful APIs.
- 🚨 Intelligent Telemetry: Built-in anomaly and trend detection that warns you of failures before they cross critical thresholds.
- 📊 Unified Dashboard: A premium Streamlit UI providing a single pane of glass for telemetry and AI interaction.
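The early-warning idea behind the Intelligent Telemetry feature can be sketched with a minimal trend detector: fit a slope to recent readings and estimate the time until a critical threshold is crossed. This is an illustrative least-squares sketch, not NeurOps's actual detection algorithm.

```python
def eta_to_threshold(samples, threshold, interval_s=30.0):
    """Estimate seconds until `threshold` is crossed, given readings
    sampled every `interval_s` seconds. Returns None when the trend is
    flat or cooling (no predicted crossing).

    Minimal least-squares slope over the window -- an illustrative
    sketch, not NeurOps's production detector.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den  # degrees per sample
    if slope <= 0:
        return None
    samples_left = (threshold - samples[-1]) / slope
    return max(samples_left, 0.0) * interval_s
```

For a GPU warming 2 °C per 30-second sample, a 70 °C threshold and a latest reading of 58 °C yields a three-minute warning, well before the critical limit is hit.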
```mermaid
graph LR
    subgraph Simulation
        Docker[Redfish Simulators] --> Proxy[Chaos Proxy]
    end
    subgraph Analysis
        Proxy --> Sight[Neurosight Collector]
        Sight --> Cloud[Google Cloud Stats]
    end
    subgraph Brain
        Cloud --> BQ[(BigQuery)]
        BQ --> Agent[NeuroTalk AI]
        Proxy --> Agent
    end
    Agent --> UI[Human Operator UI]
```
Want to understand the architecture deeply? ➡️ See doc/02-architecture-overview.md
- Authenticate with Google:
  `gcloud auth application-default login`
- Set up a virtual environment:
  `python3 -m venv mylab && source mylab/bin/activate && pip install -r requirements.txt`
NeurOps is fully orchestrated. Start everything with one command:
`make startneurops`
This starts the simulators, proxy, intelligence collector, and chat UI.
Open http://localhost:8501 to start chatting with NeuroTalk.
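With the stack running, you can also exercise fault injection programmatically. The proxy port, `/chaos/inject` path, and payload fields below are all assumptions for illustration; the API Reference documents the real Chaos Proxy endpoints.

```python
import json
import urllib.request

# Assumed port -- check your Docker Compose file for the real one.
CHAOS_PROXY = "http://localhost:8080"

def build_inject_request(kind: str, target: str):
    """Compose a fault-injection request for the Chaos Proxy.

    The /chaos/inject path and payload shape are hypothetical; consult
    the API Reference for the actual Chaos Proxy spec.
    """
    url = f"{CHAOS_PROXY}/chaos/inject"
    body = json.dumps({"fault": kind, "target": target}).encode()
    return url, body

def inject_fault(kind: str = "thermal_spike", target: str = "node01") -> int:
    """Send the injection request; requires the stack to be up."""
    url, body = build_inject_request(kind, target)
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```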
| Section | Description |
|---|---|
| 🚀 Introduction | Mission, value, and high-level overview. |
| 🏗️ Architecture | Technical design and tech stack details. |
| 🌊 Project Flow | The journey of a metric (Step-by-step lifecycle). |
| 🛠️ Setup Guide | Detailed onboarding and environment cleanup. |
| 📖 API Reference | Chaos Proxy and NeuroTalk API specs. |
| 🔍 Troubleshooting | Common errors, logs, and diagnostic commands. |
- Backend: Python 3.12, FastAPI, Uvicorn.
- AI/LLM: Google Gemini 3 Flash, Google Agent Development Kit (ADK).
- Data: Google Cloud Pub/Sub, BigQuery.
- Frontend: Streamlit.
- Infra: Docker Compose, Redfish (Sushy).
Ready to help? See our Contribution Guide for coding standards and development workflows.
Last Updated: April 2026 | Environment: mylab
