Welcome to IncidentFox

IncidentFox is an AI SRE / AI On-Call engineer that integrates with your observability stack, infrastructure, and collaboration tools to automatically investigate incidents, find root causes, and suggest fixes.

  • Quick Start - Get IncidentFox up and running in minutes
  • How It Works - Understand the multi-agent architecture
  • Configuration - Configure agents, tools, and prompts
  • Integrations - Connect to Slack, GitHub, PagerDuty, and more

Key Features

IncidentFox uses two agent runtimes:
  • OpenAI SDK Agent - Production automation with multi-agent orchestration (Planner + Specialists)
  • Claude SDK SRE Agent - Interactive debugging with Kubernetes sandbox isolation
Specialized agents include K8s, AWS, Metrics, Coding, and Investigation agents working together.

Pre-built integrations across 20+ categories:
  • Kubernetes: Pod logs, events, deployments, resource usage (9 tools)
  • AWS: EC2, Lambda, RDS, ECS, CloudWatch (8+ tools)
  • Observability: Grafana, Datadog, Prometheus, Coralogix, Sentry, New Relic (15+ tools)
  • Log Analysis: Statistics, sampling, pattern search, anomaly detection (7 tools)
  • Docker: Container logs, stats, exec, events (15 tools)
  • GitHub: Code search, PRs, issues, Actions, commits (16 tools)
  • Database: MySQL, PostgreSQL, Snowflake, BigQuery (70+ tools)
  • And more: PagerDuty, Slack, Linear, Jira, Confluence, Terraform…

Hierarchical knowledge retrieval (RAPTOR, based on ICLR 2024 research):
  • Handles 100+ page runbooks without context loss
  • Knowledge graphs for service dependencies and ownership
  • Learns from past investigations to improve over time
  • Multi-level abstraction: procedural, factual, temporal, policy
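
The hierarchical idea can be illustrated with a toy tree of summaries over runbook chunks. This is a minimal sketch, not IncidentFox's implementation: the `Node` class, the `retrieve` function, the token-overlap score, and the sample runbook text are all assumptions made for illustration. A real system would typically generate the summaries automatically and compare embeddings rather than shared words.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A summary (internal node) or a raw runbook chunk (leaf). Hypothetical type."""
    text: str
    children: list[Node] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercase word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(query: str, text: str) -> int:
    """Number of distinct query words that also occur in text (a stand-in for embedding similarity)."""
    return len(tokens(query) & tokens(text))


def retrieve(root: Node, query: str) -> Node:
    """Walk down the tree, at each level following the child that best matches the query."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: overlap(query, child.text))
    return node


# Hypothetical runbook: summaries at the top, detailed chunks at the leaves.
runbook = Node("Payments runbook", [
    Node("Database failover and replication lag procedures", [
        Node("If replication lag exceeds 30s, promote the replica"),
        Node("To fail over the database, run the failover playbook"),
    ]),
    Node("Cache eviction and Redis memory procedures", [
        Node("When Redis memory is full, raise maxmemory or evict stale keys"),
        Node("During cache eviction storms, add jitter to key expiry"),
    ]),
])

best = retrieve(runbook, "redis memory is full")
print(best.text)
```

Because each query only visits one branch per level, the amount of text compared stays small even when the runbook itself is long.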

Invoke IncidentFox from wherever your team works:
  • Slack - Mention the bot in any channel
  • GitHub - Comment on issues or PRs
  • PagerDuty - Automatic investigation on alerts
  • Incident.io - Integrated incident response
  • REST API - Programmatic access
  • Web UI - Dashboard for investigations and configuration

Built-in statistical and ML analysis:
  • Anomaly Detection - Z-score, Prophet-based seasonal detection
  • Forecasting - Capacity planning with uncertainty bounds
  • Correlation Analysis - Cross-service metric relationships
  • Change Point Detection - Identify when issues started
  • Pattern Learning - Records and reuses incident patterns
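
As an illustration of the Z-score approach named in the list above, the sketch below scores a new sample against a recent baseline. It is a generic textbook version, not IncidentFox's code: the function names, the threshold of 3.0, and the sample latency values are assumptions.

```python
import math
from statistics import mean, stdev


def zscore(history, value):
    """How many sample standard deviations `value` lies from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` when its absolute z-score exceeds `threshold` (assumed default)."""
    return abs(zscore(history, value)) > threshold


# Hypothetical p99 latency samples in milliseconds.
baseline = [100, 102, 98, 101, 99, 100, 103, 97, 100]

print(is_anomalous(baseline, 101))  # a sample close to the baseline
print(is_anomalous(baseline, 500))  # a sample far outside the baseline
```

Scoring a new point against a separate baseline avoids a common pitfall of computing the z-score over a window that already contains the outlier, where the outlier inflates the standard deviation and can mask itself.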

Built for enterprise security and compliance:
  • SOC 2 compliant infrastructure
  • Claude Sandbox isolation with Kubernetes + gVisor
  • Credentials proxy (Envoy) - secrets never reach the agent
  • SSO/OIDC authentication (Google, Azure AD, Okta)
  • Approval workflows for critical changes
  • Full audit logging
  • On-premise and air-gapped deployment options

What Can IncidentFox Do?

Incident Investigation

When an incident occurs, IncidentFox automatically:
  1. Gathers Context - Pulls logs, metrics, and recent changes from your observability stack
  2. Analyzes Root Cause - Correlates data across services to identify the issue
  3. Provides Timeline - Reconstructs what happened and when
  4. Suggests Fixes - Recommends actionable remediation steps

Example Slack message:

    @incidentfox investigate why the payments service is slow
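
The timeline step in the list above amounts to merging events from several sources into one chronological view. The following is a minimal sketch under stated assumptions: the `Event` type, the `build_timeline` function, and all sample events are hypothetical and do not reflect IncidentFox's internal data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One observation from a data source. Hypothetical type for illustration."""
    timestamp: datetime
    source: str
    message: str


def build_timeline(*sources):
    """Merge events from any number of sources into one list sorted by time."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event.timestamp)


# Hypothetical sample data for a slow payments service.
deploys = [Event(datetime(2024, 1, 1, 12, 0), "deploy", "payments v42 rolled out")]
metrics = [Event(datetime(2024, 1, 1, 12, 3), "metrics", "p99 latency above 2s")]
logs = [Event(datetime(2024, 1, 1, 12, 1), "logs", "connection pool exhausted")]

timeline = build_timeline(deploys, metrics, logs)
for event in timeline:
    print(f"{event.timestamp:%H:%M} [{event.source}] {event.message}")
```

Ordering the deploy, the log error, and the latency alert on one axis is what makes a likely cause-and-effect chain visible.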

CI/CD Auto-Fix

When your CI pipeline fails, IncidentFox can:
  1. Detect Failures - Monitors GitHub Actions, CodePipeline, and other CI systems
  2. Analyze Logs - Reads test output and build errors
  3. Identify Root Cause - Correlates failures with code changes in the PR
  4. Propose Fixes - Suggests code changes to resolve the issue
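
One simple way to correlate a failure with the code changes in a PR is to intersect the files named in the failing traceback with the files the PR touched. The sketch below assumes Python-style tracebacks in the CI log; the `suspect_files` function, the log text, and the file paths are hypothetical, and this is not presented as IncidentFox's actual method.

```python
from __future__ import annotations

import re

# Matches Python traceback frames such as: File "src/billing/tax.py", line 12
FRAME = re.compile(r'File "([^"]+)", line (\d+)')


def suspect_files(ci_log: str, changed_files: set[str]) -> list[str]:
    """Changed files that appear in the log's traceback frames, in order of first appearance."""
    suspects: list[str] = []
    for path, _line in FRAME.findall(ci_log):
        if path in changed_files and path not in suspects:
            suspects.append(path)
    return suspects


# Hypothetical CI output and PR file list.
ci_log = '''Traceback (most recent call last):
  File "tests/test_tax.py", line 8, in test_vat
    assert vat(100) == 20
  File "src/billing/tax.py", line 12, in vat
    return amount * RATE
NameError: name 'RATE' is not defined
'''
changed = {"src/billing/tax.py", "README.md"}

print(suspect_files(ci_log, changed))
```

Files that are both changed and present in the traceback are a reasonable first place to look, though a failure can also originate in code the PR did not touch.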

Proactive Monitoring

IncidentFox can monitor your systems and alert before issues escalate:
  • Anomaly Detection - Prophet-based forecasting identifies unusual patterns
  • Correlation Analysis - Links metrics across services to find relationships
  • Knowledge Base - RAPTOR hierarchical retrieval learns from runbooks and past incidents
  • Alert Correlation - Connects Prometheus, Alertmanager, and PagerDuty alerts
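
Correlation analysis between two service metrics can be sketched with the Pearson coefficient. This is a generic illustration rather than IncidentFox's implementation: the `pearson` function and the sample CPU and latency series are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same timestamps.
db_cpu_percent = [20, 25, 40, 70, 90]
api_latency_ms = [110, 118, 152, 215, 248]

r = pearson(db_cpu_percent, api_latency_ms)
print(f"correlation between DB CPU and API latency: {r:.3f}")
```

A coefficient near 1 or -1 suggests the two metrics move together, which is a hint for where to investigate, not proof that one causes the other.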

Getting Started

  1. Connect Your Data Sources - Configure connections to your observability stack (Coralogix, Datadog, Grafana, etc.)
  2. Set Up Integrations - Connect IncidentFox to Slack, GitHub, or PagerDuty to trigger investigations
  3. Configure Your Team - Customize agent prompts and enable or disable tools for your specific needs
  4. Start Investigating - Mention @incidentfox in Slack or trigger via your preferred integration

Support

Need help? Contact us at [email protected]