FixOnFail - Hackathon Submission

Inspiration

Production errors are inevitable, but fixing them shouldn't require constant developer intervention. We built FixOnFail to be a truly autonomous system that detects, analyzes, and fixes production errors on its own, like having an AI DevOps engineer working 24/7.

What it does

FixOnFail is an autonomous error resolution system that:

  • Polls Sentry every 5 seconds for new production errors
  • Analyzes errors using Claude Code in isolated Daytona sandboxes
  • Fixes bugs automatically with AI-powered code changes
  • Tests fixes in the sandbox (npm run build) before deployment
  • Deploys to Vercel through GitHub-triggered CI/CD
  • Verifies successful deployment and continues monitoring

The entire loop, from error detection to production fix, runs without human intervention.
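The polling step might look roughly like the following minimal sketch. It builds a request against Sentry's REST endpoint for listing a project's issues, filtered with `query=is:unresolved`, and only hands off issues it has not seen before. The org and project slugs, the token, and the helper names (`build_issues_request`, `fetch_unresolved`, `new_issues`, `poll_forever`) are hypothetical; this is not FixOnFail's actual orchestrator code.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable

SENTRY_BASE = "https://sentry.io/api/0"
POLL_INTERVAL_SECONDS = 5  # the 5-second polling interval described above


def build_issues_request(org: str, project: str, token: str) -> urllib.request.Request:
    """Build a GET request for a project's unresolved issues (Sentry REST API)."""
    query = urllib.parse.urlencode({"query": "is:unresolved"})
    url = f"{SENTRY_BASE}/projects/{org}/{project}/issues/?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_unresolved(req: urllib.request.Request) -> list[dict]:
    """Send the request and decode the JSON list of issues."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def new_issues(issues: Iterable[dict], seen: set[str]) -> list[dict]:
    """Return issues not handled yet and record their IDs in `seen`."""
    fresh = [issue for issue in issues if issue["id"] not in seen]
    seen.update(issue["id"] for issue in fresh)
    return fresh


def poll_forever(fetch: Callable[[], list[dict]], handle: Callable[[dict], None]) -> None:
    """Hypothetical main loop: fetch, hand off unseen issues, sleep, repeat."""
    seen: set[str] = set()
    while True:
        for issue in new_issues(fetch(), seen):
            handle(issue)
        time.sleep(POLL_INTERVAL_SECONDS)
```

Tracking seen issue IDs keeps the loop from launching a second sandbox for an issue whose fix is still deploying; a fuller version would probably drop an ID from `seen` when its fix fails, so that the issue is retried on a later poll.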

How we built it

Tech Stack:

  • Sentry - Error monitoring and detection
  • Daytona - Isolated sandbox environments for safe testing
  • Claude Code - AI-powered code analysis and fixing
  • GitHub - Version control and CI/CD triggers
  • Vercel - Automatic deployments
  • Python Orchestrator - Coordinates all services

Architecture:

  1. Python orchestrator polls Sentry API for unresolved issues
  2. Creates Daytona sandbox and clones repository
  3. Uses Claude Code to analyze error, identify root cause, and generate fix
  4. Tests fix with npm run build in sandbox
  5. Commits and pushes fix to GitHub
  6. Monitors Vercel deployment until successful
  7. Continues loop, fixing new errors as they appear
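Steps 2 through 6 can be sketched as a single per-issue handler, with steps 1 and 7 being the outer polling loop. Everything here is an assumption for illustration: `Sandbox` and `Services` are hypothetical adapters, not the Daytona, Claude Code, GitHub, or Vercel SDKs, and the `READY`/`ERROR`/`CANCELED` strings are assumed to mirror Vercel's deployment states.

```python
import time
from dataclasses import dataclass
from typing import Callable, Protocol


class Sandbox(Protocol):
    """Hypothetical wrapper around one Daytona sandbox (not the real Daytona SDK)."""

    def run(self, command: str, timeout: int) -> int: ...  # returns the exit code

    def delete(self) -> None: ...


@dataclass
class Services:
    """Hypothetical adapters, one per external service."""

    create_sandbox: Callable[[str], Sandbox]       # repo URL -> sandbox with the repo cloned
    generate_fix: Callable[[Sandbox, dict], bool]  # run Claude Code on the issue; True if it changed code
    push_fix: Callable[[Sandbox, dict], str]       # commit and push; returns the commit SHA
    deployment_state: Callable[[str], str]         # commit SHA -> deployment state string


def wait_for_deploy(
    services: Services,
    sha: str,
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Step 6: poll the deployment until it is READY, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = services.deployment_state(sha)
        if state == "READY":  # assumed to mirror Vercel's state names
            return True
        if state in ("ERROR", "CANCELED"):
            return False
        sleep(interval)
    return False


def handle_issue(services: Services, repo_url: str, issue: dict) -> str:
    """Steps 2-6 for one Sentry issue; returns a short outcome label."""
    sandbox = services.create_sandbox(repo_url)             # step 2
    try:
        if not services.generate_fix(sandbox, issue):       # step 3
            return "no-fix"
        if sandbox.run("npm run build", timeout=600) != 0:  # step 4
            return "build-failed"
        sha = services.push_fix(sandbox, issue)             # step 5
    finally:
        sandbox.delete()  # free disk space on every path, including exceptions
    return "deployed" if wait_for_deploy(services, sha) else "deploy-failed"
```

The `try`/`finally` deletes the sandbox on every path, which is one way to respect the disk-space limits listed under challenges, and injecting each service as a callable lets the whole flow be exercised with fakes before any real API is wired in.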

Challenges we ran into

  • PTY Session Management: npm installs initially hung inside PTY sessions; we solved this with timeouts and proper session handling
  • API Integration: Coordinating multiple services (Sentry, Daytona, Claude, GitHub, Vercel) required careful error handling
  • Real-time Output: We needed visibility into every sandbox operation, so we implemented live streaming of PTY output
  • Sandbox Cleanup: Disk space limits required automatic cleanup of old sandboxes
  • TypeScript Compilation: For realistic testing, the bugs we introduced had to compile cleanly yet still fail at runtime
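The timeout and streaming fixes can be illustrated with a local analogue. This sketch uses Python's `subprocess` rather than Daytona's PTY sessions, whose API it does not model: each output line is forwarded as soon as it arrives, and a `threading.Timer` kills the command once it exceeds its time budget, which is how a hung `npm install` would be cut off. `run_streaming` is a hypothetical name.

```python
import subprocess
import sys
import threading
from typing import Callable, Optional


def run_streaming(
    cmd: list[str], timeout: float, on_line: Callable[[str], None] = print
) -> Optional[int]:
    """Run cmd, forwarding each output line to on_line as it arrives.

    Returns the exit code, or None if the command was killed for exceeding `timeout`.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error output streams too
        stdin=subprocess.DEVNULL,  # an interactive prompt cannot wait for input forever
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    timed_out = threading.Event()

    def kill() -> None:
        timed_out.set()
        proc.kill()  # closing the process also closes the pipe, ending the loop below

    timer = threading.Timer(timeout, kill)
    timer.start()
    try:
        assert proc.stdout is not None
        for line in proc.stdout:  # yields lines as soon as the child writes them
            on_line(line.rstrip("\n"))
        proc.wait()
    finally:
        timer.cancel()
    return None if timed_out.is_set() else proc.returncode


if __name__ == "__main__":
    # Example: a command that prints two lines, with a generous 30-second budget.
    code = run_streaming([sys.executable, "-c", "print('installing...'); print('done')"], timeout=30)
    print("exit code:", code)
```

If the command spawns child processes that inherit the output pipe, killing only the parent may leave the pipe open; killing the whole process group would be more robust.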

Accomplishments that we're proud of

  • Fully Autonomous Loop: Complete end-to-end automation from error detection to production deployment
  • Real-time Visibility: All sandbox operations stream live output for transparency
  • Safe Testing: All fixes tested in isolated Daytona sandboxes before production
  • Self-Healing: System continues monitoring and fixing until no errors remain
  • Production Ready: Successfully fixed real Sentry errors and deployed to Vercel

What we learned

  • Orchestration Complexity: Coordinating multiple APIs and services requires robust error handling and timeouts
  • Sandbox Isolation: Daytona provides an isolated environment where AI-generated code can run safely
  • AI Code Quality: Claude Code can analyze stack traces and generate production-ready fixes
  • Real-time Monitoring: Streaming output is crucial for debugging autonomous systems
  • Incremental Fixes: System works best when fixing one issue at a time with verification
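One common way to get the robust error handling described above is to wrap each external API call in a retry helper with exponential backoff. This is a generic sketch, not code from FixOnFail; `with_retries` is a hypothetical helper, and the default `retry_on` exceptions are placeholders for whatever transient errors each service client actually raises.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying transient failures with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return call()
        except retry_on:
            delay = base_delay * 2 ** attempt  # 1 s, 2 s, 4 s, ... with the defaults
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    return call()  # final attempt: any error propagates to the caller
```

Only exceptions listed in `retry_on` are retried, so a genuine bug such as a `ValueError` fails immediately instead of being masked by retries. HTTP 429 or 5xx responses would need to be mapped to one of those exception types first.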

What's next for FixOnFail

  • Multi-language Support: Extend beyond Next.js to Python, Go, and Rust
  • Intelligent Prioritization: Fix critical errors first, batch similar issues
  • Human-in-the-loop: Optional approval workflow for sensitive changes
  • Learning System: Track which fixes work best and improve over time
  • Cost Optimization: Better sandbox reuse and caching to reduce API costs
  • Team Integration: Slack/Discord notifications and team dashboards

Built With

  • claude-code
  • daytona
  • python
  • sandbox
  • sentry