
FixOnFail - Hackathon Submission

Inspiration

Production errors are inevitable, but fixing them shouldn't require constant developer intervention. We built FixOnFail as an autonomous system that detects, analyzes, and fixes production errors on its own, like having an AI DevOps engineer on call 24/7.

What it does

FixOnFail is an autonomous error resolution system that:

Monitors Sentry every 5 seconds for new production errors
Analyzes errors using Claude Code in isolated Daytona sandboxes
Fixes bugs automatically with AI-powered code changes
Tests fixes inside the sandbox (npm run build) before deployment
Deploys via GitHub CI/CD to Vercel
Verifies successful deployment and continues monitoring

The entire loop, from error detection to production fix, runs without human intervention.
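The polling step could look roughly like the sketch below. It assumes Sentry's "list a project's issues" REST endpoint with a bearer token; the function names, the org/project slugs, and the timeout are illustrative and not taken from the FixOnFail code.

```python
import json
import urllib.parse
import urllib.request

SENTRY_API = "https://sentry.io/api/0"


def build_issues_request(org: str, project: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the request for a project's unresolved issues."""
    query = urllib.parse.urlencode({"query": "is:unresolved"})
    url = f"{SENTRY_API}/projects/{org}/{project}/issues/?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_unresolved(org: str, project: str, token: str, timeout_s: float = 10.0) -> list:
    """Send the request and return the decoded JSON list of issues."""
    request = build_issues_request(org, project, token)
    # The timeout keeps one slow API call from stalling the polling loop.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Calling fetch_unresolved every 5 seconds and comparing issue IDs against those already seen is one simple way to detect new errors.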

How we built it

Tech Stack:

Sentry - Error monitoring and detection
Daytona - Isolated sandbox environments for safe testing
Claude Code - AI-powered code analysis and fixing
GitHub - Version control and CI/CD triggers
Vercel - Automatic deployments
Python Orchestrator - Coordinates all services

Architecture:

Python orchestrator polls Sentry API for unresolved issues
Creates Daytona sandbox and clones repository
Uses Claude Code to analyze error, identify root cause, and generate fix
Tests fix with npm run build in sandbox
Commits and pushes fix to GitHub
Monitors Vercel deployment until successful
Continues loop, fixing new errors as they appear
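The steps above can be sketched as a small control loop. This is a minimal illustration, not the actual orchestrator: every service call is passed in as a plain callable, and names such as Steps, handle_issue, and run are hypothetical. It handles one issue per cycle and always deletes the sandbox, even when a step fails.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Issue:
    id: str
    title: str


@dataclass
class Steps:
    """Hypothetical hooks; each one would wrap a real service client."""
    fetch_unresolved: Callable[[], Iterable[Issue]]   # Sentry
    create_sandbox: Callable[[], str]                 # Daytona (clones the repo)
    generate_fix: Callable[[str, Issue], None]        # Claude Code
    build_passes: Callable[[str], bool]               # npm run build
    push_fix: Callable[[str, Issue], None]            # GitHub
    wait_for_deploy: Callable[[], bool]               # Vercel
    delete_sandbox: Callable[[str], None]             # cleanup


def handle_issue(steps: Steps, issue: Issue) -> bool:
    """Run one fix attempt; return True only if the deploy succeeded."""
    sandbox = steps.create_sandbox()
    try:
        steps.generate_fix(sandbox, issue)
        if not steps.build_passes(sandbox):
            return False  # never push a fix that does not build
        steps.push_fix(sandbox, issue)
        return steps.wait_for_deploy()
    finally:
        steps.delete_sandbox(sandbox)  # free disk space on every path


def run(steps: Steps, poll_seconds: float = 5.0,
        max_cycles: Optional[int] = None,
        sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll for issues and fix at most one per cycle; return fixed issue IDs."""
    fixed: List[str] = []
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        for issue in steps.fetch_unresolved():
            if issue.id in fixed:
                continue
            if handle_issue(steps, issue):
                fixed.append(issue.id)
            break  # one issue per cycle; a failed one is retried next cycle
        cycle += 1
        sleep(poll_seconds)
    return fixed
```

Injecting the steps as callables keeps the loop testable without network access: each hook can be swapped for a fake during development.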

Challenges we ran into

PTY Session Management: npm installs initially hung inside PTY sessions; we solved this with timeouts and proper session handling
API Integration: Coordinating multiple services (Sentry, Daytona, Claude, GitHub, Vercel) required careful error handling
Real-time Output: We needed to see every sandbox operation as it ran, so we implemented streaming PTY output
Sandbox Cleanup: Disk space limits required automatic cleanup of old sandboxes
TypeScript Compilation: Test bugs had to compile cleanly but fail at runtime to make the testing realistic
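The timeout and streaming ideas from the first and third challenges can be illustrated with a local stand-in. The sketch below uses Python's subprocess module rather than a Daytona PTY session, so it shows the pattern only; run_streaming and its parameters are hypothetical names.

```python
import subprocess
import threading
from typing import Callable, Sequence


def run_streaming(cmd: Sequence[str], timeout_s: float,
                  on_line: Callable[[str], None] = print) -> int:
    """Run cmd, pass each output line to on_line as it arrives, and kill
    the process if it is still running after timeout_s seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    timed_out = threading.Event()

    def kill_if_running() -> None:
        if proc.poll() is None:  # still running, so treat it as hung
            timed_out.set()
            proc.kill()

    timer = threading.Timer(timeout_s, kill_if_running)
    timer.start()
    try:
        for line in proc.stdout:  # ends when the process closes its output
            on_line(line.rstrip("\n"))
        proc.wait()
    finally:
        timer.cancel()
    if timed_out.is_set():
        raise TimeoutError(f"command exceeded {timeout_s}s: {list(cmd)}")
    return proc.returncode
```

A call such as run_streaming(["npm", "install"], timeout_s=300) would stream install output live and raise TimeoutError instead of hanging indefinitely; the 300-second limit is an arbitrary example value.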

Accomplishments that we're proud of

✅ Fully Autonomous Loop: Complete end-to-end automation from error detection to production deployment
✅ Real-time Visibility: All sandbox operations stream live output for transparency
✅ Safe Testing: All fixes tested in isolated Daytona sandboxes before production
✅ Self-Healing: System continues monitoring and fixing until no errors remain
✅ Production Ready: Successfully fixed real Sentry errors and deployed to Vercel

What we learned

Orchestration Complexity: Coordinating multiple APIs and services requires robust error handling and timeouts
Sandbox Isolation: Daytona provides a well-isolated environment for running AI-generated code safely
AI Code Quality: Claude Code can analyze stack traces and generate production-ready fixes
Real-time Monitoring: Streaming output is crucial for debugging autonomous systems
Incremental Fixes: The system works best when it fixes one issue at a time and verifies each fix before moving on

What's next for FixOnFail

Multi-language Support: Extend beyond Next.js to Python, Go, Rust
Intelligent Prioritization: Fix critical errors first, batch similar issues
Human-in-the-loop: Optional approval workflow for sensitive changes
Learning System: Track which fixes work best and improve over time
Cost Optimization: Better sandbox reuse and caching to reduce API costs
Team Integration: Slack/Discord notifications and team dashboards

Built With

claude-code
daytona
python
sandbox
sentry

Updates

Het Patel started this project — Jan 24, 2026 06:50 PM EST
