FixOnFail - Hackathon Submission
Inspiration
Production errors are inevitable, but fixing them shouldn't require constant developer intervention. We built FixOnFail as an autonomous system that detects, analyzes, and fixes production errors automatically, like having an AI DevOps engineer on call 24/7.
What it does
FixOnFail is an autonomous error resolution system that:
- Monitors Sentry every 5 seconds for new production errors
- Analyzes errors using Claude Code in isolated Daytona sandboxes
- Fixes bugs automatically with AI-powered code changes
- Tests fixes locally before deployment
- Deploys via GitHub CI/CD to Vercel
- Verifies successful deployment and continues monitoring
The entire loop, from error detection to production fix, runs without human intervention.
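As a rough illustration of the monitoring step, here is a minimal sketch of a polling loop that only reacts to issues it has not seen before. The names `new_issues`, `watch`, `fetch_unresolved`, and `handle` are hypothetical, and the fetcher is a stand-in for whatever call retrieves unresolved issues from Sentry. The sketch assumes each issue is a dict with an `id` key.

```python
import time
from typing import Callable, Iterable

POLL_INTERVAL_SECONDS = 5  # the 5-second interval described above


def new_issues(fetch_unresolved: Callable[[], Iterable[dict]], seen_ids: set) -> list:
    """Return issues that have not been handled yet and remember their ids."""
    fresh = []
    for issue in fetch_unresolved():
        if issue["id"] not in seen_ids:
            seen_ids.add(issue["id"])
            fresh.append(issue)
    return fresh


def watch(fetch_unresolved, handle, max_polls=None, sleep=time.sleep):
    """Poll repeatedly and hand each new issue to `handle`.

    max_polls=None means run forever; `sleep` is injectable so the loop
    can be exercised without real waiting.
    """
    seen_ids = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for issue in new_issues(fetch_unresolved, seen_ids):
            handle(issue)
        polls += 1
        sleep(POLL_INTERVAL_SECONDS)
```

Tracking seen ids keeps the loop from starting a second fix for an issue that is still unresolved while its first fix is being deployed.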
How we built it
Tech Stack:
- Sentry - Error monitoring and detection
- Daytona - Isolated sandbox environments for safe testing
- Claude Code - AI-powered code analysis and fixing
- GitHub - Version control and CI/CD triggers
- Vercel - Automatic deployments
- Python Orchestrator - Coordinates all services
Architecture:
- Python orchestrator polls Sentry API for unresolved issues
- Creates Daytona sandbox and clones repository
- Uses Claude Code to analyze error, identify root cause, and generate fix
- Tests fix with npm run build in sandbox
- Commits and pushes fix to GitHub
- Monitors Vercel deployment until successful
- Continues loop, fixing new errors as they appear
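The steps above can be sketched as a single function that takes one issue through the pipeline. This is not the project's actual orchestrator: the `Steps` hooks and `fix_issue` are hypothetical names, and each hook stands in for a real call to Daytona, Claude Code, GitHub, or Vercel.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Steps:
    """Hypothetical hooks; each would wrap a real service client."""
    create_sandbox: Callable[[], str]          # creates a sandbox, clones the repo, returns its id
    generate_fix: Callable[[str, dict], None]  # lets Claude Code edit the clone
    build_passes: Callable[[str], bool]        # runs `npm run build` in the sandbox
    push_fix: Callable[[str, dict], str]       # commits and pushes, returns a commit id
    wait_for_deploy: Callable[[str], bool]     # polls the deployment until it finishes
    delete_sandbox: Callable[[str], None]      # frees the sandbox


def fix_issue(issue: dict, steps: Steps) -> str:
    """Run one issue through the pipeline and return the outcome."""
    sandbox_id = steps.create_sandbox()
    try:
        steps.generate_fix(sandbox_id, issue)
        if not steps.build_passes(sandbox_id):
            return "build_failed"  # never push a fix that does not build
        commit = steps.push_fix(sandbox_id, issue)
        return "deployed" if steps.wait_for_deploy(commit) else "deploy_failed"
    finally:
        steps.delete_sandbox(sandbox_id)  # clean up even when a step raises
```

Passing the service calls in as hooks keeps the control flow testable with fakes, and the `finally` block means a failed step does not leave a sandbox behind.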
Challenges we ran into
- PTY Session Management: npm installs initially hung inside the sandbox; we solved this with timeouts and proper session handling
- API Integration: Coordinating multiple services (Sentry, Daytona, Claude, GitHub, Vercel) required careful error handling
- Real-time Output: We needed visibility into every sandbox operation, so we implemented streaming PTY output
- Sandbox Cleanup: Disk space limits required automatic cleanup of old sandboxes
- TypeScript Compilation: For realistic testing, the bugs had to pass TypeScript compilation but still fail at runtime
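To illustrate the timeout approach to hanging installs, here is a minimal sketch that uses a local `subprocess` call as a stand-in for a sandbox PTY session. It does not use the Daytona SDK, and `run_with_timeout` is a hypothetical helper name. The idea is only that a command which never returns is turned into an ordinary failure.

```python
import subprocess
import sys


def run_with_timeout(cmd: list, timeout_s: float) -> tuple:
    """Run `cmd` and return (ok, output).

    If the command exceeds `timeout_s`, subprocess.run kills it and raises
    TimeoutExpired, which is reported here as a failure instead of a hang.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    return proc.returncode == 0, proc.stdout + proc.stderr
```

A call such as `run_with_timeout(["npm", "install"], 300)` would then give the orchestrator a clear signal to retry or abandon the sandbox; the 300-second limit is an arbitrary example value.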
Accomplishments that we're proud of
✅ Fully Autonomous Loop: Complete end-to-end automation from error detection to production deployment
✅ Real-time Visibility: All sandbox operations stream live output for transparency
✅ Safe Testing: All fixes tested in isolated Daytona sandboxes before production
✅ Self-Healing: System continues monitoring and fixing until no errors remain
✅ Production Ready: Successfully fixed real Sentry errors and deployed to Vercel
What we learned
- Orchestration Complexity: Coordinating multiple APIs and services requires robust error handling and timeouts
- Sandbox Isolation: Daytona provides an isolated environment well suited to running AI-generated code safely
- AI Code Quality: Claude Code can analyze stack traces and generate production-ready fixes
- Real-time Monitoring: Streaming output is crucial for debugging autonomous systems
- Incremental Fixes: System works best when fixing one issue at a time with verification
What's next for FixOnFail
- Multi-language Support: Extend beyond Next.js to Python, Go, Rust
- Intelligent Prioritization: Fix critical errors first, batch similar issues
- Human-in-the-loop: Optional approval workflow for sensitive changes
- Learning System: Track which fixes work best and improve over time
- Cost Optimization: Better sandbox reuse and caching to reduce API costs
- Team Integration: Slack/Discord notifications and team dashboards
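As one possible shape for the prioritization idea, here is a minimal sketch that batches issues with the same title and orders the batches by severity, then by event count. Nothing here is implemented in FixOnFail today. `prioritize` and `LEVEL_RANK` are hypothetical names, and the sketch assumes each issue is a dict with `title`, `level`, and `count` keys, with `count` stored as a string.

```python
from collections import defaultdict

# Assumed severity order, most urgent first
LEVEL_RANK = {"fatal": 0, "error": 1, "warning": 2, "info": 3, "debug": 4}


def prioritize(issues: list) -> list:
    """Group issues by title, then order groups by severity and event count."""
    groups = defaultdict(list)
    for issue in issues:
        groups[issue["title"]].append(issue)

    def urgency(batch):
        # Unknown levels sort after every known level
        worst = min(LEVEL_RANK.get(i["level"], len(LEVEL_RANK)) for i in batch)
        events = sum(int(i["count"]) for i in batch)
        return (worst, -events)  # lower rank first, then more events first

    return sorted(groups.values(), key=urgency)
```

Grouping by title is a crude similarity measure; a real version would likely need something closer to stack-trace fingerprinting.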
Built With
- claude-code
- daytona
- python
- sandbox
- sentry