Tapaso: Real-Time PR Verification with Isolated Sandboxes
The Problem We Solved
Broken code slips past static analysis every day because linters can't tell you if your app actually runs. Developers waste hours debugging "works on my machine" bugs that should have been caught before merge. The gap between "code is syntactically correct" and "code actually works" costs teams 20% of their engineering time.
Our Solution
Tapaso is a PR verification engine that doesn't just analyze code—it executes it. For every Pull Request, Tapaso:
- Spins up an isolated, ephemeral sandbox using Daytona in sub-100ms
- Clones the PR branch and installs real dependencies
- Builds and runs the application in a fresh OS environment
- Captures runtime errors and logs through Sentry
- Performs static code quality analysis with CodeRabbit
- Generates a unified Confidence Score (0-100) that tells developers: "This code is truly production-ready"
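The flow above can be sketched as a single orchestration function. This is a hypothetical skeleton, not Tapaso's actual code: every helper below is a stub standing in for the real Daytona, Sentry, and CodeRabbit integrations, and the scoring arithmetic is illustrative.

```javascript
// Hypothetical sketch of the per-PR verification flow described above.
// The helpers are stand-ins for the real Daytona / Sentry / CodeRabbit
// integrations, stubbed so the control flow is runnable end to end.

const createSandbox = async () => ({ id: 'sb-1' });            // Daytona (stub)
const destroySandbox = async (sandbox) => {};                   // Daytona (stub)
const cloneAndRun = async (sandbox, repoUrl, branch) =>         // build + execute (stub)
  ({ logs: ['server listening on :3000'], testsPassed: true });
const parseRuntimeErrors = (logs) =>                            // Sentry-style capture (stub)
  logs.filter((line) => /error|exception/i.test(line));
const staticAnalysis = async (repoUrl, branch) => ({ issues: 2 }); // CodeRabbit-style (stub)

async function verifyPullRequest(repoUrl, branch) {
  const sandbox = await createSandbox();   // ephemeral, per-PR environment
  try {
    const run = await cloneAndRun(sandbox, repoUrl, branch);
    const runtimeErrors = parseRuntimeErrors(run.logs);
    const review = await staticAnalysis(repoUrl, branch);
    // Unified verdict: fewer runtime errors and review issues => higher confidence.
    const score = Math.max(0, 100 - runtimeErrors.length * 30 - review.issues * 5);
    return { score, runtimeErrors, review };
  } finally {
    await destroySandbox(sandbox);         // the sandbox is disposable by design
  }
}
```

The `try/finally` guarantees the sandbox is torn down even when the build or run step throws, which is what keeps per-PR environments cheap and leak-free.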
What We Built
A full-stack web application with:
- Frontend (React): GitHub repo input → branch selector → real-time dashboard
- Backend (Node.js): API routes orchestrating Daytona sandbox creation, log streaming, error aggregation
- Integrations:
- Daytona for ephemeral execution environments
- GitHub API for repo/branch metadata
- Sentry for error capture and monitoring
- CodeRabbit for AI-powered code review
The user experience is simple: paste a GitHub repo URL, select a branch, and watch in real-time as the system provisions a sandbox, executes code, and reports back a comprehensive quality verdict in seconds.
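The first step of that flow, turning a pasted URL into something the GitHub API can validate, can be sketched like this. The parser is a hypothetical helper; the real GitHub REST endpoint `GET /repos/{owner}/{repo}` returns 404 for private repos when called without authentication, which is what makes a public-only flow possible.

```javascript
// Extract "owner/repo" from a pasted GitHub URL (hypothetical helper).
// Accepts an optional trailing ".git" or "/".
function parseGitHubRepo(url) {
  const m = url.match(/^https:\/\/github\.com\/([\w.-]+)\/([\w.-]+?)(?:\.git)?\/?$/);
  if (!m) throw new Error(`Not a GitHub repo URL: ${url}`);
  return { owner: m[1], repo: m[2] };
}

// Usage (network call shown for context, not executed here):
//   const { owner, repo } = parseGitHubRepo(input);
//   const res = await fetch(`https://api.github.com/repos/${owner}/${repo}`);
//   if (!res.ok) throw new Error('Repo not found or not public');
```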
What We Learned
Daytona's Speed is a Game-Changer: Sub-100ms sandbox provisioning enables new architectural patterns that were impossible with traditional VMs. This unlocked the ability to spin up per-PR verification in seconds rather than minutes.
Error Aggregation is Critical: Combining runtime errors (Sentry), static analysis (CodeRabbit), and test results creates a much more useful signal than any single tool. A "Confidence Score" that incorporates all three dimensions is more actionable than individual metrics.
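One way such a blended score could be computed is a weighted sum of the three normalized signals. The weights and normalization thresholds below are assumptions for the sketch, not Tapaso's actual formula:

```javascript
// Illustrative blend of three signals into one 0-100 confidence score.
// Weights and cutoffs are assumptions, not Tapaso's real formula.
function confidenceScore({ runtimeErrors, staticIssues, testsPassed, testsTotal }) {
  const runtime = Math.max(0, 1 - runtimeErrors / 5);           // 5+ runtime errors => 0
  const quality = Math.max(0, 1 - staticIssues / 20);           // 20+ review issues => 0
  const tests = testsTotal > 0 ? testsPassed / testsTotal : 0.5; // no tests => neutral
  // Runtime behavior weighted highest: real execution beats pattern matching.
  const score = 0.5 * runtime + 0.3 * tests + 0.2 * quality;
  return Math.round(score * 100);
}
```

Weighting runtime evidence above static analysis reflects the core thesis: a linter warning matters less than an exception thrown in a real run.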
Real Execution Beats Pattern Matching: Static analysis tools are excellent but incomplete. By actually running the code, we catch edge cases, dependency conflicts, and configuration drift that no linter could detect.
Rate Limiting and API Orchestration are Non-Trivial: Managing three external APIs (GitHub, CodeRabbit, Sentry) with different rate limits, response times, and error modes required careful orchestration logic.
Challenges We Faced
GitHub OAuth vs. Public Repo Checking
Challenge: We initially planned full OAuth authentication, but that's complex for a hackathon.
Solution: We pivoted to a simpler public-repo-only flow. Users paste a repo URL, we validate it's public via the GitHub API, and proceed. This is 80% of the use case and 20% of the complexity.
Sentry Integration Complexity
Challenge: Sentry's API has multiple ways to query errors (events, issues, releases). Deciding which to use and how to parse results was non-obvious.
Solution: We focused on parsing runtime errors directly from Daytona's execution output first, then optionally augmented with Sentry API calls for richer context.
Real-Time Log Streaming
Challenge: Showing live logs as the sandbox executes requires WebSocket or Server-Sent Events. Getting timing right (lag, buffering) was tricky.
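For context, the SSE wire format itself is simple to produce: each frame is an `event:` name plus a `data:` line, terminated by a blank line. A minimal log-frame formatter might look like this (the field names in the payload are illustrative, not Tapaso's schema):

```javascript
// Minimal Server-Sent Events framing for a log line. Per the SSE spec,
// a frame is "event:" and "data:" lines terminated by a blank line.
// The level field is an illustrative stand-in for per-level color-coding.
function sseLogFrame(level, message) {
  const payload = JSON.stringify({ level, message, ts: Date.now() });
  return `event: log\ndata: ${payload}\n\n`;
}

// In an Express handler, frames would be written to a response opened with
// "Content-Type: text/event-stream", e.g. res.write(sseLogFrame('error', line)).
```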
Solution: Implemented Server-Sent Events (SSE) with proper buffering and color-coding for different log levels.
Handling Multiple Tech Stacks
Challenge: A Node.js repo runs with npm test, Python with pytest, Go with go test. How do we auto-detect?
Solution: Simple heuristic: check for package.json → Node, requirements.txt → Python, go.mod → Go, etc. Works for 90% of cases.
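That heuristic amounts to a small lookup table. The marker file names are the real ecosystem conventions; the default test commands are the simple choices named above:

```javascript
// Marker-file heuristic: map well-known manifest files to a stack and a
// default test command, as described in the writeup.
const STACK_MARKERS = [
  { file: 'package.json',     stack: 'node',   testCmd: 'npm test' },
  { file: 'requirements.txt', stack: 'python', testCmd: 'pytest' },
  { file: 'go.mod',           stack: 'go',     testCmd: 'go test ./...' },
];

// Given the repo's top-level file names, pick the first matching stack.
function detectStack(repoFiles) {
  const files = new Set(repoFiles);
  const hit = STACK_MARKERS.find((marker) => files.has(marker.file));
  return hit ?? { stack: 'unknown', testCmd: null };
}
```

Checking only the repo root keeps the heuristic cheap; the cases it misses (monorepos, polyglot repos) are exactly the remaining 10%.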
Why This Matters
Teams deploying AI agents, microservices, and distributed systems are desperate for better PR verification. Staging environments are expensive and fragile. Tapaso replaces them with disposable, per-PR sandboxes that guarantee a clean slate and cost cents instead of dollars.
The three-tool integration (Daytona + Sentry + CodeRabbit) creates a moat: each tool is powerful, but together they form a unified "green light" that developers trust.
This is the future of CI/CD: not just "did the code pass linting?" but "did the code actually run successfully in a production-like environment?"
Next Steps (Post-Hackathon)
- GitHub Actions integration (automatically trigger on PR creation)
- Slack/Teams notifications with inline quality scores
- Historical trend analysis (is code quality improving?)
- Cost tracking (how much does each PR verification cost?)
- Support for private repositories (OAuth + Enterprise repos)