Tapaso: Real-Time PR Verification with Isolated Sandboxes
The Problem We Solved
Broken code slips past static analysis every day because linters can't tell you if your app actually runs. Developers waste hours debugging "works on my machine" bugs that should have been caught before merge. The gap between "code is syntactically correct" and "code actually works" costs teams 20% of their engineering time.
Our Solution
Tapaso is a PR verification engine that doesn't just analyze code—it executes it. For every Pull Request, Tapaso:
- Spins up an isolated, ephemeral sandbox using Daytona in sub-100ms
- Clones the PR branch and installs real dependencies
- Builds and runs the application in a fresh OS environment
- Captures runtime errors and logs through Sentry
- Performs static code quality analysis with CodeRabbit
- Generates a unified Confidence Score (0-100) that tells developers: "This code is truly production-ready"
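The flow above can be sketched as a single orchestration function. This is a hypothetical skeleton, not Tapaso's actual code: every helper below is a stub standing in for the real Daytona, Sentry, and CodeRabbit integrations, and the scoring arithmetic is illustrative.

```javascript
// Hypothetical sketch of the per-PR verification flow described above.
// The helpers are stand-ins for the real Daytona / Sentry / CodeRabbit
// integrations, stubbed so the control flow is runnable end to end.

const createSandbox = async () => ({ id: 'sb-1' });            // Daytona (stub)
const destroySandbox = async (sandbox) => {};                   // Daytona (stub)
const cloneAndRun = async (sandbox, repoUrl, branch) =>         // build + execute (stub)
  ({ logs: ['server listening on :3000'], testsPassed: true });
const parseRuntimeErrors = (logs) =>                            // Sentry-style capture (stub)
  logs.filter((line) => /error|exception/i.test(line));
const staticAnalysis = async (repoUrl, branch) => ({ issues: 2 }); // CodeRabbit-style (stub)

async function verifyPullRequest(repoUrl, branch) {
  const sandbox = await createSandbox();   // ephemeral, per-PR environment
  try {
    const run = await cloneAndRun(sandbox, repoUrl, branch);
    const runtimeErrors = parseRuntimeErrors(run.logs);
    const review = await staticAnalysis(repoUrl, branch);
    // Unified verdict: fewer runtime errors and review issues => higher confidence.
    const score = Math.max(0, 100 - runtimeErrors.length * 30 - review.issues * 5);
    return { score, runtimeErrors, review };
  } finally {
    await destroySandbox(sandbox);         // the sandbox is disposable by design
  }
}
```

The `try/finally` guarantees the sandbox is torn down even when the build or run step throws, which is what keeps per-PR environments cheap and leak-free.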
What We Built
A full-stack web application with:
- Frontend (React): GitHub repo input → branch selector → real-time dashboard
- Backend (Node.js): API routes orchestrating Daytona sandbox creation, log streaming, error aggregation
- Integrations:
- Daytona for ephemeral execution environments
- GitHub API for repo/branch metadata
- Sentry for error capture and monitoring
- CodeRabbit for AI-powered code review
The user experience is simple: paste a GitHub repo URL, select a branch, and watch in real-time as the system provisions a sandbox, executes code, and reports back a comprehensive quality verdict in seconds.
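The first step of that flow, turning a pasted URL into something the GitHub API can validate, can be sketched like this. The parser is a hypothetical helper; the real GitHub REST endpoint `GET /repos/{owner}/{repo}` returns 404 for private repos when called without authentication, which is what makes a public-only flow possible.

```javascript
// Extract "owner/repo" from a pasted GitHub URL (hypothetical helper).
// Accepts an optional trailing ".git" or "/".
function parseGitHubRepo(url) {
  const m = url.match(/^https:\/\/github\.com\/([\w.-]+)\/([\w.-]+?)(?:\.git)?\/?$/);
  if (!m) throw new Error(`Not a GitHub repo URL: ${url}`);
  return { owner: m[1], repo: m[2] };
}

// Usage (network call shown for context, not executed here):
//   const { owner, repo } = parseGitHubRepo(input);
//   const res = await fetch(`https://api.github.com/repos/${owner}/${repo}`);
//   if (!res.ok) throw new Error('Repo not found or not public');
```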
What We Learned
Daytona's Speed is a Game-Changer: Sub-100ms sandbox provisioning enables new architectural patterns that were impossible with traditional VMs. This unlocked the ability to spin up per-PR verification in seconds rather than minutes.
Error Aggregation is Critical: Combining runtime errors (Sentry), static analysis (CodeRabbit), and test results creates a much more useful signal than any single tool. A "Confidence Score" that incorporates all three dimensions is more actionable than individual metrics.
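One way such a blended score could be computed is a weighted sum of the three normalized signals. The weights and normalization thresholds below are assumptions for the sketch, not Tapaso's actual formula:

```javascript
// Illustrative blend of three signals into one 0-100 confidence score.
// Weights and cutoffs are assumptions, not Tapaso's real formula.
function confidenceScore({ runtimeErrors, staticIssues, testsPassed, testsTotal }) {
  const runtime = Math.max(0, 1 - runtimeErrors / 5);           // 5+ runtime errors => 0
  const quality = Math.max(0, 1 - staticIssues / 20);           // 20+ review issues => 0
  const tests = testsTotal > 0 ? testsPassed / testsTotal : 0.5; // no tests => neutral
  // Runtime behavior weighted highest: real execution beats pattern matching.
  const score = 0.5 * runtime + 0.3 * tests + 0.2 * quality;
  return Math.round(score * 100);
}
```

Weighting runtime evidence above static analysis reflects the core thesis: a linter warning matters less than an exception thrown in a real run.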
Real Execution Beats Pattern Matching: Static analysis tools are excellent but incomplete. By actually running the code, we catch edge cases, dependency conflicts, and configuration drift that no linter could detect.
Rate Limiting and API Orchestration are Non-Trivial: Managing three external APIs (GitHub, CodeRabbit, Sentry) with different rate limits, response times, and error modes required careful orchestration logic.
Challenges We Faced
GitHub OAuth vs. Public Repo Checking
Challenge: We initially planned full OAuth authentication, but that's complex for a hackathon.
Solution: We pivoted to a simpler public-repo-only flow. Users paste a repo URL, we validate it's public via the GitHub API, and proceed. This is 80% of the use case and 20% of the complexity.
Sentry Integration Complexity
Challenge: Sentry's API has multiple ways to query errors (events, issues, releases). Deciding which to use and how to parse results was non-obvious.
Solution: We focused on parsing runtime errors directly from Daytona's execution output first, then optionally augmented with Sentry API calls for richer context.
Real-Time Log Streaming
Challenge: Showing live logs as the sandbox executes requires WebSocket or Server-Sent Events. Getting timing right (lag, buffering) was tricky.
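For context, the SSE wire format itself is simple to produce: each frame is an `event:` name plus a `data:` line, terminated by a blank line. A minimal log-frame formatter might look like this (the field names in the payload are illustrative, not Tapaso's schema):

```javascript
// Minimal Server-Sent Events framing for a log line. Per the SSE spec,
// a frame is "event:" and "data:" lines terminated by a blank line.
// The level field is an illustrative stand-in for per-level color-coding.
function sseLogFrame(level, message) {
  const payload = JSON.stringify({ level, message, ts: Date.now() });
  return `event: log\ndata: ${payload}\n\n`;
}

// In an Express handler, frames would be written to a response opened with
// "Content-Type: text/event-stream", e.g. res.write(sseLogFrame('error', line)).
```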
Solution: Implemented Server-Sent Events (SSE) with proper buffering and color-coding for different log levels.
Handling Multiple Tech Stacks
Challenge: A Node.js repo runs with npm test, Python with pytest, Go with go test. How do we auto-detect?
Solution: Simple heuristic: check for package.json → Node, requirements.txt → Python, go.mod → Go, etc. Works for 90% of cases.
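That heuristic amounts to a small lookup table. The marker file names are the real ecosystem conventions; the default test commands are the simple choices named above:

```javascript
// Marker-file heuristic: map well-known manifest files to a stack and a
// default test command, as described in the writeup.
const STACK_MARKERS = [
  { file: 'package.json',     stack: 'node',   testCmd: 'npm test' },
  { file: 'requirements.txt', stack: 'python', testCmd: 'pytest' },
  { file: 'go.mod',           stack: 'go',     testCmd: 'go test ./...' },
];

// Given the repo's top-level file names, pick the first matching stack.
function detectStack(repoFiles) {
  const files = new Set(repoFiles);
  const hit = STACK_MARKERS.find((marker) => files.has(marker.file));
  return hit ?? { stack: 'unknown', testCmd: null };
}
```

Checking only the repo root keeps the heuristic cheap; the cases it misses (monorepos, polyglot repos) are exactly the remaining 10%.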
Why This Matters
Teams deploying AI agents, microservices, and distributed systems are desperate for better PR verification. Staging environments are expensive and fragile. Tapaso replaces them with disposable, per-PR sandboxes that guarantee a clean slate and cost cents instead of dollars.
The three-tool integration (Daytona + Sentry + CodeRabbit) creates a moat: each tool is powerful, but together they form a unified "green light" that developers trust.
This is the future of CI/CD: not just "did the code pass linting?" but "did the code actually run successfully in a production-like environment?"
Next Steps (Post-Hackathon)
- GitHub Actions integration (automatically trigger on PR creation)
- Slack/Teams notifications with inline quality scores
- Historical trend analysis (is code quality improving?)
- Cost tracking (how much does each PR verification cost?)
- Support for private repositories (OAuth + Enterprise repos)