Voice Arena - Hackathon Submission

Inspiration

AI voice agents are increasingly vulnerable to security attacks, prompt injection, social engineering, and credential leaks happen in production with no visibility. Traditional testing is manual, slow, and blind to root causes. We built Voice Arena to solve this: autonomous security testing with AI-powered self-healing.

The inspiration came from seeing real-world AI agents fail silently in production. We wanted a system that could not just detect vulnerabilities, but automatically fix them, creating a self-improving security loop powered by GPT-4o and complete observability through Sentry.

What it does

Voice Arena is an autonomous AI security testing and self-repair system for voice agents. It provides two powerful modes:

Self-Healing Mode:

Tests voice agents with adversarial inputs
Detects security leaks, repetition loops, and policy violations
Uses GPT-4o to analyze failures and generate improved prompts
Automatically re-tests until the agent is secure (or max iterations reached)
Complete Sentry integration for full observability

Red Team Mode:

GPT-4o generates sophisticated attack strategies across 7 categories (security leak, social engineering, jailbreak, prompt injection, etc.)
Tests agents with AI-generated attacks
Automatically generates defenses when vulnerabilities are found
Comprehensive vulnerability scanning with one click
Tracks vulnerability reduction percentage across healing rounds

Key Features:

Zero human intervention—fully autonomous testing and fixing
Real-time WebSocket updates showing each iteration
Sentry AI Agent Monitoring integration for complete traceability
Mock mode for free development and demos
Production-ready with FastAPI backend and Next.js frontend

How we built it

Architecture:

Backend (FastAPI): REST API + WebSocket for real-time updates
Frontend (Next.js 16): React dashboard with Tailwind CSS and Framer Motion
Core Components:
- healer.py: Orchestrates the self-healing loop
- elevenlabs_client.py: Voice agent testing and failure detection
- openai_fixer.py: GPT-4o-powered fix generation
- red_team_attacker.py: AI attack generation with 7 categories
- sentry_api.py: Context fetching from Sentry for informed fixes

The Self-Healing Loop:

Test Agent → Detect Failures → GPT-4o Analyzes → Generate Fix → Re-Test → Loop

Red Team Flow:

GPT-4o Generates Attack → Test Agent → Analyze Response → If Vulnerable → Generate Defense → Verify

Tech Stack:

Backend: Python 3.10+, FastAPI, async/await
Frontend: Next.js 16, React, TypeScript, Tailwind CSS 4
AI: OpenAI GPT-4o (fix generation & attack generation)
Voice: ElevenLabs API (conversational AI testing)
Monitoring: Sentry AI Agent Monitoring (tracing, error capture, context)
Sandbox: Daytona (optional isolation for secure testing)

Development Process:

Built core components independently (ElevenLabs client, OpenAI fixer, Sentry integration)
Implemented self-healing orchestrator with iteration tracking
Added Red Team mode with adaptive attack generation
Created Next.js dashboard with real-time WebSocket updates
Integrated Sentry for complete observability and context-aware fixes

Challenges we ran into

1. Failure Detection Accuracy

Problem: Distinguishing between legitimate responses and security leaks
Solution: Built multi-layered detection with pattern matching, keyword analysis, and context-aware rules

2. GPT-4o Fix Quality

Problem: Initial fixes were too generic or broke agent functionality
Solution: Enhanced prompts with failure context from Sentry, previous iteration history, and specific vulnerability types

3. Real-Time Updates

Problem: Frontend needed live iteration results without polling
Solution: Implemented WebSocket connections with session management for real-time streaming

4. Red Team Attack Diversity

Problem: GPT-4o generating similar attacks across categories
Solution: Added attack history tracking, category-specific prompts, and confidence scoring to ensure diverse, creative attacks

5. Sentry Integration Complexity

Problem: Fetching relevant context from Sentry for fix generation
Solution: Built context-aware API fetcher that retrieves issue details, traces, and error context to inform GPT-4o fixes

6. Mock Mode Realism

Problem: Mock responses needed to be realistic for development
Solution: Created sophisticated mock system that simulates real agent behaviors, failures, and responses

Accomplishments that we're proud of

✅ Fully Autonomous System - Zero human intervention from test to fix to verification

✅ AI-Powered Red Team - GPT-4o generates creative, adaptive attacks across 7 categories

✅ Complete Observability - Sentry integration provides full traceability of every iteration, failure, and fix

✅ Fast Healing - Average fix time under 4 seconds for common vulnerabilities

✅ Production-Ready Architecture - Clean separation of concerns, async/await throughout, comprehensive error handling

✅ Beautiful UI - Modern Next.js dashboard with real-time updates, smooth animations, and intuitive UX

What we learned

Technical Learnings:

GPT-4o is incredibly effective at both generating attacks and creating targeted fixes when given proper context
Sentry's AI Agent Monitoring provides invaluable context that dramatically improves fix quality
WebSocket real-time updates create a much better UX than polling
Mock mode is essential for rapid development and demos without API costs

Architecture Insights:

Separating components (testing, detection, fixing) makes the system more maintainable
Async/await throughout the stack enables true real-time updates
Session management is crucial for tracking multi-iteration healing processes

AI Security Insights:

Voice agents are vulnerable to many attack vectors beyond traditional prompt injection
Social engineering attacks are particularly effective against voice agents
Automatic defense generation is possible and effective with proper context

Development Process:

Building independent components first, then orchestrating them, leads to cleaner code
Real-time feedback loops (test → fix → test) are powerful for both development and production

What's next for Voice Arena

Short Term:

Multi-Agent Testing: Test multiple agents simultaneously and compare security postures
Custom Attack Libraries: Allow users to define custom attack scenarios and categories
Performance Metrics: Track response times, token usage, and cost optimization
Export/Import: Save and share secure prompts, attack patterns, and healing sessions

Medium Term:

CI/CD Integration: Automatically test and heal agents in deployment pipelines
Advanced Analytics: Dashboard showing vulnerability trends, fix effectiveness, and attack patterns
Sandbox Execution: Full Daytona integration for isolated, secure testing environments
Multi-Language Support: Test agents in multiple languages and locales

Long Term:

Federated Learning: Learn from vulnerabilities across all users to improve detection and fixes
Agent Marketplace: Share secure, tested agent prompts with the community
Enterprise Features: Team collaboration, role-based access, audit logs
Real-Time Monitoring: Continuous production monitoring with automatic healing on new vulnerabilities

Vision: Voice Arena becomes the standard for AI agent security testing where every voice agent is tested, attacked, and healed before production, with complete observability and zero manual intervention.

🎥 Video demo

Demo Video Link

Built With

api
elevenlabs
fastapi
gpt-4o
next.js
openai
python
react
sentry
tailwind
typescript

Updates

Yi Zu started this project — Jan 24, 2026 06:56 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.