Voice Arena - Hackathon Submission

Inspiration

AI voice agents are increasingly vulnerable to security attacks, prompt injection, social engineering, and credential leaks happen in production with no visibility. Traditional testing is manual, slow, and blind to root causes. We built Voice Arena to solve this: autonomous security testing with AI-powered self-healing.

The inspiration came from seeing real-world AI agents fail silently in production. We wanted a system that could not just detect vulnerabilities, but automatically fix them, creating a self-improving security loop powered by GPT-4o and complete observability through Sentry.

What it does

Voice Arena is an autonomous AI security testing and self-repair system for voice agents. It provides two powerful modes:

Self-Healing Mode:

  • Tests voice agents with adversarial inputs
  • Detects security leaks, repetition loops, and policy violations
  • Uses GPT-4o to analyze failures and generate improved prompts
  • Automatically re-tests until the agent is secure (or max iterations reached)
  • Complete Sentry integration for full observability

Red Team Mode:

  • GPT-4o generates sophisticated attack strategies across 7 categories (security leak, social engineering, jailbreak, prompt injection, etc.)
  • Tests agents with AI-generated attacks
  • Automatically generates defenses when vulnerabilities are found
  • Comprehensive vulnerability scanning with one click
  • Tracks vulnerability reduction percentage across healing rounds

Key Features:

  • Zero human intervention—fully autonomous testing and fixing
  • Real-time WebSocket updates showing each iteration
  • Sentry AI Agent Monitoring integration for complete traceability
  • Mock mode for free development and demos
  • Production-ready with FastAPI backend and Next.js frontend

How we built it

Architecture:

  • Backend (FastAPI): REST API + WebSocket for real-time updates
  • Frontend (Next.js 16): React dashboard with Tailwind CSS and Framer Motion
  • Core Components:
    • healer.py: Orchestrates the self-healing loop
    • elevenlabs_client.py: Voice agent testing and failure detection
    • openai_fixer.py: GPT-4o-powered fix generation
    • red_team_attacker.py: AI attack generation with 7 categories
    • sentry_api.py: Context fetching from Sentry for informed fixes

The Self-Healing Loop:

Test Agent → Detect Failures → GPT-4o Analyzes → Generate Fix → Re-Test → Loop

Red Team Flow:

GPT-4o Generates Attack → Test Agent → Analyze Response → If Vulnerable → Generate Defense → Verify

Tech Stack:

  • Backend: Python 3.10+, FastAPI, async/await
  • Frontend: Next.js 16, React, TypeScript, Tailwind CSS 4
  • AI: OpenAI GPT-4o (fix generation & attack generation)
  • Voice: ElevenLabs API (conversational AI testing)
  • Monitoring: Sentry AI Agent Monitoring (tracing, error capture, context)
  • Sandbox: Daytona (optional isolation for secure testing)

Development Process:

  1. Built core components independently (ElevenLabs client, OpenAI fixer, Sentry integration)
  2. Implemented self-healing orchestrator with iteration tracking
  3. Added Red Team mode with adaptive attack generation
  4. Created Next.js dashboard with real-time WebSocket updates
  5. Integrated Sentry for complete observability and context-aware fixes

Challenges we ran into

1. Failure Detection Accuracy

  • Problem: Distinguishing between legitimate responses and security leaks
  • Solution: Built multi-layered detection with pattern matching, keyword analysis, and context-aware rules

2. GPT-4o Fix Quality

  • Problem: Initial fixes were too generic or broke agent functionality
  • Solution: Enhanced prompts with failure context from Sentry, previous iteration history, and specific vulnerability types

3. Real-Time Updates

  • Problem: Frontend needed live iteration results without polling
  • Solution: Implemented WebSocket connections with session management for real-time streaming

4. Red Team Attack Diversity

  • Problem: GPT-4o generating similar attacks across categories
  • Solution: Added attack history tracking, category-specific prompts, and confidence scoring to ensure diverse, creative attacks

5. Sentry Integration Complexity

  • Problem: Fetching relevant context from Sentry for fix generation
  • Solution: Built context-aware API fetcher that retrieves issue details, traces, and error context to inform GPT-4o fixes

6. Mock Mode Realism

  • Problem: Mock responses needed to be realistic for development
  • Solution: Created sophisticated mock system that simulates real agent behaviors, failures, and responses

Accomplishments that we're proud of

Fully Autonomous System - Zero human intervention from test to fix to verification

AI-Powered Red Team - GPT-4o generates creative, adaptive attacks across 7 categories

Complete Observability - Sentry integration provides full traceability of every iteration, failure, and fix

Fast Healing - Average fix time under 4 seconds for common vulnerabilities

Production-Ready Architecture - Clean separation of concerns, async/await throughout, comprehensive error handling

Beautiful UI - Modern Next.js dashboard with real-time updates, smooth animations, and intuitive UX

What we learned

Technical Learnings:

  • GPT-4o is incredibly effective at both generating attacks and creating targeted fixes when given proper context
  • Sentry's AI Agent Monitoring provides invaluable context that dramatically improves fix quality
  • WebSocket real-time updates create a much better UX than polling
  • Mock mode is essential for rapid development and demos without API costs

Architecture Insights:

  • Separating components (testing, detection, fixing) makes the system more maintainable
  • Async/await throughout the stack enables true real-time updates
  • Session management is crucial for tracking multi-iteration healing processes

AI Security Insights:

  • Voice agents are vulnerable to many attack vectors beyond traditional prompt injection
  • Social engineering attacks are particularly effective against voice agents
  • Automatic defense generation is possible and effective with proper context

Development Process:

  • Building independent components first, then orchestrating them, leads to cleaner code
  • Real-time feedback loops (test → fix → test) are powerful for both development and production

What's next for Voice Arena

Short Term:

  • Multi-Agent Testing: Test multiple agents simultaneously and compare security postures
  • Custom Attack Libraries: Allow users to define custom attack scenarios and categories
  • Performance Metrics: Track response times, token usage, and cost optimization
  • Export/Import: Save and share secure prompts, attack patterns, and healing sessions

Medium Term:

  • CI/CD Integration: Automatically test and heal agents in deployment pipelines
  • Advanced Analytics: Dashboard showing vulnerability trends, fix effectiveness, and attack patterns
  • Sandbox Execution: Full Daytona integration for isolated, secure testing environments
  • Multi-Language Support: Test agents in multiple languages and locales

Long Term:

  • Federated Learning: Learn from vulnerabilities across all users to improve detection and fixes
  • Agent Marketplace: Share secure, tested agent prompts with the community
  • Enterprise Features: Team collaboration, role-based access, audit logs
  • Real-Time Monitoring: Continuous production monitoring with automatic healing on new vulnerabilities

Vision: Voice Arena becomes the standard for AI agent security testing where every voice agent is tested, attacked, and healed before production, with complete observability and zero manual intervention.

🎥 Video demo

Demo Video Link

Built With

Share this project:

Updates