AIP - AI Interface Producer

A real-time voice AI agent that helps people solve daily problems by generating UIs through an agentic voice interface, backed by our custom LLM and our proprietary Agent-to-Interface Protocol (AIP), which is designed to reduce token bloat.

Live demo link: https://aip-neon.vercel.app

🎯 Project Overview

Core Feature: Talk to an AI agent that generates interfaces in real time as you describe them. The generated UI appears instantly on screen while voice interaction persists through an elegant Aura overlay.

Demo Flow:

  1. Connect to voice agent
  2. Say "Create a user profile card with avatar and bio"
  3. Agent generates and displays the interface instantly
  4. Continue voice interaction with mini Aura overlay
  5. Agent can end session gracefully via voice command

๐Ÿ—๏ธ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  React Frontend │    │  LiveKit Cloud   │    │   Voice Agent   │
│                 │◄──►│                  │◄──►│                 │
│ • Aura Component│    │ • WebRTC         │    │ • GPT-4 + Tools │
│ • UI Rendering  │    │ • Text Streams   │    │ • ElevenLabs TTS│
│ • Session Mgmt  │    │ • Authentication │    │ • OpenAI STT    │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Tech Stack

Backend Agent (Deployed to LiveKit Cloud):

  • LiveKit Agents SDK v1.4.2 - Voice agent framework
  • OpenAI GPT-4.1 - LLM + gpt-4o-transcribe STT
  • ElevenLabs eleven_flash_v2_5 - TTS
  • Silero VAD + MultilingualModel - Turn detection
  • Python 3.12 via Anaconda

Frontend:

  • React 19.2 + Vite 7.3.1 - UI framework
  • LiveKit Components - WebRTC client + official AgentAudioVisualizerAura
  • Tailwind CSS v4 + shadcn - Styling
  • Shadow DOM - CSS scoping for generated UI

Infrastructure:

  • LiveKit Cloud - Agent hosting + WebRTC infrastructure
  • Project: aip (wss://aip-go1n19vl.livekit.cloud)
  • Agent ID: CA_dJ9gqgtu9hJB (EU West B region)

🚀 Quick Start

Prerequisites

# Backend
python 3.12+ (Anaconda recommended)
lk CLI (npm install -g @livekit/cli)

# Frontend  
node 18+

1. Clone & Setup

git clone <repo>
cd aip

# Backend setup
cd backend
pip install -r requirements.txt
cp .env.example .env  # Add your API keys

# Frontend setup
cd ../frontend
npm install
cp .env.example .env  # Add VITE_SANDBOX_ID

2. API Keys Required

Backend (.env):

OPENAI_API_KEY=sk-...
ELEVEN_API_KEY=sk_...
LIVEKIT_URL=wss://aip-go1n19vl.livekit.cloud
LIVEKIT_API_KEY=APIgiR8q69idSmU
LIVEKIT_API_SECRET=...

Frontend (.env):

VITE_SANDBOX_ID=aiphack-10p40t

3. Development

Option A: Use Deployed Agent (Recommended)

# Frontend only
cd frontend
npm run dev  # → http://localhost:5173

Option B: Local Development

# Terminal 1: Local agent
cd backend
python agent.py console  # or `python agent.py dev` for cloud mode

# Terminal 2: Frontend
cd frontend  
npm run dev

4. Production Deployment

Agent (Already deployed):

cd backend
lk agent create --secrets-file .env
# ✅ Deployed to LiveKit Cloud automatically

Frontend (Deploy to Vercel):

cd frontend
npm run build
# Deploy dist/ to Vercel (no env vars needed - sandbox ID hardcoded)

🎮 Usage

Voice Commands

  • "Create a [description]" → Generates UI component
  • "Make it [modification]" → Modifies existing UI (planned)
  • "Show me the components" → Lists available components (planned)
  • "Goodbye" / "End session" → Agent gracefully disconnects

UI States

  1. Pre-connect: Large Aura with "Tap to connect"
  2. Connected: Generated UI with small Aura overlay (bottom-left)
  3. Disconnected: Clickable reconnect pill with pulse animation
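
The three UI states above can be sketched as a tiny state machine. This is a language-agnostic illustration in Python (the real frontend tracks this in React state; the `AuraState` and `next_state` names are invented here for illustration):

```python
from enum import Enum, auto

class AuraState(Enum):
    PRE_CONNECT = auto()   # large Aura with "Tap to connect"
    CONNECTED = auto()     # generated UI with small Aura overlay
    DISCONNECTED = auto()  # reconnect pill with pulse animation

# Legal transitions between the three UI states described above.
TRANSITIONS = {
    AuraState.PRE_CONNECT: {AuraState.CONNECTED},
    AuraState.CONNECTED: {AuraState.DISCONNECTED},
    AuraState.DISCONNECTED: {AuraState.CONNECTED},  # clickable reconnect
}

def next_state(current: AuraState, target: AuraState) -> AuraState:
    """Move to `target` if the transition is legal, else stay put."""
    return target if target in TRANSITIONS[current] else current
```

Modeling the transitions explicitly is what makes the reconnect pill safe: a stray event can never jump the UI from pre-connect straight to disconnected.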

Current Agent Tools

  • ✅ generate_ui() - Creates HTML/CSS interface (currently hardcoded)
  • ✅ end_session() - Graceful disconnect with drain → shutdown → sleep → disconnect
  • 🔄 modify_ui() - Edit existing interface (placeholder)
  • 🔄 list_components() - Show available components (placeholder)
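
Conceptually, these tools form a name-to-function registry that the LLM dispatches into by emitting a tool name plus arguments. A minimal stdlib sketch of that idea (the `tool` decorator and `dispatch` helper are hypothetical; the real agent registers its tools through the LiveKit Agents SDK's tool-calling API):

```python
# Hypothetical tool registry illustrating LLM tool dispatch.
TOOLS: dict[str, callable] = {}

def tool(fn):
    """Register a function as a voice-callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def generate_ui(description: str) -> str:
    # The real implementation returns generated HTML/CSS (currently hardcoded).
    return f"<div class='card'>{description}</div>"

@tool
def end_session() -> str:
    # The real implementation drains and shuts down the LiveKit session.
    return "session ended"

def dispatch(name: str, **kwargs) -> str:
    """Route a tool call from the LLM to the registered handler."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```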

🔧 Development Notes

Agent Logs

lk agent logs                      # Runtime logs
lk agent logs --log-type=build     # Build logs

Key Implementation Details

  • Shadow DOM: Generated UI uses Shadow DOM for CSS scoping
  • Text Streams: Agent sends UI via room.local_participant.send_text()
  • Event Handling: Frontend uses participantDisconnected + disconnected for reliable disconnect detection
  • Session Management: session.end() called on disconnect for clean reconnection
  • Graceful Shutdown: Agent uses await session.drain() → session.shutdown() → room.disconnect()
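
The shutdown ordering matters: drain first so in-flight speech and tool calls finish, shut the session down second, and only then leave the room. A sketch with stub objects (the `StubSession`/`StubRoom` classes are stand-ins written for this example, not the LiveKit API):

```python
import asyncio

calls: list[str] = []  # records the order of teardown steps

class StubSession:
    async def drain(self):      # wait for in-flight speech/tool calls
        calls.append("drain")
    async def shutdown(self):   # tear down the agent session
        calls.append("shutdown")

class StubRoom:
    async def disconnect(self): # leave the WebRTC room last
        calls.append("disconnect")

async def end_session(session: StubSession, room: StubRoom) -> None:
    # Disconnecting before draining would cut the agent off mid-sentence.
    await session.drain()
    await session.shutdown()
    await room.disconnect()

asyncio.run(end_session(StubSession(), StubRoom()))
```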

Turn Detection Tuning

session = AgentSession(
    turn_detection=MultilingualModel(
        unlikely_threshold=0.3   # Lower = more sensitive
    ),
    # Endpointing delays are AgentSession parameters, not MultilingualModel's
    min_endpointing_delay=0.35,  # Faster response
    max_endpointing_delay=2.0,   # Max wait time
)

📋 Roadmap

Phase 1: Hackathon MVP ✅

  • Voice agent with tool calling
  • Official Aura component integration
  • Real-time UI generation display
  • Agent-controlled session ending
  • Reconnectable sessions
  • Deploy to LiveKit Cloud

Phase 2: Enhanced Generation 🔄

  • LLM-powered UI generation (replace hardcoded HTML)
  • Component library integration
  • Interactive modifications via voice
  • Component export/save functionality

Phase 3: Production 🚀

  • Custom token endpoint (replace sandbox)
  • User authentication & sessions
  • Component persistence
  • Multi-user collaboration

๐Ÿ› ๏ธ Troubleshooting

Common Issues

  1. "Connecting" stuck: Check agent deployment status with lk agent logs
  2. No audio: Verify microphone permissions in browser
  3. Build fails: Ensure all API keys are set in .env files
  4. Agent not responding: Check OpenAI + ElevenLabs API key limits

Authentication

  • Development: Uses sandbox token server (current setup)
  • Production: Requires custom JWT token endpoint
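
A custom token endpoint essentially mints an HS256 JWT in LiveKit's access-token shape (claims signed with the API secret, with a "video" grant naming the room). A stdlib sketch under that assumption; in production you would use the official LiveKit server SDK rather than hand-rolling the JWT:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_access_token(api_key: str, api_secret: str,
                      identity: str, room: str) -> str:
    """Mint an HS256 JWT shaped like a LiveKit access token (sketch only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": api_key,    # API key identifies the project
        "sub": identity,   # participant identity
        "nbf": now,
        "exp": now + 3600, # 1-hour validity
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = (
        f"{b64url(json.dumps(header).encode())}"
        f".{b64url(json.dumps(claims).encode())}"
    )
    sig = hmac.new(api_secret.encode(), signing_input.encode(),
                   hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"
```

The frontend would then fetch this token from the endpoint and pass it to the LiveKit client in place of the sandbox token.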

๐Ÿ† ParisHack 2026

Built for ParisHack hackathon with focus on rapid prototyping and user experience. The project demonstrates real-time voice-to-UI generation with persistent voice interaction capabilities.

Team: Solo project by @diniskakov
Demo: Live voice agent at wss://aip-go1n19vl.livekit.cloud
