Production-grade voice AI agent console built for Freya (YC S25). Real-time voice conversation system with LiveKit streaming.
2025-10-13.23-27-08.mov
- Voice-to-Voice AI: Groq Whisper STT → llama-3.1-8b → Cartesia TTS
- Real-time Streaming: LiveKit-powered bidirectional audio
- Prompt Management: CRUD operations with versioning
- Session Analytics: Metrics and conversation history
- Production Ready: Dockerized deployment
- Next.js 15 (App Router)
- TypeScript
- Tailwind CSS
- LiveKit Client SDK
- Python 3.11
- LiveKit Agents
- Groq (LLM + STT)
- Cartesia (TTS)
- Silero VAD
- Docker Compose
- Multi-stage builds
- Health checks
- Docker & Docker Compose
- LiveKit server URL + credentials
- Groq API key
- Cartesia API key
- Clone and configure
git clone <your-repo>
cd <project-folder>
cp .env.example .env
- Add your API keys to .env
LIVEKIT_URL=wss://your-server.livekit.cloud
LIVEKIT_API_KEY=your_key
LIVEKIT_API_SECRET=your_secret
GROQ_API_KEY=your_groq_key
CARTESIA_API_KEY=your_cartesia_key
- Build and run
docker compose up --build
- Access the app
http://localhost:3000
agent/
├── main.py # Entry point, LiveKit worker
├── requirements.txt # Python dependencies
└── agent/
├── config.py # Environment config (LLM model, etc)
├── voice_agent.py # Main agent class, LiveKit setup
└── conversation.py # Conversation handler (STT → LLM → TTS)
web/
├── app/
│ ├── page.tsx # Landing/login page
│ ├── api/
│ │ ├── auth/ # Login, logout, check endpoints
│ │ ├── livekit/token/ # LiveKit token generation
│ │ └── prompts/ # Prompt CRUD API
│ └── console/
│ ├── page.tsx # Main dashboard
│ ├── Components/
│ │ ├── ChatPanel.tsx # TEXT CHAT (needs work)
│ │ ├── MetricsPanel.tsx # Metrics display
│ │ ├── PromptModal.tsx # Create/edit prompts
│ │ └── PromptSidebar.tsx # Prompt list
│ └── hooks/
│ ├── useLiveKit.ts # LiveKit connection logic
│ └── usePrompts.ts # Prompt management
└── lib/
├── store.ts # State management
└── utils.ts # Utilities
User Browser
↓
Next.js Frontend (Port 3000)
↓
LiveKit Cloud
↓
Python Agent
↓
Groq API (LLM/STT) + Cartesia (TTS)
- STT: Groq Whisper for speech-to-text
- LLM: Groq llama-3.1-8b-instant for responses
- TTS: Cartesia Sonic English voice
- VAD: Silero for voice activity detection
- Create/Read/Update/Delete prompts
- Version history tracking
- In-memory storage (PostgreSQL ready)
- Real-time session tracking
- Conversation history
- Performance metrics
- Analytics dashboard
Backend:
cd agent
pip install -r requirements.txt
python main.py dev
Frontend:
cd web
npm install
npm run dev
# Start services
docker compose up --build
# Run in background
docker compose up --build -d
# View logs
docker compose logs -f
# Stop services
docker compose down
# Fresh start (remove volumes)
docker compose down -v
Focused on core value proposition (voice AI) over text chat to maximize impact in limited timeframe.
Rapid prototyping. Production would use PostgreSQL with Prisma ORM.
Industry standard for real-time voice/video. Powers ChatGPT Advanced Voice Mode.
Smaller images, faster builds, better security with non-root users.
Future enhancements for production deployment:
- PostgreSQL for persistence
- Redis for session caching
- Rate limiting and authentication
- Comprehensive test suite
- Session recording and playback
- Multi-tenant support
- Monitoring and logging (OpenTelemetry)
- Load balancing and scaling
- CI/CD pipeline
Built as Forward Deploying Engineer Assessment for Freya (YC S25)
- Timeframe: 3 days (10/10/25 - 13/10/25)
- Scope: 80% complete (voice working, text chat deprioritized)
- Focus: Production-grade voice AI implementation
- Prioritization: Voice-to-voice pipeline over text chat features
- Real-time bidirectional audio streaming
- Low-latency voice conversation
- Prompt versioning system
- Session analytics and metrics
- Dockerized for portability
- Multi-stage builds for optimization
- Non-root container users for security
- LiveKit: https://livekit.io
- Groq: https://groq.com
- Cartesia: https://cartesia.ai
- Freya (YC S25): https://www.ycombinator.com/companies/freya
This project is licensed under a Portfolio Display License.
TL;DR:
- ✅ You can view and study this code
- ✅ You can reference it in discussions about employment
- ❌ You cannot use this commercially
- ❌ You cannot redistribute or sell this
See LICENSE file for details.
For commercial licensing inquiries: [email protected]