Rhetor — The AI Speaking Coach That Interrupts You Before Your Audience Does
Inspiration
Every founder, every speaker, every job candidate practices their pitch the same way — talking to a wall. No feedback, no interruptions, no coaching. By the time you watch the recording back, the moment is gone.
I've won 15+ hackathons and pitched at dozens of events. The single biggest difference between winning and losing a pitch isn't the idea — it's the delivery. How you breathe, how you pause, how you stand, whether you say "um" twelve times without noticing.
Professional speakers have coaches. Toastmasters has evaluators. TED speakers rehearse for months with feedback loops. But the rest of us? We practice alone and hope for the best.
Rhetor changes that. It's a real-time AI speaking coach, powered by the Gemini Live API, that doesn't just listen: it interrupts you, coaches you, and helps you get better in the moment.
The name comes from ancient Greek — a rhetor was a teacher of rhetoric, the art of persuasive speaking. Rhetor brings that 2,500-year-old tradition into the AI age.
What It Does
Rhetor is a Duolingo for public speaking — combining structured exercises, real-time practice, and AI-powered post-session analysis into one cohesive experience.
1. Prepare
- Paste your speech text or upload a PDF of your slides — Gemini extracts structured talking points automatically
- AI Interview mode: Zeus (your AI coach) asks you Socratic questions to build your pitch from scratch
- Templates: Pre-built structures for elevator pitches, investor decks, conference talks, and job interviews
2. Warm Up
Before you speak, Rhetor guides you through exercises that professional speakers use every day:
- Box Breathing (4-4-4-4) with real-time breathing detection via microphone
- Voice warm-ups: Humming, lip trills, tongue twisters — with Gemini checking your articulation via live transcript
- Body preparation: Power pose, tension release, grounding — with optional camera posture check
3. Practice
The core experience — practice your pitch with:
- Auto-advancing teleprompter that tracks your speech in real time and highlights the next bullet point when you've covered the current one
- Live metrics dashboard: real-time WPM, filler word counter, elapsed time
- AI coaching interruptions: Say "um"? Zeus cuts in: "Drop the filler!" Speaking too fast? "Slow down, breathe." Hit a key point well? "Strong!"
- Screen sharing: Share your actual slides — Gemini sees them and coaches you in context
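The WPM figure on the metrics dashboard comes down to simple arithmetic on the running transcript. A minimal sketch, assuming words are whitespace-separated tokens and elapsed time is tracked in milliseconds; the function name is hypothetical and this is not Rhetor's actual code:

```typescript
// Hypothetical helper: words per minute from a running transcript.
// Assumes a "word" is any whitespace-separated token.
function wordsPerMinute(transcript: string, elapsedMs: number): number {
  if (elapsedMs <= 0) return 0; // avoid division by zero before the timer starts
  const trimmed = transcript.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  const minutes = elapsedMs / 60000;
  return Math.round(words / minutes);
}
```

A whole-session average like this reacts slowly to pace changes, so a live display would probably compute it over only the last several seconds of transcript.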
4. Review & Replay
Rhetor records your full practice session and analyzes it afterward:
- Timestamped event timeline: See exactly when fillers, pace changes, and strong moments occurred
- AI deep analysis: Gemini reviews the recording and extracts highlights, improvements, and key moments to rewind to, with specific rewrite suggestions
- "Review with Zeus" mode: Zeus walks you through your recording moment by moment, auto-seeking to key timestamps and coaching you through each one
5. Academy — Kill the Fillers
Interactive micro-lessons like "Kill the Fillers" — Zeus asks you random questions and you must answer without any filler words. Real-time detection buzzes you on every "um", "like", and "you know." It's addictive, competitive, and genuinely makes you a better speaker in minutes.
How We Built It
Rhetor is a React 19 PWA built with Vite, Zustand for state management, and Framer Motion for animations. The entire AI experience is powered by Gemini 2.5 Flash Native Audio via the @google/genai Live API.
The Gemini Live API Integration
The Live API is the backbone of Rhetor. We use it for:
- Bidirectional real-time audio: The user's mic streams PCM16 at 16 kHz to Gemini, and Gemini streams coaching audio back at 24 kHz. An always-on mic with echo cancellation enables natural barge-in, so Zeus can interrupt you mid-sentence
- Live transcription: Both input and output transcription enabled, powering the filler detector, WPM tracker, and teleprompter auto-advance
- Function calling: 16 custom tools let Zeus control the entire app — navigate between views, show visual coaching cards, award drachmas, start/stop timers, save pitch data, and control recording playback
- Multimodal input: Camera feed and screen share sent as video frames alongside audio — Gemini sees your posture AND your slides simultaneously
- Session context switching: Different system prompts for each mode (Welcomer, Coach, Interviewer, Analyst, Lesson) with seamless transitions
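Function calling needs a client-side step that routes each tool call from the model to an app action. A rough sketch of that routing follows. The `ToolCall` shape, the `AppState` fields, and the two tool names are assumptions for illustration, not the `@google/genai` types or Rhetor's real tool list:

```typescript
// Assumed, simplified shape of a tool call coming from the model.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical slice of app state that tools are allowed to change.
interface AppState {
  view: string;
  drachmas: number;
}

type ToolHandler = (args: Record<string, unknown>, state: AppState) => string;

// Two illustrative tools; the real app registers 16.
const handlers = new Map<string, ToolHandler>([
  ["navigate", (args, state) => {
    state.view = String(args.view ?? "home");
    return `navigated to ${state.view}`;
  }],
  ["award_drachmas", (args, state) => {
    const amount = Number(args.amount ?? 0);
    state.drachmas += Number.isFinite(amount) ? amount : 0;
    return `balance is ${state.drachmas}`;
  }],
]);

// Route a call to its handler and return a result string for the model.
function dispatchToolCall(call: ToolCall, state: AppState): string {
  const handler = handlers.get(call.name);
  if (!handler) return `unknown tool: ${call.name}`;
  return handler(call.args, state);
}
```

Returning a short result string for every call, including unknown tools, gives the model something to respond to instead of leaving the call unanswered.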
Audio Pipeline
A custom AudioRecorder uses an AudioWorklet for low-latency PCM16 capture, paired with a VU meter worklet for real-time volume visualization and breathing detection. The AudioStreamer buffers Gemini's response audio for smooth playback.
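The two per-frame computations in this pipeline are small. Here is a sketch of the float-to-PCM16 conversion and an RMS level that a VU meter could display. It leaves out the AudioWorklet wiring and any resampling to 16 kHz, and the function names are hypothetical:

```typescript
// Convert Web Audio float samples (nominal range -1..1) to 16-bit signed PCM.
function floatToPcm16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Root-mean-square level of one frame, usable as a simple volume reading.
function rmsLevel(input: Float32Array): number {
  if (input.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < input.length; i++) sum += input[i] * input[i];
  return Math.sqrt(sum / input.length);
}
```

Breathing detection could plausibly be built on a smoothed version of the RMS value, though the thresholds would need tuning per microphone.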
Filler Detection
A regex-based filler detector runs on the live transcript and sorts matches into four categories: hesitation (um, uh), filler words (like, basically), hedges (you know, I mean), and discourse markers (so, right, okay). Events are timestamped and stored for post-session timeline visualization.
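A detector along these lines can be sketched in a few lines. The word lists come from the categories above; the exact patterns, type names, and event shape are assumptions rather than the shipped code:

```typescript
type FillerCategory = "hesitation" | "filler" | "hedge" | "discourse";

interface FillerEvent {
  category: FillerCategory;
  match: string;       // lower-cased matched text
  index: number;       // character offset within the transcript chunk
  timestampMs: number; // supplied by the caller
}

// Word-boundary patterns so "summer" does not count as "um".
const FILLER_PATTERNS: Array<[FillerCategory, RegExp]> = [
  ["hesitation", /\b(?:um+|uh+)\b/],
  ["filler", /\b(?:like|basically)\b/],
  ["hedge", /\b(?:you know|i mean)\b/],
  ["discourse", /\b(?:so|right|okay)\b/],
];

function detectFillers(text: string, timestampMs: number): FillerEvent[] {
  const events: FillerEvent[] = [];
  for (const [category, pattern] of FILLER_PATTERNS) {
    // Fresh global, case-insensitive regex per call so lastIndex never leaks.
    const re = new RegExp(pattern.source, "gi");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      events.push({ category, match: m[0].toLowerCase(), index: m.index, timestampMs });
    }
  }
  // Report events in the order they were spoken.
  return events.sort((a, b) => a.index - b.index);
}
```

Plain regexes cannot tell a filler "like" from a legitimate one ("I like this market"), so a matcher like this will over-count words such as "like", "so", and "right".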
Teleprompter Intelligence
A custom matching algorithm extracts key terms from each bullet point and maintains a sliding window over the live transcript. When 60% or more of a bullet's key terms appear in recent speech, the teleprompter auto-advances, creating the magical experience of the app knowing where you are in your talk.
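The coverage check can be sketched as follows. This version uses exact word matching, a tiny stop-word list, and a 40-word window, all of which are illustrative assumptions; only the 60% threshold comes from the description above:

```typescript
// Small illustrative stop-word list; an assumption, not the app's list.
const STOP_WORDS = new Set([
  "the", "a", "an", "and", "or", "of", "to", "in", "is", "our", "we", "for", "with",
]);

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Unique non-stop-words in a bullet point.
function keyTerms(bullet: string): string[] {
  return Array.from(new Set(tokenize(bullet).filter((w) => !STOP_WORDS.has(w))));
}

// Fraction of a bullet's key terms found in the last `windowSize` spoken words.
function coverage(bullet: string, transcript: string, windowSize = 40): number {
  const terms = keyTerms(bullet);
  if (terms.length === 0) return 0;
  const recent = new Set(tokenize(transcript).slice(-windowSize));
  const hits = terms.filter((t) => recent.has(t)).length;
  return hits / terms.length;
}

function shouldAdvance(bullet: string, transcript: string, threshold = 0.6): boolean {
  return coverage(bullet, transcript) >= threshold;
}
```

For example, the bullet "Our market is growing fast in Europe" has four key terms, and a speaker who says "so the market in Europe keeps growing every year" covers three of them, which clears the threshold. Exact matching misses inflected forms, so a real implementation would need stemming or another fuzzy comparison on top of this.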
Challenges
- Echo cancellation: Getting barge-in to work (speaking while AI speaks) required careful tuning of browser echo cancellation and audio pipeline timing
- Auto-reconnect reliability: The Live API WebSocket connection needs robust reconnection logic — we built exponential backoff with jitter, message queuing during reconnection, and heartbeat monitoring
- Transcript timing: Matching live transcript to bullet points for auto-advance requires fuzzy matching with stemming — "investing" needs to match "investor"
- Scope discipline: The hardest challenge was cutting features. Rhetor's vision includes full MediaPipe posture analysis, vocal pitch tracking, multiplayer practice, and AI-generated slide decks. Shipping a focused, working MVP in 48 hours meant saying no to 80% of the ideas
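The reconnection logic from the auto-reconnect challenge can be sketched as two pieces: a delay calculation with jitter, and a queue that holds outbound messages while the socket is down. The sketch below uses the "full jitter" variant of exponential backoff. The base delay, cap, and class name are assumptions, and heartbeat monitoring is left out:

```typescript
// Delay before reconnect attempt `attempt` (0-based): a uniform random value
// between 0 and the capped exponential delay ("full jitter").
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 30000,
  random: () => number = Math.random, // injectable for testing
): number {
  const exponential = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * exponential);
}

// Buffers messages while disconnected and flushes them in order on reconnect.
class OutboundQueue<T> {
  private pending: T[] = [];
  private connected = false;
  private send: (msg: T) => void;

  constructor(send: (msg: T) => void) {
    this.send = send;
  }

  enqueue(msg: T): void {
    if (this.connected) this.send(msg);
    else this.pending.push(msg);
  }

  setConnected(connected: boolean): void {
    this.connected = connected;
    if (!connected) return;
    // splice(0) empties the buffer and returns its contents in arrival order.
    for (const msg of this.pending.splice(0)) this.send(msg);
  }
}
```

Jitter matters because many clients that disconnect at the same moment would otherwise all retry on the same schedule. For live audio, queuing only control messages and dropping stale audio frames is probably the better tradeoff, since replaying old audio after a reconnect would confuse the session.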
What We Learned
- The Gemini Live API is genuinely different from standard LLM APIs. The bidirectional audio stream with function calling creates experiences that feel like talking to a real coach, not querying a chatbot
- Speaking exercises work: Even in testing, doing 2 minutes of breathing + voice warm-up before a pitch made a noticeable difference in delivery quality
- The teleprompter auto-advance is the "wow" moment: Every tester's reaction when the prompter moves on its own is the same — "Wait, it knows where I am?"
- Filler awareness alone improves speaking: Just having a visible counter makes people self-correct within minutes
What's Next
- Full Academy: AI-generated lessons for breathing, voice modulation, storytelling, pacing, and audience engagement
- MediaPipe vision: Real-time posture scoring, eye contact tracking, and gesture analysis
- Vocal pitch analysis: Detect monotone delivery, coach for dynamic range
- Audience simulation: "I'm a skeptical investor — pitch me" with dynamic AI personalities
- Session history & progression: Track improvement over weeks, unlock achievements
- Multiplayer: Practice with friends, give each other AI-augmented feedback
- Leaderboards: Weekly speaking leagues — the Symposium
"The art of rhetoric is the art of ruling the minds of men." — Plato
Rhetor puts that art in everyone's pocket.
Built With
- gemini
- googleaistudio
- movenet
- react
- tensorflow.js
- typescript
- webaudio
