Rhetor — The AI Speaking Coach That Interrupts You Before Your Audience Does

Inspiration

Every founder, every speaker, every job candidate practices their pitch the same way — talking to a wall. No feedback, no interruptions, no coaching. By the time you watch the recording back, the moment is gone.

I've won 15+ hackathons and pitched at dozens of events. The single biggest difference between winning and losing a pitch isn't the idea — it's the delivery. How you breathe, how you pause, how you stand, whether you say "um" twelve times without noticing.

Professional speakers have coaches. Toastmasters has evaluators. TED speakers rehearse for months with feedback loops. The rest of us? We practice alone and hope for the best.

Rhetor changes that. It's a real-time AI speaking coach powered by the Gemini Live API that doesn't just listen: it interrupts you, coaches you, and helps you get better in the moment.

The name comes from ancient Greek — a rhetor was a teacher of rhetoric, the art of persuasive speaking. Rhetor brings that 2,500-year-old tradition into the AI age.

What It Does

Rhetor is a Duolingo for public speaking — combining structured exercises, real-time practice, and AI-powered post-session analysis into one cohesive experience.

1. Prepare

Paste your speech text or upload a PDF of your slides — Gemini extracts structured talking points automatically
AI Interview mode: Zeus (your AI coach) asks you Socratic questions to build your pitch from scratch
Templates: Pre-built structures for elevator pitches, investor decks, conference talks, and job interviews

2. Warm Up

Before you speak, Rhetor guides you through exercises that professional speakers use every day:

Box Breathing (4-4-4-4) with real-time breathing detection via microphone
Voice warm-ups: Humming, lip trills, tongue twisters — with Gemini checking your articulation via live transcript
Body preparation: Power pose, tension release, grounding — with optional camera posture check
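
The 4-4-4-4 pattern is four equal phases of four seconds each. As a minimal sketch of how a guided timer could map elapsed time to the current phase (the names `breathStateAt` and `BreathState` are illustrative and not taken from Rhetor's code, and microphone-based breathing detection is out of scope here):

```typescript
// Box breathing: inhale, hold, exhale, hold, each lasting 4 seconds.
type BreathPhase = "inhale" | "hold-in" | "exhale" | "hold-out";

const PHASES: BreathPhase[] = ["inhale", "hold-in", "exhale", "hold-out"];
const PHASE_MS = 4000;

interface BreathState {
  phase: BreathPhase;
  secondsLeft: number; // whole seconds remaining in the current phase
  cycle: number;       // number of full 16-second cycles already completed
}

// Pure function of elapsed time, so the UI can call it on every animation frame.
// Assumes elapsedMs is non-negative.
function breathStateAt(elapsedMs: number): BreathState {
  const cycleMs = PHASE_MS * PHASES.length;
  const intoCycle = elapsedMs % cycleMs;
  const intoPhase = intoCycle % PHASE_MS;
  return {
    phase: PHASES[Math.floor(intoCycle / PHASE_MS)],
    secondsLeft: Math.ceil((PHASE_MS - intoPhase) / 1000),
    cycle: Math.floor(elapsedMs / cycleMs),
  };
}
```

Keeping the timer a pure function of elapsed time means a paused or backgrounded tab resumes at the correct phase instead of drifting.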

3. Practice

The core experience — practice your pitch with:

Auto-advancing teleprompter that tracks your speech in real-time and highlights the next bullet point when you've covered the current one
Live metrics dashboard: real-time WPM, filler word counter, elapsed time
AI coaching interruptions: Say "um"? Zeus cuts in: "Drop the filler!" Speaking too fast? "Slow down, breathe." Hit a key point well? "Strong!"
Screen sharing: Share your actual slides — Gemini sees them and coaches you in context
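
One way a real-time WPM readout could be derived from transcript chunks is a trailing-window word count. The sketch below is an assumption about the approach, not Rhetor's actual code; `TranscriptChunk`, `rollingWpm`, and the 15-second default window are hypothetical.

```typescript
interface TranscriptChunk {
  text: string;
  atMs: number; // arrival time in ms since the session started
}

function countWords(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Words per minute over the trailing window. Early in a session the window is
// clipped to the elapsed time so the first few seconds are not underestimated.
function rollingWpm(
  chunks: TranscriptChunk[],
  nowMs: number,
  windowMs: number = 15000,
): number {
  const start = Math.max(0, nowMs - windowMs);
  const spanMs = nowMs - start;
  if (spanMs <= 0) return 0;
  const words = chunks
    .filter((c) => c.atMs >= start && c.atMs <= nowMs)
    .reduce((sum, c) => sum + countWords(c.text), 0);
  return Math.round((words * 60000) / spanMs);
}
```

A trailing window reacts to a sudden rush of words within seconds, whereas a whole-session average would hide it.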

4. Review & Replay

After practice, Rhetor records and analyzes your full session:

Timestamped event timeline: See exactly when fillers, pace changes, and strong moments occurred
AI deep analysis: Gemini watches the tape and extracts highlights, improvements, and key moments to rewind to — with specific rewrite suggestions
"Review with Zeus" mode: Zeus walks you through your recording moment by moment, auto-seeking to key timestamps and coaching you through each one

5. Academy — Kill the Fillers

Interactive micro-lessons like "Kill the Fillers" — Zeus asks you random questions and you must answer without any filler words. Real-time detection buzzes you on every "um", "like", and "you know." It's addictive, competitive, and genuinely makes you a better speaker in minutes.

How We Built It

Rhetor is a React 19 PWA built with Vite, with Zustand for state management and Framer Motion for animations. The entire AI experience runs on Gemini 2.5 Flash Native Audio through the Live API in the @google/genai SDK.

The Gemini Live API Integration

The Live API is the backbone of Rhetor. We use it for:

Bidirectional real-time audio: The user's mic streams 16-bit PCM at 16 kHz to Gemini, and Gemini streams coaching audio back at 24 kHz. An always-on mic with echo cancellation enables natural barge-in, so Zeus can interrupt you mid-sentence
Live transcription: Both input and output transcription enabled, powering the filler detector, WPM tracker, and teleprompter auto-advance
Function calling: 16 custom tools let Zeus control the entire app — navigate between views, show visual coaching cards, award drachmas, start/stop timers, save pitch data, and control recording playback
Multimodal input: Camera feed and screen share sent as video frames alongside audio — Gemini sees your posture AND your slides simultaneously
Session context switching: Different system prompts for each mode (Welcomer, Coach, Interviewer, Analyst, Lesson) with seamless transitions
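
To illustrate the function-calling item above, here is a minimal, SDK-free sketch of routing tool calls to app handlers. The `FunctionCall` and `FunctionResponse` shapes are local stand-ins modeled on the id/name/args pattern and should be checked against the SDK's own types. The tool names `navigate_to` and `award_drachmas` and the `AppState` shape are hypothetical; the real 16 tools are not listed in this writeup.

```typescript
// Local stand-in types (assumptions, not imported from @google/genai).
interface FunctionCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}
interface FunctionResponse {
  id: string;
  name: string;
  response: Record<string, unknown>;
}
type ToolHandler = (args: Record<string, unknown>) => Record<string, unknown>;

interface AppState {
  view: string;
  drachmas: number;
}

// Two hypothetical tools that mutate app state and report the result.
function createToolHandlers(state: AppState): Map<string, ToolHandler> {
  const handlers = new Map<string, ToolHandler>();
  handlers.set("navigate_to", (args) => {
    state.view = String(args["view"] ?? state.view);
    return { ok: true, view: state.view };
  });
  handlers.set("award_drachmas", (args) => {
    const amount = Number(args["amount"] ?? 0);
    if (!Number.isFinite(amount) || amount < 0) {
      return { ok: false, error: "invalid amount" };
    }
    state.drachmas += amount;
    return { ok: true, total: state.drachmas };
  });
  return handlers;
}

// Every call gets a response with the same id, including unknown tools,
// so the model is never left waiting on a call that was silently dropped.
function dispatchToolCalls(
  calls: FunctionCall[],
  handlers: Map<string, ToolHandler>,
): FunctionResponse[] {
  return calls.map((call) => {
    const handler = handlers.get(call.name);
    const response = handler
      ? handler(call.args)
      : { ok: false, error: `unknown tool: ${call.name}` };
    return { id: call.id, name: call.name, response };
  });
}
```

In a live session the returned array would be sent back to the model as the tool response; that wiring is omitted because it depends on the SDK session object.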

Audio Pipeline

A custom AudioRecorder uses an AudioWorklet for low-latency PCM16 capture and is paired with a VU meter worklet for real-time volume visualization and breathing detection. The AudioStreamer buffers Gemini's response audio for smooth playback.
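
The two core numeric steps in a capture pipeline like this are converting Web Audio's float samples to 16-bit PCM and computing a volume level for the meter. The sketch below shows one common way to do both. The function names `floatToPcm16` and `rmsLevel` are illustrative, and the worklet plumbing and resampling to 16 kHz are left out.

```typescript
// Convert Web Audio float samples in [-1, 1] to signed 16-bit PCM.
// Out-of-range samples are clamped rather than allowed to wrap around.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative and positive halves scale differently because the int16
    // range is asymmetric: -32768..32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Root-mean-square level of one audio frame, in the range 0..1.
// A VU meter would typically smooth this value over successive frames.
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}
```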

Filler Detection

A regex-based filler detector runs on the live transcript and sorts matches into four categories: hesitations (um, uh), filler words (like, basically), hedges (you know, I mean), and discourse markers (so, right, okay). Each event is timestamped and stored for the post-session timeline visualization.
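
A minimal sketch of such a detector, using the four categories and example words listed above. The exact patterns, the `FillerEvent` shape, and the function name `detectFillers` are assumptions for illustration.

```typescript
type FillerCategory = "hesitation" | "filler" | "hedge" | "discourse";

interface FillerEvent {
  category: FillerCategory;
  match: string; // lower-cased text that matched
  index: number; // character offset within the chunk
  atMs: number;  // timestamp supplied by the caller
}

// Word boundaries keep "um" from matching inside words such as "summer".
const FILLER_PATTERNS: Record<FillerCategory, RegExp> = {
  hesitation: /\b(?:um+|uh+)\b/,
  filler: /\b(?:like|basically)\b/,
  hedge: /\b(?:you know|i mean)\b/,
  discourse: /\b(?:so|right|okay)\b/,
};

function detectFillers(text: string, atMs: number): FillerEvent[] {
  const events: FillerEvent[] = [];
  for (const category of Object.keys(FILLER_PATTERNS) as FillerCategory[]) {
    // A fresh global regex per call avoids stale lastIndex state.
    const pattern = new RegExp(FILLER_PATTERNS[category].source, "gi");
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(text)) !== null) {
      events.push({ category, match: m[0].toLowerCase(), index: m.index, atMs });
    }
  }
  // Return events in the order they were spoken.
  return events.sort((a, b) => a.index - b.index);
}
```

A pure regex pass cannot tell a filler "like" from a legitimate one ("I like this market"), so words such as "like", "so", and "right" will produce some false positives unless extra context rules are layered on top.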

Teleprompter Intelligence

A custom matching algorithm extracts key terms from each bullet point and maintains a sliding window over the live transcript. When 60% or more of a bullet's key terms appear in recent speech, the teleprompter auto-advances, which creates the feeling that the app knows where you are in your talk.
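
The idea can be sketched as follows. This is a simplified illustration that uses exact word matches only; the stop-word list, the 40-word window, and the names `keyTerms`, `coverage`, and `TeleprompterTracker` are assumptions, and only the 60% threshold comes from the description above.

```typescript
const STOP_WORDS = new Set([
  "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
  "with", "is", "are", "our", "we", "that", "this", "it",
]);

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// Key terms: unique words longer than two characters that are not stop words.
function keyTerms(text: string): string[] {
  const kept = tokenize(text).filter((w) => w.length > 2 && !STOP_WORDS.has(w));
  return Array.from(new Set(kept));
}

// Fraction of a bullet's key terms present in the recent words.
function coverage(bullet: string, recentWords: string[]): number {
  const terms = keyTerms(bullet);
  if (terms.length === 0) return 0;
  const spoken = new Set(recentWords);
  return terms.filter((t) => spoken.has(t)).length / terms.length;
}

class TeleprompterTracker {
  private readonly bullets: string[];
  private readonly windowSize: number;
  private readonly threshold: number;
  private recent: string[] = [];
  private current = 0;

  constructor(bullets: string[], windowSize: number = 40, threshold: number = 0.6) {
    this.bullets = bullets;
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  // Feed a new transcript chunk; returns the index of the active bullet.
  feed(chunk: string): number {
    this.recent = this.recent.concat(tokenize(chunk)).slice(-this.windowSize);
    if (
      this.current < this.bullets.length &&
      coverage(this.bullets[this.current], this.recent) >= this.threshold
    ) {
      this.current += 1;
      this.recent = []; // words spoken for the last bullet should not count toward the next
    }
    return this.current;
  }
}
```

Only the current bullet is checked on each chunk, which keeps the prompter from jumping ahead when a later bullet happens to share vocabulary with what was just said.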

Challenges

Echo cancellation: Getting barge-in to work (speaking while AI speaks) required careful tuning of browser echo cancellation and audio pipeline timing
Auto-reconnect reliability: The Live API WebSocket connection needs robust reconnection logic — we built exponential backoff with jitter, message queuing during reconnection, and heartbeat monitoring
Transcript matching: Matching the live transcript to bullet points for auto-advance requires fuzzy matching with stemming, so that "investing" matches "investor"
Scope discipline: The hardest challenge was cutting features. Rhetor's vision includes full MediaPipe posture analysis, vocal pitch tracking, multiplayer practice, and AI-generated slide decks. Shipping a focused, working MVP in 48 hours meant saying no to 80% of the ideas
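
For the reconnection item above, exponential backoff with jitter and a queue for messages sent while disconnected can be sketched like this. It uses the "full jitter" variant as an assumption; the writeup does not say which variant Rhetor uses, the names `backoffDelayMs` and `OutboundQueue` are illustrative, and heartbeat monitoring is omitted.

```typescript
interface BackoffOptions {
  baseMs: number; // ceiling for the first retry
  maxMs: number;  // upper bound on any delay ceiling
}

// Full jitter: choose uniformly in [0, min(maxMs, baseMs * 2^attempt)).
// The random source is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Holds outbound messages while the socket is down and replays them in order.
class OutboundQueue<T> {
  private pending: T[] = [];

  enqueue(msg: T): void {
    this.pending.push(msg);
  }

  // Sends everything queued so far and returns how many messages were sent.
  flush(send: (msg: T) => void): number {
    const batch = this.pending;
    this.pending = [];
    for (const msg of batch) {
      send(msg);
    }
    return batch.length;
  }
}
```

Jitter matters because many clients dropped at the same moment would otherwise all retry on the same schedule. For real-time audio, a production queue would likely also cap its size or drop stale frames rather than replay them all.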

What We Learned

The Gemini Live API is genuinely different from standard LLM APIs. The bidirectional audio stream with function calling creates experiences that feel like talking to a real coach, not querying a chatbot
Speaking exercises work: Even in testing, doing 2 minutes of breathing + voice warm-up before a pitch made a noticeable difference in delivery quality
The teleprompter auto-advance is the "wow" moment: Every tester's reaction when the prompter moves on its own is the same — "Wait, it knows where I am?"
Filler awareness alone improves speaking: Just having a visible counter makes people self-correct within minutes

What's Next

Full Academy: AI-generated lessons for breathing, voice modulation, storytelling, pacing, and audience engagement
MediaPipe vision: Real-time posture scoring, eye contact tracking, and gesture analysis
Vocal pitch analysis: Detect monotone delivery, coach for dynamic range
Audience simulation: "I'm a skeptical investor — pitch me" with dynamic AI personalities
Session history & progression: Track improvement over weeks, unlock achievements
Multiplayer: Practice with friends, give each other AI-augmented feedback
Leaderboards: Weekly speaking leagues — the Symposium

"The art of rhetoric is the art of ruling the minds of men." — Plato

Rhetor puts that art in everyone's pocket.

Built With

gemini
googleaistudio
movenet
react
tensorflow.js
typescript
webaudio

Updates

Pawel Lach started this project — Feb 09, 2026 07:34 PM EST
