Why We Built This

77% of people have speaking anxiety, but only 8% seek help. Professional speech coaching costs $100–300/hour — putting it out of reach for international students preparing for job interviews, neurodivergent professionals navigating workplace communication, and first-generation students who never had professional speaking modeled for them.

Existing tools like Yoodli and Orai only analyze audio, missing the 70% of communication that's nonverbal — your eyes, your posture, your body language. We wanted to build something that sees the full picture and makes practice fun, not stressful.


What SpeechMAX Does

SpeechMAX is a browser-based AI speech coach that analyzes your speaking across five dimensions in real time:

  • Clarity — filler word detection via live transcription
  • Confidence — eye contact quality via 468 facial landmarks + iris tracking
  • Pacing — words-per-minute consistency and variation
  • Expression — pitch variation and vocal energy analysis
  • Composure — posture, fidgeting, blink rate, jaw tension, and biometric stress signals
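As a concrete illustration, the pacing dimension reduces to tracking words per minute over a sliding window of word timestamps. A minimal sketch, with a hypothetical function name and window size (not SpeechMAX's actual scoring code):

```typescript
// Rolling words-per-minute over the most recent window of speech.
// wordTimesSec: timestamp (in seconds) of each recognized word, in order.
function rollingWpm(wordTimesSec: number[], windowSec = 10): number {
  if (wordTimesSec.length === 0) return 0;
  const now = wordTimesSec[wordTimesSec.length - 1];
  // Count words spoken inside the most recent window.
  const inWindow = wordTimesSec.filter((t) => now - t <= windowSec).length;
  return (inWindow / windowSec) * 60; // words per minute
}
```

Consistency can then be scored from the variance of this value across a session: steady speakers stay in a narrow WPM band, while rushed or halting speech swings widely.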

You start by choosing your goal (interview, presentation, casual, or reading), then do a 30-second scan that builds your speech profile as an animated radar chart. Based on your weaknesses, SpeechMAX recommends targeted mini-games:

| Game | Trains | How |
| --- | --- | --- |
| Filler Ninja | Clarity | Speak without filler words — they get slashed on detection |
| Eye Lock | Confidence | Maintain eye contact — screen dims and warns when you look away |
| Pace Racer | Pacing | Keep your WPM in the target zone — gear system rewards sustained pace |
| Pitch Surfer | Expression | Vary your pitch to ride the wave — monotone = wipeout |
| Stage Presence | Composure | Master body language — open stance, gestures, commanding presence |

Mike, your AI coach powered by Gemini 2.5 Flash, sees all your scores, game history, and badges — giving short, personalized advice through an in-app chat. Sign in with Google to sync progress across devices, or continue as a guest with zero friction.


How We Built It

All speech and video analysis runs 100% client-side — no audio or video ever leaves the browser:

  • MediaPipe FaceLandmarker — 468 facial landmarks + iris tracking + blendshapes for eye contact, blink rate, jaw tension, and lip compression
  • MediaPipe PoseLandmarker — 33 body keypoints for posture alignment, gesture quality, fidget detection, and bad habit recognition
  • Web Speech API — real-time transcription with confidence filtering and context-aware filler detection across 20+ filler words
  • Web Audio API — DynamicsCompressor → AnalyserNode chain for real-time pitch detection via autocorrelation
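The autocorrelation step can be sketched as a pure function over the time-domain buffer that an AnalyserNode fills via getFloatTimeDomainData. The function name and frequency bounds below are our illustrative choices, not SpeechMAX's exact code:

```typescript
// Estimate fundamental frequency by finding the lag at which the signal
// best correlates with a shifted copy of itself.
function detectPitchHz(buf: Float32Array, sampleRate: number): number {
  const minHz = 60;  // below typical speech pitch
  const maxHz = 500; // above typical speech pitch
  const minLag = Math.floor(sampleRate / maxHz);
  const maxLag = Math.floor(sampleRate / minHz);
  let bestLag = -1;
  let bestCorr = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let corr = 0;
    for (let i = 0; i + lag < buf.length; i++) {
      corr += buf[i] * buf[i + lag];
    }
    if (corr > bestCorr) {
      bestCorr = corr;
      bestLag = lag;
    }
  }
  return bestLag > 0 ? sampleRate / bestLag : 0; // 0 = no pitch found
}
```

Running this on each animation frame yields a pitch contour; its variance over time is one way to drive an expression score like the one Pitch Surfer uses.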

The backend uses Supabase for authentication (anonymous + Google OAuth), PostgreSQL with row-level security for data persistence, and an Edge Function that proxies Gemini API calls so the API key never touches the client.
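A minimal sketch of that proxy pattern follows. The real function runs as a Deno-based Supabase Edge Function; the names, types, and injected fetch here are our illustrative simplifications:

```typescript
// Minimal structural type so the sketch doesn't depend on DOM typings.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ status: number; text(): Promise<string> }>;

const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent";

// Relay the client's payload to Gemini. The API key is read server-side
// and never appears in anything sent back to the browser.
async function proxyGemini(
  clientPayload: string,
  apiKey: string,
  fetchFn: FetchLike
): Promise<{ status: number; body: string }> {
  const upstream = await fetchFn(`${GEMINI_URL}?key=${apiKey}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: clientPayload,
  });
  return { status: upstream.status, body: await upstream.text() };
}
```

The design point is that the secret lives only in the function's environment; the client calls the Edge Function's URL and never sees the key.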

The frontend is React 18 + TypeScript + Vite with Zustand for state management, Framer Motion for animations, and Tailwind CSS for styling. Every game has difficulty scaling based on your scan scores, goal-driven prompts, and per-game coaching tips.

Total API cost: $0. All ML runs client-side. Privacy by default.


Challenges We Faced

  • MediaPipe + Web Speech API running simultaneously causes heavy CPU load — we mitigated this with frame-skipping (processing every second frame) and singleton model caching, so models aren't re-downloaded on navigation

  • Eye tracking race condition — the video element didn't exist when the gaze model tried to attach. We separated stream acquisition from video attachment and waited for the loadeddata event

  • Web Speech API merges repeated words — saying "um um um" becomes "um" in the transcript. We switched to count-based tracking on interim results instead of position-based detection

  • Expired JWT tokens on anonymous auth caused the Gemini proxy to silently return 401 errors. We switched to anon key authentication on the edge function to avoid token lifecycle issues

  • Filler detection during silence — the streak timer kept ticking when the user stopped talking. We added a 3-second silence threshold that pauses the game with visual indicators
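The count-based fix for the repeated-word merging can be sketched like this, with a hypothetical filler list and function names of our choosing: count filler occurrences in each interim transcript and keep the running maximum, so a merged or shrinking interim never lowers the tally.

```typescript
// Illustrative subset of the 20+ filler words the real detector tracks.
const FILLERS = ["um", "uh", "like", "you know", "basically"];

// Count filler occurrences in a transcript, using word boundaries so
// "likely" doesn't count as "like".
function countFillers(transcript: string): number {
  const text = transcript.toLowerCase();
  return FILLERS.reduce((sum, filler) => {
    const re = new RegExp(`\\b${filler}\\b`, "g");
    return sum + (text.match(re) ?? []).length;
  }, 0);
}

// Track the running maximum across interim results, since interim
// transcripts can shrink or merge repeated words between updates.
function makeFillerTracker(): (interim: string) => number {
  let maxSeen = 0;
  return (interim) => {
    maxSeen = Math.max(maxSeen, countFillers(interim));
    return maxSeen;
  };
}
```

Position-based diffing breaks when the recognizer rewrites earlier words; tracking only the count's high-water mark sidesteps that entirely.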


What We Learned

  • Running multiple real-time ML models in the browser is viable but requires careful resource management — singleton patterns, frame skipping, and parallel initialization made the difference between smooth and janky

  • Gamification transforms a stressful activity into something people actually want to do. The gear system in Pace Racer and the streak counter in Filler Ninja create genuine engagement loops

  • Anonymous auth with zero-friction onboarding is critical for hackathon demos — judges shouldn't need to create an account to try your product

  • Supabase Edge Functions solve the API key exposure problem elegantly — one function, deployed once, and the secret never touches the frontend
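The singleton caching called out above reduces to a memoized async loader. A generic sketch (the MediaPipe call in the usage comment is an assumption about how it would plug in, not the actual implementation):

```typescript
// The first call kicks off the expensive load; every later call,
// including concurrent ones, reuses the same promise, so models are
// never re-downloaded on navigation.
function singletonLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => (pending ??= load());
}

// Usage sketch, with FaceLandmarker as the expensive resource:
// const getFaceModel = singletonLoader(() =>
//   FaceLandmarker.createFromOptions(fileset, options));
```

Because the promise (not the resolved value) is cached, two components that request the model during the same navigation share one in-flight download instead of racing to start two.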


What's Next

  • Deepgram Nova-3 for server-side transcription — solves the repeated word merging limitation and enables multilingual support
  • Session recording and playback — review your practice sessions with timestamped annotations
  • Multiplayer practice rooms — practice with friends via WebRTC with real-time feedback
  • Custom prompt upload — paste your actual interview questions or presentation script

Built With

  • Framer Motion
  • Google Gemini
  • lucide-react
  • MediaPipe
  • PostgreSQL
  • React
  • Supabase
  • Tailwind CSS
  • TypeScript
  • Vercel
  • Vite
  • Web Audio API
  • Web Speech API
  • Zustand