Inspiration
Live meetings and conversations often lack accessible, real-time captioning — especially for people with hearing difficulties. We wanted to build an AI tool that fosters inclusivity and makes communication effortless.
What it does
Transcripta captures live speech or uploaded audio and generates accurate, readable captions using Whisper AI and Google Speech Recognition — all inside a simple, Streamlit-powered interface.
How we built it
We used:

- Streamlit for the UI
- OpenAI’s Whisper for audio transcription
- SpeechRecognition + Google STT for real-time capture
- Python’s tempfile module and FFmpeg for seamless audio processing
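The file-transcription path can be sketched roughly like this (the helper names are illustrative, not from our actual codebase; only Whisper's documented `load_model`/`transcribe` calls are its real API). An uploaded audio stream is spilled to a temporary file first, because Whisper and FFmpeg need a real file on disk:

```python
import os
import tempfile


def save_upload_to_tempfile(data: bytes, suffix: str = ".wav") -> str:
    """Write uploaded audio bytes to a named temp file and return its path.

    Streamlit's file uploader hands back an in-memory buffer, but Whisper
    (via FFmpeg) expects a path on disk, so the bytes are written out first.
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Run Whisper on an audio file and return the transcript text."""
    import whisper  # imported lazily; requires FFmpeg on the PATH

    model = whisper.load_model(model_name)
    return model.transcribe(path)["text"]
```

In the Streamlit UI this would be wired to `st.file_uploader`, roughly as `transcribe_file(save_upload_to_tempfile(uploaded.read()))`.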
Challenges we ran into
- Whisper’s dependency on FFmpeg required setup troubleshooting
- Streamlit’s lack of native microphone support led us to blend Whisper with Google STT
- Deployment restrictions on platforms like Replit and Streamlit Cloud
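A small startup check like the following (a sketch, not our exact code) takes much of the pain out of the FFmpeg issue, since Whisper otherwise fails with an opaque error when the binary is missing:

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on the PATH.

    Whisper shells out to FFmpeg for audio decoding, so a missing binary
    surfaces as a confusing transcription error rather than a clear
    "install FFmpeg" message; checking up front gives a better prompt.
    """
    return shutil.which("ffmpeg") is not None
```

In the app, this could gate the upload widget, e.g. showing `st.error("FFmpeg not found, please install it and restart.")` when it returns `False`.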
Accomplishments that we're proud of
- Achieved accurate transcription from both live speech and audio files
- Designed an intuitive UI for accessibility and non-technical users
- Integrated two STT methods to overcome platform limitations
What we learned
- How to integrate Whisper into real-world apps
- Handling audio input in Python with limited frontend support
- The importance of fallback strategies when platform constraints arise
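The fallback lesson can be sketched as a small, backend-agnostic helper (names are illustrative, not from the actual codebase): try each speech-to-text backend in order and move on to the next when one raises:

```python
from typing import Callable, Sequence


def transcribe_with_fallback(
    audio_path: str,
    backends: Sequence[Callable[[str], str]],
) -> str:
    """Try each STT backend in order; return the first successful transcript.

    In Transcripta's case the backends would be Whisper for uploaded files
    and Google STT (via SpeechRecognition) for live capture; here they are
    plain callables so the fallback control flow stands on its own.
    """
    errors = []
    for backend in backends:
        try:
            return backend(audio_path)
        except Exception as exc:  # collect every failure for the final report
            name = getattr(backend, "__name__", repr(backend))
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All STT backends failed: " + "; ".join(errors))
```

Keeping the backends as interchangeable callables is what let the app route around platform constraints, such as missing microphone support, without restructuring the rest of the pipeline.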
What's next for Transcripta
- Add support for live multilingual transcription
- Enable collaborative note-taking
- Improve the real-time feedback loop for longer meetings
- Deploy a fully hosted web version for universal access