Inspiration

Live meetings and conversations often lack accessible, real-time captioning, especially for people with hearing difficulties. We wanted to build an AI tool that promotes inclusivity and makes communication effortless.

What it does

Transcripta captures live speech or uploaded audio and generates accurate, readable captions using Whisper AI and Google Speech Recognition — all inside a simple, Streamlit-powered interface.

How we built it

We used:

- Streamlit for the UI
- OpenAI’s Whisper for audio transcription
- SpeechRecognition + Google STT for real-time capture
- `tempfile` and FFmpeg for seamless audio processing
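As a rough sketch of how the upload path fits together (the helper name is ours, not from the project's code): Streamlit hands us uploaded audio as bytes, but Whisper and FFmpeg expect a file on disk, so the bytes first go into a temp file.

```python
import tempfile


def save_audio_to_tempfile(data: bytes, suffix: str = ".wav") -> str:
    """Persist uploaded audio bytes so FFmpeg/Whisper can read them from disk."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name


# Whisper then transcribes the saved file (requires FFmpeg on PATH):
#   import whisper
#   model = whisper.load_model("base")
#   text = model.transcribe(save_audio_to_tempfile(uploaded.read()))["text"]
```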

Challenges we ran into

- Whisper’s dependency on FFmpeg required setup troubleshooting
- Streamlit’s lack of native microphone support led us to blend Whisper with Google STT
- Deployment restrictions on platforms like Replit and Streamlit Cloud
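The blended-engine workaround can be sketched as a simple fallback wrapper. The engine callables below are placeholders standing in for Whisper and Google STT, not their real APIs; this only illustrates the try-primary-then-fallback shape.

```python
from typing import Callable, Tuple


def transcribe_with_fallback(
    audio_path: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Run the primary STT engine; if it raises, use the fallback engine.

    In our setup, `primary` would wrap one engine (e.g. Whisper) and
    `fallback` the other (e.g. Google STT via SpeechRecognition).
    Returns the transcript plus which engine produced it.
    """
    try:
        return primary(audio_path), "primary"
    except Exception:
        return fallback(audio_path), "fallback"
```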

Accomplishments that we're proud of

- Achieved accurate transcription from both live speech and audio files
- Designed an intuitive UI for accessibility and non-technical users
- Integrated two STT methods to overcome platform limitations

What we learned

- How to integrate Whisper into real-world apps
- Handling audio input in Python with limited frontend support
- The importance of fallback strategies when platform constraints arise

What's next for Transcripta

- Add support for live multilingual transcription
- Enable collaborative note-taking
- Improve the real-time feedback loop for longer meetings
- Deploy a fully hosted web version for universal access

Built With

Python · Streamlit · OpenAI Whisper · SpeechRecognition · Google Speech-to-Text · FFmpeg
