Inspiration
One of our team members is a DJ who spent hours upon hours removing profanity from songs to make them age-appropriate for events. We thought there had to be a better way: detect these words in real time and catch them before they make it out through the speakers. That's how we came up with SoapyMouth.
What it does
- Captures Audio: Continuously records audio from your microphone in small chunks.
- Transcribes Speech: Uses a real-time local speech-to-text engine (faster-whisper) to transcribe each chunk of audio as you speak.
- Detects Swear Words: Checks each transcribed chunk for swear words using a customizable profanity filter.
- Censors Audio: If a swear word is detected, after a short delay, the corresponding audio chunk is muted and a beep is played through your speakers, so that offensive language is not broadcast.
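The short delay is what makes the censoring possible: audio has to be held back long enough for the transcription verdict to arrive before the chunk reaches the speakers. Here is a minimal sketch of that idea, not our actual code. The class name `DelayedCensor`, the sample rate, the beep frequency, and the delay length are all illustrative assumptions.

```python
import math
from collections import deque

SAMPLE_RATE = 16_000  # assumed sample rate in Hz
BEEP_HZ = 1_000       # assumed beep frequency in Hz


def beep(num_samples, rate=SAMPLE_RATE, freq=BEEP_HZ, volume=0.3):
    """Return a sine tone with the same length as the chunk it replaces."""
    return [volume * math.sin(2 * math.pi * freq * n / rate)
            for n in range(num_samples)]


class DelayedCensor:
    """Hypothetical delay line: chunks wait here until a verdict can arrive."""

    def __init__(self, delay_chunks=2):
        self.delay_chunks = delay_chunks
        self.pending = deque()  # entries are [chunk_id, samples]
        self.next_id = 0

    def capture(self, samples):
        """Queue a freshly recorded chunk and return its id."""
        chunk_id = self.next_id
        self.next_id += 1
        self.pending.append([chunk_id, list(samples)])
        return chunk_id

    def flag(self, chunk_id):
        """Swap a still-queued chunk for a beep; False if it already played."""
        for entry in self.pending:
            if entry[0] == chunk_id:
                entry[1] = beep(len(entry[1]))
                return True
        return False

    def due(self):
        """Pop the oldest chunk once more than `delay_chunks` are queued."""
        if len(self.pending) > self.delay_chunks:
            return self.pending.popleft()[1]
        return None
```

In this sketch, the capture loop calls `capture()` for every microphone chunk, the transcription side calls `flag()` when the filter fires, and the playback loop writes whatever `due()` returns to the speakers. A longer delay gives the model more time but adds latency to the output.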
How we built it
First, we experimented with multiple transcription engines to find the fastest one, and settled on a local Whisper model. We route the microphone input to the app and feed the transcription into a word-list detection algorithm. When the app detects a word that is on the blacklist, it censors the corresponding audio and passes the rest through to the device speakers.
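The word-list check can be sketched roughly as follows. This is an illustration rather than our exact algorithm: the placeholder words in `WORD_LIST` and the function names are assumptions. Matching whole tokens instead of substrings is one simple way to avoid flagging an innocent word that merely contains a listed one.

```python
import re

# Placeholder entries; a real deployment would load its own customizable list.
WORD_LIST = {"darn", "heck"}


def tokenize(transcript):
    """Lowercase the transcript and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", transcript.lower())


def find_profanity(transcript, word_list=WORD_LIST):
    """Return the listed words found in the transcript, in spoken order."""
    return [word for word in tokenize(transcript) if word in word_list]


def contains_profanity(transcript, word_list=WORD_LIST):
    """True if at least one whole word in the transcript is on the list."""
    return bool(find_profanity(transcript, word_list))
```

For example, `find_profanity("Oh HECK, that heckler again!")` matches "heck" but not "heckler", because only whole tokens are compared against the list.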
Challenges we ran into
One hard part of this project was optimizing the model to run in real time while continuing to stream the voice input for transcription.
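As a rough illustration of the kind of settings that trade accuracy for speed in faster-whisper, here is a sketch. The model size, quantization type, and decoding options below are assumptions chosen for the example, not the values we tuned. The import is done lazily so the helper functions can be read and tested without the library installed.

```python
# Hypothetical low-latency settings; the values actually used are not stated here.
MODEL_SIZE = "tiny.en"
DECODE_OPTIONS = {
    "language": "en",                     # skip automatic language detection
    "beam_size": 1,                       # greedy decoding instead of beam search
    "condition_on_previous_text": False,  # treat each chunk independently
}


def load_model():
    """Load a small quantized model on the CPU (requires faster-whisper)."""
    from faster_whisper import WhisperModel  # lazy import

    return WhisperModel(MODEL_SIZE, device="cpu", compute_type="int8")


def transcribe_chunk(model, samples):
    """Transcribe one chunk; `samples` is a 1-D float32 array at 16 kHz."""
    segments, _info = model.transcribe(samples, **DECODE_OPTIONS)
    # `segments` is a generator, so the decoding work happens while iterating.
    return " ".join(segment.text.strip() for segment in segments)
```

Because `transcribe_chunk` only relies on the `transcribe` method, other engines can be compared by wrapping them in an object with the same method.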
Accomplishments that we're proud of
- Getting the model latency down to the single digits
- Being able to discern swear words from words that sound similar
What we learned
- Local model optimization
- Real-time audio signal processing
- Swearing at computers was completely normal for this project
- Swearing at AI is fun
What's next for SoapyMouth
- Further optimization to make it even more responsive
- Packaging as a plugin for commercial use in audio workstations