Inspiration

One of our team members is a DJ who has spent countless hours removing profanity from tracks to make them age-appropriate for events. We thought there had to be a better way: detect these words in real time and catch them before they come out of the speakers. That's how we came up with SoapyMouth.

What it does

  1. Captures Audio:
    Continuously records audio from your microphone in small chunks.

  2. Transcribes Speech:
    Uses a real-time local speech-to-text engine (faster-whisper) to transcribe each chunk of audio as you speak.

  3. Detects Swear Words:
    Checks each transcribed chunk for the presence of swear words using a customizable profanity filter.

  4. Censors Audio:
    If a swear word is detected, the corresponding audio chunk is muted after a short delay and a beep is played through your speakers instead, so that offensive language is never broadcast.
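
To illustrate step 4, here is a minimal sketch of how a short output delay makes censoring possible: chunks wait in a queue for a few chunk-lengths, so a chunk can still be swapped for a beep when its transcription comes back flagged. This is not SoapyMouth's actual code. The names DelayedCensor and make_beep, the 16 kHz sample rate, and the 1 kHz beep tone are all assumptions made for illustration.

```python
import math
from collections import deque

SAMPLE_RATE = 16_000  # assumed sample rate in Hz


def make_beep(num_samples, freq_hz=1000.0, amplitude=0.3, sample_rate=SAMPLE_RATE):
    """Return a sine-wave beep as a list of float samples in [-1, 1]."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_samples)
    ]


class DelayedCensor:
    """Hypothetical delay line: holds chunks back so they can be censored late."""

    def __init__(self, delay_chunks=2):
        self.delay_chunks = delay_chunks  # how many chunks stay queued
        self.pending = deque()            # entries are [chunk_id, samples]
        self.next_id = 0

    def push(self, samples):
        """Queue a freshly captured chunk and return its id."""
        chunk_id = self.next_id
        self.next_id += 1
        self.pending.append([chunk_id, list(samples)])
        return chunk_id

    def flag(self, chunk_id):
        """Replace a queued chunk with a beep; False if it was already played."""
        for entry in self.pending:
            if entry[0] == chunk_id:
                entry[1] = make_beep(len(entry[1]))
                return True
        return False

    def pop_ready(self):
        """Return the oldest chunk once it has waited long enough, else None."""
        if len(self.pending) > self.delay_chunks:
            return self.pending.popleft()[1]
        return None
```

In this sketch the capture loop would call push() for every chunk, the transcription side would call flag() when a swear word is found, and the playback loop would send whatever pop_ready() returns to the speakers. The delay has to be at least as long as the transcription takes, otherwise flag() arrives too late and returns False.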

How we built it

First, we experimented with multiple transcription engines to find the fastest one, and settled on a local Whisper model (faster-whisper). We feed its transcription into a word-list detection algorithm while also routing the microphone input through the app. The app checks whether each word is in the blacklist, censors the corresponding audio, and passes the rest through to the device speakers.
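
As a rough illustration of the word-list check described above, the sketch below splits a transcribed chunk into whole words and looks each one up in a set. It is not the project's actual filter. The function name find_blocked_words and the placeholder word list are hypothetical, and matching on whole words is an assumed design choice so that a longer, harmless word containing a listed word is not flagged.

```python
import re

# Hypothetical placeholder list; a real deployment would load its own words.
BLOCKLIST = {"darn", "heck"}

# A "word" here is a run of lowercase letters or apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def find_blocked_words(transcript, blocklist=BLOCKLIST):
    """Return the listed words that appear as whole words in the transcript."""
    words = WORD_RE.findall(transcript.lower())
    return [word for word in words if word in blocklist]


if __name__ == "__main__":
    print(find_blocked_words("Well, DARN it!"))       # ['darn']
    print(find_blocked_words("The crowd heckled us"))  # []
```

A non-empty result would be the signal to censor the audio chunk that produced the transcript.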

Challenges we ran into

One hard part of this project was optimizing the model to run in real time while continuously streaming the voice input for transcription.

Accomplishments that we're proud of

  • Getting the model latency down to the single digits
  • Being able to discern swear words from words that sound similar

What we learned

  • Local model optimization
  • Real-time audio signal processing
  • Swearing at computers was completely normal for this project
  • Swearing at AI is fun

What's next for SoapyMouth

  • Further optimization to get it even more responsive
  • Packaging as a plugin for commercial use in audio workstations
