Inspiration

The motivation for this AI-based speech app came from a persistent challenge in speech therapy and ESL education. In speech therapy, many people, especially those without easy access to professional therapists, face obstacles in improving their communication skills. In ESL education, engaging and instructive supplements for learning English are extremely helpful, yet current tools are lackluster. To address this gap, I set out to build a practical, accessible tool for both speech therapy and ESL education.

What it does

The goal was to design an intelligent mobile app that delivers effective speech exercises, letting users work on their communication abilities at their own pace. Students start by entering their desired pace (in words per minute) and are then presented with challenges (passages to read aloud). A cursor moves through the words, encouraging students to keep moving forward. Their speech is recorded, transcribed, and compared to the original passage (the Wizard Words). The similarity between the two is computed to assess the quality of the student's speech. In addition, the Wizard uses natural language processing algorithms to determine which words the student struggles with and suggests the next passage to help them practice those words and phrases. Students can also play back their own audio, as well as how the words should sound, to recognize areas for improvement.
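
As a rough illustration of the comparison step, here is a minimal sketch that scores a transcript against its passage and lists the passage words that were not matched. It is not the app's actual code: the app uses Scikit-Learn and TensorFlow for similarity analysis, while this sketch swaps in Python's standard-library `difflib`. The function names `tokenize` and `compare_to_passage` are made up for the example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def compare_to_passage(passage, transcript):
    """Return (similarity in [0, 1], passage words missing from the transcript).

    Assumption: word-level sequence matching stands in for the app's
    Scikit-Learn/TensorFlow similarity analysis.
    """
    expected = tokenize(passage)
    spoken = tokenize(transcript)
    matcher = SequenceMatcher(a=expected, b=spoken, autojunk=False)
    missed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark passage words with no match in the transcript.
        if tag in ("replace", "delete"):
            missed.extend(expected[i1:i2])
    return matcher.ratio(), missed
```

For example, comparing the passage "The quick brown fox jumps" with the transcript "the quick brown box jumps" flags "fox" as a word to practice. The missed words are the kind of signal the passage suggestion step could build on.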

How we built it

  • Front-End (iOS): React Native, Expo
  • Transcription: OpenAI Whisper Speech-To-Text
  • Similarity Analysis: Scikit-Learn, TensorFlow
  • Custom N-gram Model for Passage Suggestion: TensorFlow, NumPy
  • Passage Generation: OpenAI GPT-3.5
  • Text-To-Speech: Google Cloud Text-To-Speech (to be implemented fully)
  • Back-End Web Server: Python Flask

Challenges we ran into

  • Getting the audio to save on the mobile device and play back.
  • Sending audio files to the server and back (the binary audio has to be encoded as base64 in the HTTP request)
  • Rendering screens and animations, and keeping the front end from breaking on reloads and when navigating between screens
  • Google Cloud deployment for text-to-speech of Wizard Words
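
To make the base64 point concrete, here is a minimal server-side sketch of the round trip: raw audio bytes are base64-encoded into a JSON body and decoded back to the original bytes. The field names `filename` and `audio_base64` and both function names are assumptions for illustration, not the app's actual request format.

```python
import base64
import json


def encode_audio_payload(audio_bytes, filename):
    """Wrap raw audio bytes in a JSON string that is safe to send over HTTP."""
    # base64 turns arbitrary bytes into ASCII text, which JSON can carry.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return json.dumps({"filename": filename, "audio_base64": encoded})


def decode_audio_payload(payload):
    """Recover the filename and the original bytes from the JSON string."""
    data = json.loads(payload)
    return data["filename"], base64.b64decode(data["audio_base64"])
```

Base64 grows the payload by roughly a third, which is usually acceptable for short recordings like a single read-aloud passage.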

Accomplishments that we're proud of

  • My first mobile app, EVER!
  • Custom language model that uses N-grams and count-based probabilities to suggest the words the student struggles with most.
  • Full end-to-end pipeline that improves a student's communication skills.
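
The N-gram idea can be sketched in a few lines. This is only an assumption about how such a model might work, not the app's TensorFlow/NumPy implementation: it counts bigrams in phrases the student got wrong, turns the counts into probabilities, and picks the candidate passage whose bigrams carry the most of that probability. All function names are hypothetical.

```python
from collections import Counter


def ngrams(words, n):
    """Return the list of n-grams (as tuples) in a word sequence."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def struggle_probabilities(missed_phrases, n=2):
    """Estimate P(n-gram) from counts over phrases the student got wrong."""
    counts = Counter()
    for phrase in missed_phrases:
        # Simple whitespace split; punctuation handling is left out of this sketch.
        counts.update(ngrams(phrase.lower().split(), n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def suggest_passage(candidates, probs, n=2):
    """Pick the candidate whose n-grams carry the most struggle probability."""
    def score(passage):
        return sum(probs.get(g, 0.0) for g in ngrams(passage.lower().split(), n))
    # On a tie, max() returns the first candidate with the top score.
    return max(candidates, key=score)
```

In a setup like this, a student who keeps stumbling on "thin thieves" would be steered toward passages that contain that phrase again.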

What we learned

  • React Native can be really annoying... LOL
  • How to build an iOS app and test it in a simulator or on an iPhone
  • How to build a full-stack React Native app that talks to the server and dynamically renders output.

What's next for Speech Wizard

  • Fully implementing text-to-speech for the Wizard Words (the actual passage), so the student can hear how the passage is supposed to sound