Speech Wizard

Screenshots: Home Screen, Challenge Screen, Analysis Screen

Inspiration

The motivation behind this AI-based speech app was a persistent challenge in two fields: speech therapy and ESL education. In speech therapy, many individuals, especially those without easy access to a professional therapist, face obstacles in improving their communication skills. In ESL education, engaging and educational supplements for learning English are extremely helpful, yet the tools currently available are lackluster. To address these gaps, I set out to build a practical, accessible tool for both speech therapy and ESL learning.

What it does

The goal was to design an innovative, intelligent mobile app that delivers effective speech exercises and lets users work on their communication skills at their own pace. Students start by entering their desired pace (in words per minute) and are then presented with challenges: passages to read aloud. A cursor moves through the words at that pace, encouraging students to keep moving forward. Their speech is recorded, transcribed, and compared to the original passage (the Wizard Words), and a similarity score is computed to assess how accurately they read it. In addition, the Wizard uses natural language processing to determine which words the student struggles with and suggests a next passage that targets those words and phrases. Students can also play back their own recording alongside what the words should sound like, so they can recognize areas for improvement.
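The pacing and scoring steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the app's actual code: the function names are hypothetical, and while the app lists scikit-learn and TensorFlow for similarity analysis, this sketch swaps in the standard library's `difflib.SequenceMatcher` to align passage and transcript word by word. The cursor delay is simply 60 seconds divided by the target words per minute.

```python
import difflib
import re


def seconds_per_word(wpm: float) -> float:
    """Hypothetical helper: delay between cursor steps for a target pace in words per minute."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return 60.0 / wpm


def tokenize(text: str) -> list[str]:
    """Lowercase and keep letters/apostrophes so punctuation and case don't count as errors."""
    return re.findall(r"[a-z']+", text.lower())


def score_reading(passage: str, transcript: str) -> tuple[float, list[str]]:
    """Hypothetical scorer: return (similarity in [0, 1], passage words skipped or misread)."""
    expected = tokenize(passage)
    spoken = tokenize(transcript)
    # autojunk=False stops difflib from ignoring very common words in long transcripts.
    matcher = difflib.SequenceMatcher(a=expected, b=spoken, autojunk=False)
    missed: list[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'replace' = read as a different word, 'delete' = skipped entirely
        if tag in ("replace", "delete"):
            missed.extend(expected[i1:i2])
    # ratio() = 2 * matched_words / (len(expected) + len(spoken))
    return matcher.ratio(), missed


passage = "The quick brown fox jumps over the lazy dog."
transcript = "the quick brown fox jump over the dog"
print(seconds_per_word(120))  # 0.5 seconds per word at 120 WPM
print(score_reading(passage, transcript))  # similarity 14/17, about 0.82; missed ['jumps', 'lazy']
```

A word-level alignment is used here because it yields both a single score and the specific words that went wrong, which is the input a passage-suggestion step needs.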

How we built it

Front-End (iOS): React Native, Expo
Transcription: OpenAI Whisper (speech-to-text)
Similarity Analysis: scikit-learn, TensorFlow
Custom N-gram Model for Passage Suggestion: TensorFlow, NumPy
Passage Generation: OpenAI GPT-3.5
Text-to-Speech: Google Cloud Text-to-Speech (not yet fully implemented)
Back-End Web Server: Python Flask

Challenges we ran into

Getting audio to save on the mobile device and play back.
Sending audio files to the server and back (the binary data had to be base64-encoded in the HTTP request).
Rendering screens and animations, and keeping the front end from breaking on reloads and when navigating between screens.
Deploying Google Cloud Text-to-Speech for the Wizard Words.
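The base64 round trip mentioned in these challenges can be sketched as follows. This is an illustration only: the function names and the JSON field names (`filename`, `audio_base64`) are hypothetical, and in the real app the encoding half would run in the React Native client rather than in Python. Base64 maps every 3 bytes to 4 ASCII characters, so the payload grows by about a third, but the binary audio can then travel safely inside a JSON body.

```python
import base64
import json


def encode_audio_payload(audio_bytes: bytes, filename: str) -> str:
    """Client side (sketch): wrap raw audio in a JSON body; base64 turns binary into ASCII text."""
    return json.dumps({
        "filename": filename,  # hypothetical field names
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
    })


def decode_audio_payload(data: dict) -> tuple[str, bytes]:
    """Server side (sketch): recover the original bytes from the parsed JSON body."""
    # validate=True rejects characters outside the base64 alphabet instead of silently dropping them.
    audio = base64.b64decode(data["audio_base64"], validate=True)
    return data["filename"], audio


body = encode_audio_payload(b"\x00\x01fake-audio\xff", "take1.m4a")
name, audio = decode_audio_payload(json.loads(body))
print(name, audio == b"\x00\x01fake-audio\xff")
```

In a Flask route, the dictionary passed to `decode_audio_payload` would come from `request.get_json()` instead of `json.loads`.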

Accomplishments that we're proud of

My first mobile app, EVER!
A custom language model that uses n-grams and count probabilities to identify the words the student struggles with most.
A full end-to-end pipeline designed to help students improve their communication skills.
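A count-probability n-gram model of this kind can be sketched in a few lines. This is a hypothetical illustration, not the app's TensorFlow/NumPy model: the class and method names are invented, and the scoring rule (mean of misread/seen ratios over a candidate passage's n-grams) is one simple choice among many. Each n-gram's struggle probability is estimated as the number of times it contained a misread word divided by the number of times it was read, and the next passage is the candidate whose n-grams score highest.

```python
from collections import Counter


def ngrams(words: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-word windows in a passage."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


class StruggleModel:
    """Hypothetical n-gram model: P(misread | n-gram) estimated from raw counts."""

    def __init__(self, n: int = 2):
        self.n = n
        self.seen: Counter = Counter()    # times each n-gram appeared in a read passage
        self.missed: Counter = Counter()  # times it contained at least one misread word

    def update(self, words: list[str], misread: list[bool]) -> None:
        """Record one attempt; misread[i] is True if words[i] was read incorrectly."""
        for i, gram in enumerate(ngrams(words, self.n)):
            self.seen[gram] += 1
            if any(misread[i:i + self.n]):
                self.missed[gram] += 1

    def struggle_prob(self, gram: tuple[str, ...]) -> float:
        """Count-based estimate missed/seen; n-grams never seen score 0."""
        seen = self.seen[gram]
        return self.missed[gram] / seen if seen else 0.0

    def top_struggles(self, k: int = 3) -> list[tuple[str, ...]]:
        """The k n-grams with the highest estimated misread probability."""
        return sorted(self.seen, key=self.struggle_prob, reverse=True)[:k]

    def score_passage(self, words: list[str]) -> float:
        """Mean struggle probability over a candidate passage's n-grams."""
        grams = ngrams(words, self.n)
        return sum(map(self.struggle_prob, grams)) / len(grams) if grams else 0.0

    def suggest(self, candidates: list[list[str]]) -> list[str]:
        """Pick the candidate passage that best targets past mistakes."""
        return max(candidates, key=self.score_passage)


model = StruggleModel(n=2)
attempt = "she sells sea shells by the sea shore".split()
model.update(attempt, [False, False, True, True, False, False, False, False])
print(model.top_struggles(3))
print(model.suggest(["the cat sat on the mat".split(), "sea shells on the sea shore".split()]))
```

Averaging over a passage's n-grams keeps longer candidates from winning just by containing more words; n-grams the student has never read contribute nothing either way.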

What we learned

React Native can be really annoying... LOL
How to build an iOS app and test it in the simulator and on an iPhone.
How to build a full-stack React Native app that talks to a server and renders its output dynamically.