Inspiration

Public speaking is a skill that impacts many areas of life, but for most, it's accompanied by anxiety and challenges like filler words, lack of confidence, and low audience engagement. We wanted to build a tool that could provide real-time, actionable feedback to help speakers improve their delivery, ultimately making public speaking more accessible and less intimidating for everyone.

What it does

Speakalytic enables users to record their speeches and receive comprehensive feedback on key metrics such as clarity, conciseness, confidence, and emotional reach. Filler words are highlighted in real time within the transcription, accompanied by tailored suggestions for improvement and an overall evaluation. This feedback empowers users to build stronger speaking habits and connect more effectively with their audience. Recordings are stored in MongoDB, so users can review past performances and monitor their progress over time.

How we built it

  • Frontend: We used React and TypeScript to build a responsive UI, creating a live transcription display, a recording interface, and a feedback popup for an intuitive user experience. We styled the transcription display to dynamically highlight filler words like "um" and "ah" in red.
  • Backend: Our backend uses Python to manage API requests and perform speech analysis. We leveraged the Deepgram API for live transcription with filler word detection and used the Groq API to evaluate aspects of speech delivery.
  • Real-Time AI Integration: We used the Deepgram API to generate accurate live transcriptions, with the punctuate and filler-word settings enabled for precision, and the Groq API to score each speech on clarity, confidence, and emotional reach.
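
The filler-word highlighting boils down to a small pure function. Here is a minimal sketch, written in Python for brevity (the word list and CSS class name are illustrative; the actual UI does this in TypeScript):

```python
import re

# Illustrative filler list; the real app's list may differ.
FILLER_WORDS = {"um", "uh", "ah", "like", "you know"}

def highlight_fillers(transcript: str) -> str:
    """Wrap recognized filler words in a span so the UI can style them red."""
    # Longest phrases first so "you know" wins over any shorter overlap.
    alternatives = "|".join(
        re.escape(w) for w in sorted(FILLER_WORDS, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b({alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda m: f'<span class="filler">{m.group(0)}</span>', transcript)
```

For example, `highlight_fillers("Um, so I think, ah, we should start")` wraps only "Um" and "ah", leaving the rest of the transcript untouched.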

Challenges we ran into

  1. API Synchronization: Coordinating multiple API requests while maintaining low latency was critical to the app’s usability, requiring careful management of data flow and response handling.

  2. Streaming Audio to Flask: Sending audio data to a Flask server for live transcription proved more complex than expected, so we moved transcription to the client side in TypeScript, which made the pipeline simpler and more responsive.

  3. User-Friendly Design: Crafting an engaging and intuitive interface while displaying detailed feedback was challenging. Balancing technical functionality with a seamless user experience was a constant focus.

  4. Database Design with GridFS: Designing an efficient schema and storing audio files with GridFS presented a unique set of challenges, but we landed on a structure that supports both scalability and easy retrieval.
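
The coordination problem in challenge 1 largely comes down to running independent analyses concurrently rather than serially, so total latency is the slowest call instead of the sum. A minimal `asyncio` sketch of the pattern, with placeholder coroutines standing in for the real API calls (the simulated latencies and score values are assumptions):

```python
import asyncio

async def call_groq_feedback(transcript: str) -> dict:
    # Placeholder for the remote Groq evaluation call (simulated latency).
    await asyncio.sleep(0.05)
    return {"clarity": 8, "confidence": 7}

async def count_fillers(transcript: str) -> dict:
    # Local analysis that can overlap with the remote request.
    await asyncio.sleep(0)
    fillers = sum(transcript.lower().split().count(w) for w in ("um", "uh", "ah"))
    return {"filler_count": fillers}

async def analyze(transcript: str) -> dict:
    # Independent analyses run concurrently; results merge into one payload.
    remote, local = await asyncio.gather(
        call_groq_feedback(transcript),
        count_fillers(transcript),
    )
    return {**remote, **local}
```

Calling `asyncio.run(analyze("um so um yes"))` yields one merged feedback dict, which is roughly the shape the frontend popup consumes.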
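
For challenge 4, the recording documents can be sketched roughly as below. The field names are illustrative rather than our exact schema, and the GridFS write is shown but requires a live MongoDB connection, so it is not exercised here:

```python
from datetime import datetime, timezone

def make_recording_doc(user_id: str, audio_file_id, transcript: str, scores: dict) -> dict:
    """Metadata document for one recording; the raw audio bytes live in GridFS."""
    return {
        "user_id": user_id,
        "audio_file_id": audio_file_id,   # ObjectId returned by GridFS
        "transcript": transcript,
        "scores": scores,                 # clarity, confidence, conciseness, emotional reach
        "created_at": datetime.now(timezone.utc),
    }

def store_recording(db, audio_bytes: bytes, user_id: str, transcript: str, scores: dict):
    # Requires a running MongoDB instance; shown for shape only.
    import gridfs
    fs = gridfs.GridFS(db)
    file_id = fs.put(audio_bytes, filename=f"{user_id}.webm")
    doc = make_recording_doc(user_id, file_id, transcript, scores)
    db.recordings.insert_one(doc)
    return doc
```

Keeping the audio in GridFS and only an ID in the metadata document is what lets the history view list past recordings without pulling audio payloads.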

Accomplishments that we're proud of

Live Transcription: We successfully implemented live transcription, allowing users to see their words appear in real-time as they speak, providing immediate insights and feedback.

User Authentication: We developed a fully functional login and signup system, seamlessly integrated with Firebase to ensure secure and reliable user authentication.

MongoDB Integration: Designing and connecting the MongoDB schema to the frontend was a large milestone, enabling us to store and retrieve user data efficiently.

Interactive Feedback Modal: We created a responsive modal popup that appears upon submission, providing users with an organized summary of their speech evaluation and tailored suggestions for improvement.

Groq Integration for Feedback: Connecting our transcription system to Groq allowed us to generate in-depth feedback on clarity, confidence, conciseness, and emotional reach, elevating the overall quality of our evaluations.
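
The evaluation is driven by a structured prompt sent with the transcript. A rough sketch of how such a request can be framed (the prompt wording and the model name in the comment are assumptions, not the exact production values):

```python
METRICS = ("clarity", "confidence", "conciseness", "emotional reach")

def build_feedback_messages(transcript: str) -> list:
    """Chat messages asking the model to score the speech and suggest improvements."""
    rubric = ", ".join(METRICS)
    return [
        {
            "role": "system",
            "content": (
                "You are a public-speaking coach. Rate the speech 1-10 on "
                f"{rubric}, then give tailored suggestions. Respond in JSON."
            ),
        },
        {"role": "user", "content": transcript},
    ]

# With the Groq SDK the call then looks roughly like (not executed here):
#   from groq import Groq
#   resp = Groq().chat.completions.create(
#       model="llama3-8b-8192",  # illustrative model name
#       messages=build_feedback_messages(text),
#   )
```

Asking for JSON keeps the response machine-parseable, so the scores can be dropped straight into the feedback modal.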

What we learned

Throughout this project, we learned how to tackle real-time data handling by implementing live transcription with Deepgram, which challenged us to manage latency and streaming accuracy effectively. Setting up secure user authentication with Firebase also helped us grow our understanding of authentication flows and data protection in modern web applications. Designing and connecting the MongoDB schema to the frontend taught us about structuring and storing dynamic data, and creating the interactive modal for feedback improved our skills in responsive UI development.

Integrating Groq for generating specific feedback gave us a look at how AI can analyze and interpret speech patterns. Overall, each feature we built grew our technical abilities.

What's next for Speakalytic

We plan to deploy Speakalytic to production, hosting the website and connecting it to a remote database server for broader accessibility. Our goals also include enhancing audio detection accuracy, adding progress tracking for each user's speaking journey, and providing even more personalized feedback for continuous improvement.

Built With

React, TypeScript, Python, Flask, Deepgram, Groq, MongoDB (GridFS), Firebase
