AI Coach for Public Speaking (Real-time Feedback)

Thanks RocketHacks

Inspiration

The inspiration behind this project came from our desire to help people improve their public speaking skills. Public speaking can be nerve-wracking, and many struggle with aspects like filler words, tone, and pace. We wanted to create an AI-powered tool that gives real-time feedback, helping users become more confident and effective speakers.

What it does

The AI Coach for Public Speaking analyzes users’ speech from audio files and provides feedback on key aspects of public speaking. It scores the user based on their tone, speed, and the number of filler words they use. This feedback helps speakers identify areas for improvement and track their progress in becoming better communicators.
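As a rough illustration of the transcript side of this scoring, the sketch below counts filler words and estimates speaking pace from a transcript and a known audio duration. The filler lists and the function names (`count_fillers`, `words_per_minute`) are assumptions made for this example, not the project's actual code.

```python
import re
from collections import Counter

# Hypothetical filler lists; the write-up does not say which words the project tracks.
FILLER_WORDS = ("um", "uh", "er", "basically", "literally")
FILLER_PHRASES = ("you know",)

WORD_PATTERN = re.compile(r"[a-z']+")


def count_fillers(transcript: str) -> Counter:
    """Count single-word and multi-word fillers in a transcript (case-insensitive)."""
    words = WORD_PATTERN.findall(transcript.lower())
    counts = Counter(w for w in words if w in FILLER_WORDS)
    # Re-join the words so punctuation does not break phrase matching.
    normalized = " ".join(words)
    for phrase in FILLER_PHRASES:
        hits = len(re.findall(rf"\b{re.escape(phrase)}\b", normalized))
        if hits:
            counts[phrase] = hits
    return counts


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Estimate speaking pace from the word count and the audio duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    n_words = len(WORD_PATTERN.findall(transcript.lower()))
    return n_words * 60.0 / duration_seconds
```

A score could then be derived from these numbers, for example by comparing the pace against a target range and penalizing a high filler count per minute; the thresholds would need tuning on real recordings.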

How we built it

We built the backend with Flask and the frontend with Next.js, and used Docker to containerize the entire application for easy deployment. For speech analysis, we used the Gemini API to transcribe the audio into text. Because Gemini gave us only text, not tone or speed, we added Librosa, a Python audio-analysis library, to measure tone and speed from the audio itself, which enriched the feedback we could offer.
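The audio side of that pipeline might look something like the sketch below. It assumes pitch statistics stand in for "tone" and onset rate stands in for "speed"; both are illustrative proxies, and the function names (`summarize_pitch`, `analyze_audio`) are hypothetical rather than taken from the project. Librosa is imported inside `analyze_audio` so the pure-Python helper can be used without the audio stack installed.

```python
from statistics import mean, pstdev


def summarize_pitch(f0_hz):
    """Summarize pitch values in Hz, skipping NaN entries (unvoiced frames)."""
    voiced = [float(f) for f in f0_hz if f == f]  # NaN is never equal to itself
    if not voiced:
        return {"mean_hz": None, "std_hz": None}
    return {"mean_hz": mean(voiced), "std_hz": pstdev(voiced)}


def analyze_audio(path):
    """Extract rough tone and speed features from an audio file with Librosa."""
    import librosa  # third-party dependency, loaded only when audio is analyzed

    y, sr = librosa.load(path, sr=None)  # keep the file's native sample rate
    duration = librosa.get_duration(y=y, sr=sr)

    # pyin estimates the fundamental frequency per frame; unvoiced frames are NaN.
    f0, _voiced_flag, _voiced_prob = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"),
        sr=sr,
    )

    # Onsets per second is only a crude proxy for speaking rate (assumption).
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    onset_rate = len(onsets) / duration if duration > 0 else 0.0

    return {
        "duration_s": duration,
        "pitch": summarize_pitch(f0),
        "onsets_per_second": onset_rate,
    }
```

A low pitch standard deviation can hint at monotone delivery, while the onset rate combined with the transcript's word count gives two independent views of pace. How these map to the scores shown to the user is a design decision the write-up does not specify.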

Challenges we ran into

One of the major challenges we faced was that the Gemini API, as we used it, only converted audio to text and did not provide tone or speed analysis. To overcome this, we integrated Librosa, which let us analyze these aspects directly from the audio. Another challenge was making the feedback accurate and meaningful while keeping it helpful and easy to understand.

Accomplishments that we're proud of

We’re proud of creating a tool that is not only functional but also presentable. We overcame several technical hurdles to integrate different APIs and libraries, and the result is a project that offers real-time feedback for public speaking improvement. It’s exciting to see something we worked so hard on come together and be ready to help others.

What we learned

Throughout this project, we gained hands-on experience with frameworks like Flask and Next.js. We also learned how to work with APIs and handle different data formats. The biggest takeaway was learning how to combine multiple tools (Gemini, Librosa, Flask, Next.js) to create a cohesive solution. Additionally, we learned a lot about the challenges involved in real-time speech analysis and feedback.

What's next for AI Coach for Public Speaking (Real-time Feedback)

Looking forward, we plan to integrate facial recognition to analyze eye contact during speech. This will add another layer of feedback to the coaching process, helping users improve their non-verbal communication. We’re excited to keep enhancing the AI Coach and make it even more comprehensive and useful for public speakers.

Built With

Updates

Khai Nguyen started this project — Mar 16, 2025 10:37 AM EDT
