Inspiration
The inspiration behind this project came from our desire to help people improve their public speaking skills. Public speaking can be nerve-wracking, and many struggle with aspects like filler words, tone, and pace. We wanted to create an AI-powered tool that gives real-time feedback, helping users become more confident and effective speakers.
What it does
The AI Coach for Public Speaking analyzes users’ speech from audio files and provides feedback on key aspects of public speaking. It scores the user based on their tone, speed, and the number of filler words they use. This feedback helps speakers identify areas for improvement and track their progress in becoming better communicators.
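As an illustration of the filler-word part of the score, here is a minimal sketch of counting fillers in a transcript. The filler list and the function names `count_fillers` and `filler_rate` are assumptions made for this example. The write-up does not say which words the project tracks or how the counts feed into the final score.

```python
import re
from collections import Counter

# Hypothetical filler list; the project's actual list is not given in the write-up.
FILLER_PHRASES = ("um", "uh", "er", "like", "you know", "basically", "actually")


def count_fillers(transcript, fillers=FILLER_PHRASES):
    """Return a Counter mapping each filler phrase to how often it appears."""
    text = transcript.lower()
    counts = Counter()
    for phrase in fillers:
        # Word boundaries keep "um" from matching inside "umbrella".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[phrase] = hits
    return counts


def filler_rate(transcript, fillers=FILLER_PHRASES):
    """Return fillers per 100 words, or 0.0 for an empty transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    total = sum(count_fillers(transcript, fillers).values())
    return 100.0 * total / len(words)


if __name__ == "__main__":
    sample = "Um, so I was, like, you know, carrying an umbrella. Uh, basically."
    print(dict(count_fillers(sample)))
    print(round(filler_rate(sample), 1))
```

A plain match like this also counts legitimate uses of words such as "like" or "actually", so a real scorer would probably need some context checking on top of it.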
How we built it
We built the project using Flask for the backend, Next.js for the frontend, and Docker to containerize the entire application for easy deployment. For speech analysis, we used the Gemini API to transcribe the audio into text. Since the transcription gave us only text, with no information about tone or speed, we added Librosa, a Python audio-analysis library, to measure tone and speed from the audio itself, which enriched the feedback we could offer.
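The split between transcript-based and audio-based analysis can be sketched roughly as follows. This is an assumption-laden example, not the project's actual code. Speed is estimated as words per minute from the transcript and the clip duration, and "tone" is approximated by pitch statistics from Librosa's `pyin` pitch tracker. The helper names, the pace thresholds, and the choice of pitch as a stand-in for tone are all hypothetical.

```python
import re


def words_per_minute(transcript, duration_seconds):
    """Estimate speaking pace from a transcript and the clip length in seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(re.findall(r"\b[\w']+\b", transcript))
    return word_count * 60.0 / duration_seconds


def pace_label(wpm, slow_below=110.0, fast_above=170.0):
    """Bucket a pace value; the thresholds are illustrative assumptions."""
    if wpm < slow_below:
        return "slow"
    if wpm > fast_above:
        return "fast"
    return "moderate"


def audio_features(path):
    """Return clip duration and simple pitch statistics for an audio file."""
    # Imported here so the pure helpers above work without librosa installed.
    import librosa
    import numpy as np

    y, sr = librosa.load(path, sr=None)  # keep the file's native sample rate
    duration = librosa.get_duration(y=y, sr=sr)
    # pyin returns NaN for unvoiced frames, so those are filtered out below.
    f0, _voiced_flag, _voiced_prob = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"),
        sr=sr,
    )
    voiced = f0[~np.isnan(f0)]
    return {
        "duration_seconds": float(duration),
        "median_pitch_hz": float(np.median(voiced)) if voiced.size else None,
        "pitch_std_hz": float(np.std(voiced)) if voiced.size else None,
    }


if __name__ == "__main__":
    wpm = words_per_minute("hello world this is a test", 3)
    print(wpm, pace_label(wpm))
```

In a setup like this, the duration from `audio_features` would be passed to `words_per_minute` along with the transcript, and a low `pitch_std_hz` could serve as one possible signal of monotone delivery.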
Challenges we ran into
One of the major challenges we faced was that the Gemini API gave us only a text transcript of the audio, with no tone or speed analysis. To overcome this, we integrated Librosa, which let us analyze these aspects directly from the audio. Another challenge was presenting feedback that was accurate and meaningful while remaining easy for users to understand.
Accomplishments that we're proud of
We’re proud of creating a tool that is not only functional but also presentable. We overcame several technical hurdles to integrate different APIs and libraries, and the result is a project that offers real-time feedback for public speaking improvement. It’s exciting to see something we worked so hard on come together and be ready to help others.
What we learned
Throughout this project, we gained hands-on experience with frameworks like Flask and Next.js. We also learned how to work with APIs and handle different data formats. The biggest takeaway was learning how to combine multiple tools (Gemini, Librosa, Flask, Next.js) to create a cohesive solution. Additionally, we learned a lot about the challenges involved in real-time speech analysis and feedback.
What's next for AI Coach for Public Speaking (Real-time Feedback)
Looking forward, we plan to integrate facial recognition to analyze eye contact during a speech. This will add another layer of feedback to the coaching process, helping users improve their non-verbal communication. We’re excited to keep enhancing the AI Coach and to make it more comprehensive and useful for public speakers.