Hermex
🔥 Inspiration
Online tutorials are powerful, but they often fall short in helping learners actively engage with the content. Many learners watch videos passively, without checking their understanding or revisiting important concepts. We wanted to reimagine tutorial learning by transforming YouTube videos into interactive learning experiences, where the AI pauses at key moments, quizzes the learner, and later summarizes the lesson with personalized review materials.
This idea was inspired by our personal struggles with staying focused during long videos and our passion for improving online education using GenAI.
🛠️ How We Built It
We used a modern tech stack optimized for performance and rapid iteration:
- Frontend: React + TailwindCSS + react-youtube to embed and control the video playback experience.
- Backend: Python + FastAPI to handle preprocessing, transcript analysis, and question generation.
- AI: OpenAI's GPT-4o for summarization, checkpoint creation, and review generation.
- Voice & Video AI: MuseTalk for lip-syncing AI avatars, and Zonos for voice cloning.
- Hosting & Infra: Firebase for authentication, Google Cloud Run for backend deployment, and RunPod for GPU processing.
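The writeup doesn't show the data that flows from the FastAPI backend to the player. As a minimal sketch, assuming the transcript arrives as timestamped segments and GPT-4o is prompted to reply with JSON of the form `{"checkpoints": [...]}`, the preprocessing step might label each transcript line with an `[mm:ss]` marker (so the model can cite times) and then parse the reply into typed checkpoints. `Segment`, `Checkpoint`, `format_transcript`, `parse_checkpoints`, and the JSON shape are all hypothetical, and the actual OpenAI call is omitted.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped transcript line (hypothetical shape; times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Checkpoint:
    """A point where playback pauses and the learner gets a question (hypothetical)."""
    time: float
    question: str
    choices: list[str]
    answer: str


def format_transcript(segments: list[Segment]) -> str:
    """Render segments as '[mm:ss] text' lines so the model can refer to timestamps."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.text}")
    return "\n".join(lines)


def parse_checkpoints(raw: str) -> list[Checkpoint]:
    """Parse an assumed JSON reply {"checkpoints": [{time, question, choices, answer}, ...]}."""
    data = json.loads(raw)
    checkpoints = [
        Checkpoint(
            time=float(item["time"]),
            question=item["question"],
            choices=list(item["choices"]),
            answer=item["answer"],
        )
        for item in data["checkpoints"]
    ]
    # Return checkpoints in playback order, whatever order the model emitted them in.
    return sorted(checkpoints, key=lambda cp: cp.time)
```

The formatted transcript would be embedded in the prompt, and the parsed list is what the player would need in order to know where to pause.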
🧠 What We Learned
- How to create a seamless AI-driven user experience using real-time checkpointing.
- Orchestrating multiple GenAI models together (text → voice → video).
- Challenges of syncing video playback with dynamically generated AI content.
- Importance of UI/UX clarity in educational tools.
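One way to picture the text → voice → video orchestration: each checkpoint's clip needs its three stages in order, but clips for different checkpoints are independent and can be produced concurrently. The sketch below uses `asyncio` with hypothetical stub functions (`generate_script`, `synthesize_voice`, `render_avatar`) standing in for GPT-4o, Zonos, and MuseTalk. The stubs only simulate latency and do not reflect those tools' real APIs.

```python
import asyncio


# Hypothetical stand-ins for the real model calls (GPT-4o, Zonos, MuseTalk).
# Each one just sleeps briefly and returns a placeholder string.
async def generate_script(topic: str) -> str:
    await asyncio.sleep(0.01)
    return f"Quick check on {topic}!"


async def synthesize_voice(script: str) -> str:
    await asyncio.sleep(0.01)
    return f"audio({script})"


async def render_avatar(audio: str) -> str:
    await asyncio.sleep(0.01)
    return f"video({audio})"


async def build_clip(topic: str) -> str:
    """Text -> voice -> video: each stage consumes the previous stage's output."""
    script = await generate_script(topic)
    audio = await synthesize_voice(script)
    return await render_avatar(audio)


async def build_all_clips(topics: list[str]) -> list[str]:
    """Run one pipeline per checkpoint concurrently; results keep the input order."""
    return list(await asyncio.gather(*(build_clip(t) for t in topics)))
```

Producing every clip before playback starts, rather than at pause time, is one way to keep model latency from reaching the learner.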
🚧 Challenges We Faced
- Video Syncing: Ensuring that AI-generated checkpoints aligned precisely with YouTube video timestamps.
- Latency: Managing the delay between AI processing and frontend playback.
- AI Hallucinations: Prompt engineering was critical for reducing inconsistencies in the generated questions and summaries.
- Deployment Issues: Integrating GPU-based tools (like MuseTalk) into our cloud infrastructure was non-trivial.
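For the syncing and hallucination problems, one plausible safeguard is to post-process the model's checkpoints before they reach the player. The sketch drops any checkpoint whose timestamp falls outside the video or whose answer isn't among its choices, then snaps the rest to transcript segment boundaries so the video doesn't pause mid-sentence. It assumes sorted, timestamped transcript segments. This is not Hermex's actual code, and every name here is hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Checkpoint:
    time: float
    question: str
    choices: list[str]
    answer: str


def snap_to_segment_end(time: float, segments: list[Segment]) -> float:
    """Move a pause time to the end of the segment it falls in (or the segment
    just before it, if it lands in a gap). Segments must be sorted by start."""
    if not segments:
        return time
    starts = [seg.start for seg in segments]
    idx = bisect.bisect_right(starts, time) - 1
    return segments[max(idx, 0)].end


def clean_checkpoints(
    checkpoints: list[Checkpoint], segments: list[Segment], duration: float
) -> list[Checkpoint]:
    """Drop malformed checkpoints and align the survivors with the transcript."""
    kept = []
    for cp in checkpoints:
        if not 0 <= cp.time <= duration:
            continue  # timestamp outside the video: likely hallucinated
        if cp.answer not in cp.choices:
            continue  # answer key matches none of the options
        snapped = snap_to_segment_end(cp.time, segments)
        kept.append(Checkpoint(snapped, cp.question, cp.choices, cp.answer))
    kept.sort(key=lambda cp: cp.time)
    # Several checkpoints can snap to the same boundary; keep only the first.
    deduped: list[Checkpoint] = []
    for cp in kept:
        if not deduped or cp.time != deduped[-1].time:
            deduped.append(cp)
    return deduped
```

Snapping to the end of a sentence trades a few seconds of precision for a less jarring pause.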
✅ Final Outcome
Our final product offers a pause-and-learn flow: the video stops at intelligently selected points, the AI asks relevant questions, and personalized summaries help learners revisit the content. It feels like a tutor is watching with you, ready to help at every important moment.
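The pause-and-learn trigger can be reduced to a small check. This sketch assumes the frontend samples the playback position periodically, for example by polling the YouTube IFrame player's `getCurrentTime()`. It pauses when the position crosses an unanswered checkpoint between two samples, because waiting for an exact timestamp match could miss checkpoints between polls. In the React app this logic would live in TypeScript. It is shown here in Python, and `crossed_checkpoint` and its `max_step` seek heuristic are hypothetical.

```python
import bisect
from typing import Optional


def crossed_checkpoint(
    prev_time: float,
    curr_time: float,
    checkpoint_times: list[float],
    answered: set[float],
    max_step: float = 2.0,
) -> Optional[float]:
    """Return the first unanswered checkpoint in (prev_time, curr_time], or None.

    checkpoint_times must be sorted. max_step should exceed the polling
    interval; a larger forward jump is treated as the learner seeking.
    """
    step = curr_time - prev_time
    if step <= 0 or step > max_step:
        return None  # paused, rewound, or skipped ahead: don't interrupt
    lo = bisect.bisect_right(checkpoint_times, prev_time)
    hi = bisect.bisect_right(checkpoint_times, curr_time)
    for t in checkpoint_times[lo:hi]:
        if t not in answered:
            return t
    return None
```

Treating large jumps as seeks is a design choice. An app that wants learners to answer every question before skipping ahead would drop the `max_step` check instead.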
Built With
- fastapi
- openai
- python
- react
- typescript