Inspiration

Short-form “brainrot” content is wildly viral on social media right now, with countless variants that people are making by hand. FullStackPeter on Instagram is a great example: he covers full-stack content and garners half a million to a million views every time he posts. We wanted to automate this process so that anyone can make content with ease. Whether it's for personal use, teaching, or just a social media presence, it has real-world impact that many people are benefiting from and using today. We also asked: what if you could upload any PDF and watch two friendly characters “talk” you through it? By combining bite-sized video with conversational AI, we aimed to transform passive reading into an active, enjoyable learning experience.

What it does

ZotTalks lets you drag and drop a PDF of study materials and generates a dialogue of any length you choose between two AI avatars. One avatar presents key concepts while the other asks clarifying questions or offers real-world examples, keeping the flow dynamic. The result is a shareable video summary that highlights the main ideas, so you grasp complex topics faster and with less effort.

How we built it

  • File Storage & Management: PDFs are uploaded to and retrieved from Amazon S3, ensuring scalable, durable storage.
  • Text Extraction: We leverage Amazon Textract’s OCR and document analysis to reliably parse text, tables, and layout elements from diverse PDF formats.
  • Summarization Engine: Using Amazon Bedrock’s foundation models, we generate concise, coherent scripts that distill key concepts without losing essential detail.
  • Avatar Creation & TTS: Custom 2D anteater characters are rigged for basic lip-sync, while Amazon Polly’s neural voices—tweaked for pitch and cadence—bring each persona to life.
  • Video Composition: Video segments, lip-sync data, and on-screen overlays are programmatically assembled via MoviePy.
  • Frontend: A React web app handles file uploads, displays real-time progress (using S3 pre-signed URLs), and embeds the resulting video for immediate playback or sharing.
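The extraction step above can be sketched roughly as follows. The bucket and key names are placeholders, and the parsing helper is our own illustration rather than part of Textract's API:

```python
def lines_from_textract(response: dict) -> str:
    """Join the text of every LINE block Textract found, top to bottom."""
    return "\n".join(
        b["Text"] for b in response.get("Blocks", []) if b["BlockType"] == "LINE"
    )

def extract_pdf_text(bucket: str, key: str) -> str:
    """Run Textract on a PDF already sitting in S3. The synchronous API
    covers simple documents; multi-page PDFs need the asynchronous
    start_document_text_detection flow instead."""
    import boto3  # imported lazily so the parsing helper has no AWS dependency
    textract = boto3.client("textract")
    resp = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return lines_from_textract(resp)
```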
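The summarization step can be sketched with Bedrock's Converse API. The persona names, prompt wording, and model ID here are illustrative assumptions, not our exact production prompt:

```python
def build_dialogue_prompt(doc_text: str, turns: int = 8) -> str:
    """Compose the summarization prompt: two personas, a fixed turn count,
    and an instruction to preserve every key idea."""
    return (
        f"Rewrite the following study material as a {turns}-turn dialogue "
        "between Petr (who explains concepts) and Anteater (who asks "
        "clarifying questions). Keep every key idea.\n\n" + doc_text
    )

def generate_script(doc_text: str,
                    model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> str:
    """Ask a Bedrock foundation model for the dialogue script."""
    import boto3  # lazy import keeps the prompt helper dependency-free
    bedrock = boto3.client("bedrock-runtime")
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user",
                   "content": [{"text": build_dialogue_prompt(doc_text)}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]
```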
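The MoviePy assembly step pairs each synthesized audio clip with avatar frames; a simplified sketch (moviepy 1.x import path, placeholder file paths, and a timing helper of our own):

```python
def cumulative_starts(durations):
    """Start time of each segment when clips play back-to-back; this is
    what the lip-sync and overlay timings are anchored against."""
    starts, t = [], 0.0
    for d in durations:
        starts.append(t)
        t += d
    return starts

def assemble_video(audio_paths, avatar_frames, out_path="zottalks.mp4"):
    """Concatenate one still-frame clip per dialogue line, each lasting
    exactly as long as its audio."""
    # moviepy<2.0 import path; 2.x moved these into the top-level package
    from moviepy.editor import AudioFileClip, ImageClip, concatenate_videoclips
    clips = []
    for audio_path, frame in zip(audio_paths, avatar_frames):
        audio = AudioFileClip(audio_path)
        clips.append(ImageClip(frame).set_duration(audio.duration).set_audio(audio))
    concatenate_videoclips(clips).write_videofile(out_path, fps=24)
```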
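For the upload path, the backend hands the browser a pre-signed PUT URL so files go straight to S3. A minimal sketch; the key-naming helper is our own convention, not an AWS requirement:

```python
import re
import uuid

def make_upload_key(filename: str) -> str:
    """Derive a collision-safe S3 key from the user's filename."""
    stem = re.sub(r"[^A-Za-z0-9._-]", "_", filename)
    return f"uploads/{uuid.uuid4().hex}/{stem}"

def presigned_upload_url(bucket: str, key: str, expires: int = 900) -> str:
    """Return a short-lived URL the frontend can PUT the PDF to."""
    import boto3  # lazy import so the key helper has no AWS dependency
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )
```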

Challenges we ran into

  • PDF Variability: Inconsistent formatting—images, tables, footnotes—often threw off our parser, requiring custom post-processing rules.
  • Balancing Brevity & Accuracy: Early summaries either felt too shallow or too verbose. Tuning prompt engineering and model parameters was crucial.
  • Lip-Sync Precision: Getting mouth movements to align with the TTS output demanded frame-level timing adjustments and manual viseme mapping.
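The viseme mapping above leaned on Polly's speech marks, which report a timestamp for each mouth shape. A minimal sketch; the voice ID is an assumption and the parser is our own:

```python
import json

def parse_viseme_marks(ndjson: bytes):
    """Polly streams one JSON object per line, e.g.
    {"time": 120, "type": "viseme", "value": "p"}; keep (ms, viseme) pairs."""
    marks = []
    for line in ndjson.decode("utf-8").splitlines():
        if line.strip():
            mark = json.loads(line)
            if mark.get("type") == "viseme":
                marks.append((mark["time"], mark["value"]))
    return marks

def fetch_viseme_marks(text: str, voice: str = "Joanna"):
    """Request viseme speech marks instead of audio for a line of dialogue."""
    import boto3  # lazy import keeps the parser testable without AWS
    polly = boto3.client("polly")
    resp = polly.synthesize_speech(
        Text=text, VoiceId=voice, OutputFormat="json",
        SpeechMarkTypes=["viseme"], Engine="neural",
    )
    return parse_viseme_marks(resp["AudioStream"].read())
```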

Accomplishments that we're proud of

  • Simplicity: We achieved an end-to-end flow, from PDF upload to shareable video, that is extremely simple for anyone to use. The frontend is just as easy to pick up; even Steve in the 5th grade can pick it up and use it.
  • Connecting with Others: The whole point of this project was to let everyone participate in the community, much like how the AI team at UCI and Mike Glover from AWS gave us the opportunity to join such an amazing event. This is a little shoutout to them: thanks for everything!
  • Positive User Feedback: In our testing, users reported that ZotTalks helped them retain information more effectively than reading alone.
  • Scalability: By leveraging serverless AWS functions, we can process hundreds of PDF-to-video conversions per day without manual intervention.

What we learned

  • Fine-tuning summarization prompts is an art: small changes in phrasing can greatly improve the balance between depth and conciseness.
  • Even high-quality TTS requires post-processing (e.g., slight pauses, emphasis markers) to feel truly conversational.
  • Building robust PDF parsers means accounting for countless edge cases—investment here pays off in reliability.
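As an example of that TTS post-processing, inserting SSML breaks after clauses goes a long way. A simplified sketch of the kind of markup we applied to each line (Polly's neural voices support only a subset of SSML, so this sticks to <break>):

```python
def conversationalize(text: str) -> str:
    """Wrap a plain dialogue line in SSML: a short pause after each clause
    and a longer beat at the end, so turns don't run into each other."""
    return (
        "<speak>"
        + text.replace(", ", ', <break time="250ms"/> ')
        + '<break time="400ms"/></speak>'
    )
```

The resulting string is passed to the TTS call with `TextType="ssml"` instead of plain text.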

What's next for ZotTalks

  • Interactive Quizzes: Embed pause-and-quiz moments so learners can test comprehension before moving on.
  • Custom Backgrounds: Let users upload their own background art and photos to set the scene.
  • Multilingual Support: Expand to Spanish, Mandarin, and beyond, with localized avatars and voices.
  • Mobile App: Offer offline downloads, push-notification reminders, and a streamlined interface for on-the-go study.
  • Custom Avatar Studio: Let users design or upload their own characters, complete with voice cloning for familiar narrators.
