TikTUM

system design

Inspiration

TikTok, Reels, and YouTube Shorts, to name a few, ushered in an era of tech companies competing for our limited attention spans. Our project aims to use these same techniques to get you addicted to content that actually matters.

What it does

Automatically divide lectures into meaningful segments and display them as TikTok-style clips
Socialize over video clips; react, comment, share, directly ask questions about lecture segments and receive answers from students or an integrated bot with knowledge of the lecture
Seamlessly search through hours of video content with the help of accurate audio transcription models
Generate lecture subtitles in (almost) any language to even further reduce any learning barriers
Make learning more fun by applying filter and voiceover effects on lecture content

How we built it

Our video processing pipeline (built in Python) analyzes the lecture recordings and splits them into easy-to-digest segments. We first extract the individual frames, analyze the change in pixel content over time and use the silence from the audio to infer segment cues within the recording. We further analyze the slides using image-to-text to get the headlines and create an outline for the entire lecture. For tasks such as transcription, voiceover, and intelligent Q&A Chatbot we query APIs such as openAI's whisper, and openAI's GPT-3.

Our frontend is powered by TypeScript and React and features an infinite-scroll feed, a lectures overview page with automatically-created outlines, and a search area where users can search through the lecture transcription and are displayed the results linking to the video snippets with the relevant content.

Challenges we ran into

Finding practical methods for extracting information from video data. Limited frontend experiences. Operating solely on coffee and mate.
Took way longer than expected to convince Sir David Attenborough to work with us ;-)

Accomplishments that we're proud of

Great teamwork, everybody was helpful and supportive. Our approaches to video processing (segmentation, text extraction, voiceover effects, etc.) worked much better than initially expected.