Inspiration
Brainstorming sesh with the boys
What it does
Takes input text and stitches together YouTube clips drawn from a subset of a YouTube dataset. The specific dataset used is https://huggingface.co/datasets/chenjoya/Live-WhisperX-526K
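Here is a rough sketch of how the text-to-clips lookup could work. It is not the actual ClipScribe code. It assumes the transcripts have already been flattened into word-level records of the form (word, video_id, start, end); that record shape, the `Clip` class, and the `build_index` and `plan_clips` helpers are all hypothetical names made up for illustration, not the dataset's real schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A time span inside one video (hypothetical structure)."""
    video_id: str
    start: float  # seconds
    end: float    # seconds


def build_index(records):
    """Map each lowercased word to the clips where it is spoken.

    `records` is an iterable of (word, video_id, start, end) tuples.
    This shape is an assumption, not the dataset's actual format.
    """
    index = {}
    for word, video_id, start, end in records:
        index.setdefault(word.lower(), []).append(Clip(video_id, start, end))
    return index


def plan_clips(text, index):
    """Return (plan, missing) for the input text.

    plan    -- list of (word, Clip) pairs in input order, using the
               first known clip for each word
    missing -- words that have no clip in the index
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    plan, missing = [], []
    for word in words:
        clips = index.get(word)
        if clips:
            plan.append((word, clips[0]))
        else:
            missing.append(word)
    return plan, missing
```

The returned plan is just an ordered list of time spans; actually cutting and concatenating the video would be a separate step. Picking the first matching clip is the simplest possible choice, and a real version would probably want to prefer clips with cleaner word boundaries.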
What we learned
Lesson learned: always check API usage limits before fully committing to a project. We tried to build our own dataset using the YouTube transcripts API, but it turned out to be impractical because our usage credits were eaten up so quickly.
What's next for ClipScribe
Actually build a high-quality dataset with timestamps that clearly mark where each word starts and ends
