Inspiration

Brainstorming sesh with the boys

What it does

Takes input text and stitches together YouTube clips drawn from a subset of a YouTube dataset. The specific dataset used is https://huggingface.co/datasets/chenjoya/Live-WhisperX-526K
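As a rough illustration of the idea, here is a minimal sketch of how input text could be mapped to a list of clips to stitch. It is not our actual code: the row layout (a `video_id` plus a `words` list with `start` and `end` times in seconds) is an assumption for illustration and not the dataset's confirmed schema, and `Clip`, `build_index` and `plan_clips` are hypothetical names. Cutting and concatenating the video itself is left out.

```python
import string
from typing import NamedTuple


class Clip(NamedTuple):
    video_id: str
    start: float  # seconds
    end: float    # seconds


def normalize(word):
    """Lowercase a word and drop surrounding punctuation."""
    return word.strip().strip(string.punctuation).lower()


def build_index(segments):
    """Map each normalized word to the first clip in which it is spoken.

    `segments` uses an assumed layout:
    {"video_id": str, "words": [{"word": str, "start": float, "end": float}]}
    """
    index = {}
    for seg in segments:
        for w in seg["words"]:
            key = normalize(w["word"])
            if key:
                # setdefault keeps the first occurrence of each word
                index.setdefault(key, Clip(seg["video_id"], w["start"], w["end"]))
    return index


def plan_clips(text, index):
    """Return one clip per input word, plus the words that had no match."""
    clips, missing = [], []
    for raw in text.split():
        word = normalize(raw)
        if not word:
            continue
        clip = index.get(word)
        if clip is None:
            missing.append(word)
        else:
            clips.append(clip)
    return clips, missing


# Toy rows standing in for real dataset entries (made-up values).
segments = [
    {"video_id": "vidA", "words": [
        {"word": "hello", "start": 1.0, "end": 1.4},
        {"word": "there", "start": 1.4, "end": 1.8},
    ]},
    {"video_id": "vidB", "words": [
        {"word": "world", "start": 7.2, "end": 7.7},
        {"word": "hello", "start": 9.0, "end": 9.3},
    ]},
]

index = build_index(segments)
clips, missing = plan_clips("Hello, world again", index)
```

Words with no match are returned separately so the caller can decide whether to skip them or fall back to something else. How clean the result sounds depends on how accurately the timestamps delimit each word.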

What we learned

Lesson learnt: always check API usage limits before fully committing to a project. We tried to build our own dataset using the YouTube transcripts API, but that proved impossible because the usage credits were eaten up so quickly.

What's next for ClipScribe

Actually build a high-quality dataset with timestamps that clearly delimit word boundaries.
