Inspiration

All athletes benefit from reviewing their performance, good or bad. We focus on highlighting the good: by cutting the cost of creating highlight videos by roughly 100x, we increase athletes' chances of exposure, sponsorship, and recognition for their hard work.

What it does

We applied a multimodal AI approach, using video, text, and audio inputs to programmatically generate unique videos that users can post anywhere.

Here is an example of an original input video, which is over an hour long.

Here is what the resulting exposure video looks like, cut down to about a minute.

How we built it

The web app calls several AI models in the backend through various APIs to create an output based on user inputs.
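The backend flow can be sketched roughly as below. Every function name here is illustrative, not a real API: Twelve Labs, moviepy, and musicgen each sit behind their own SDKs, and the stub return values stand in for real API responses.

```python
# Hypothetical sketch of the backend pipeline (all names are placeholders).

def index_video(video_path):
    # e.g. upload the long video to Twelve Labs and get back an index id
    return "index-123"

def search_highlights(index_id, query):
    # e.g. Twelve Labs text search -> list of (start, end) second spans
    return [(120.0, 131.0), (45.0, 56.0)]

def cut_clips(video_path, spans):
    # e.g. moviepy subclips for each (start, end) span
    return [f"clip_{int(s)}_{int(e)}.mp4" for s, e in spans]

def overlay_music(clips, prompt):
    # e.g. a musicgen track generated from the prompt, looped over the reel
    return "highlight_reel.mp4"

def build_reel(video_path, query, music_prompt):
    index_id = index_video(video_path)
    # sort spans so the highlights play in chronological order
    spans = sorted(search_highlights(index_id, query))
    clips = cut_clips(video_path, spans)
    return overlay_music(clips, music_prompt)
```

Each stage maps to one model family (video-to-text search, text-to-video cutting, text-to-audio music), which is why a failure in any one stage breaks the whole chain.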

This is the frontend flow created for the hackathon.

Here is the initial backend flow, which required a pivot after models errored out.

Challenges we ran into

  1. Challenge 1: The video-to-text model (Video-LLaMA on Hugging Face) errored out, so we couldn't create a domain-agnostic prompt for the LLM search query, which broke our entire entry point into the automation (we had to redo it after a day of work).

    Solution 1: We pivoted to overlaying music with musicgen instead, but it had latency issues and clear limits on track duration, so we ran the generated track on a loop.

  2. Challenge 2: The video-to-text model erroring out also meant we couldn't generate an exact voice narration of the activity, forcing a pivot to a music-only overlay.

    Solution 2: Resolved with musicgen on Replicate.

  3. Challenge 3: Highlights did not flow in the correct order after the Twelve Labs output (i.e., text search within videos).

    Solution 3: Prepended a timestamp to the name of every video chunk so they could be ranked chronologically.

  4. Challenge 4: The video chunks (cut via the Python library moviepy) returned from Twelve Labs search varied in length.

    Solution 4: Trimmed each chunk to its midpoint +/- 5.5 seconds to keep the highlight centered and remove unnecessary footage.

  5. Challenge 5: Largely one dev

    Solution 5: Beast Mode
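Solution 1's looping workaround boils down to repeating a short generated track until it covers the reel. A minimal sketch, assuming you know both durations in seconds (the function name is ours, not from any SDK):

```python
import math

def loops_needed(reel_seconds, track_seconds):
    """How many times to repeat a generated music track to cover the reel."""
    return math.ceil(reel_seconds / track_seconds)
```

For example, a 60-second reel over an 8-second musicgen track needs 8 repeats, with the tail trimmed to the reel's exact length.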
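Solution 3's ordering trick works because a zero-padded timestamp prefix makes lexicographic sort equal chronological sort. A sketch of the idea (the naming scheme here is illustrative):

```python
def name_chunk(start_seconds, label):
    # Zero-pad the start time so plain string sorting orders chunks chronologically
    return f"{int(start_seconds):06d}_{label}.mp4"

chunks = [name_chunk(312, "dunk"), name_chunk(45, "steal"), name_chunk(128, "assist")]
ordered = sorted(chunks)  # chronological, no timestamp parsing needed
```

Without the zero padding, "128" would sort before "45" as a string, which is exactly the kind of out-of-order reel the challenge describes.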
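Solution 4's normalization can be sketched as centering an 11-second window (midpoint +/- 5.5 seconds, per the writeup) on each chunk; the clamping of short chunks to their own bounds is our assumption:

```python
def trim_window(start, end, half_width=5.5):
    """Center a (2 * half_width)-second window on the chunk midpoint,
    clamped so it never extends past the original chunk."""
    mid = (start + end) / 2
    return max(start, mid - half_width), min(end, mid + half_width)
```

A 30-second chunk from 100s to 130s would be trimmed to 109.5s-120.5s, keeping the highlight centered; a chunk already shorter than 11 seconds is left untouched.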

Accomplishments that we're proud of

The full stack: getting a video cut up and a highlight reel created demanded video-to-text, text-to-text, text-to-video, and text-to-audio AI models all working in unison. Any one of many potential errors could have broken this flow.

What we learned

On the non-technical side: athletes at all levels face a real problem staying afloat financially. It's striking that even professional athletes don't feel financially stable, and that so many national teams worry about funding and sponsorships. This is a real solution to a clear, costly problem.

On the technical side: multimodal work can largely be accomplished by stitching together APIs, and a Figma-to-code flow worked for the frontend (although it wasn't perfect).

What's next

Add in a cleaner frontend where users can select their own clips.

Setup social sharing to flow directly from web app to social accounts.

Run testing with student athletes to improve the user experience and iterate on their needs.
