Inspiration

Ever wanted to find that one scene in the middle of a long video? Or maybe that one section of a speech?

If it were text, you would hit Command-F and search the text for the words you want. What if you could Command-F a video to find the scenes and audio you want?

What if instead of scrolling through the video to find it, an AI could tell you exactly where the section you want is?

What it does

CommandF is Command-F for videos. Enter a query, and the algorithm finds the points in the video where it sees or hears what you asked for.

Enter a query -> Find the images or audio of that query in the video

If there's an hour-long presidential debate and you want to find out when Donald Trump said "wrooong", type it into the search box and CommandF will find it for you.

If Logan Paul made a video and you want to find out when he plugs his merch, type it into the search bar and CommandF will find it for you.

How we built it

Given a search query, we break it down using NLP. The algorithm shortlists the search space by measuring the density of the query keywords in the video's subtitles; if no subtitles are available, it generates them using machine learning. We then run each frame through Darkflow, which uses C to run the YOLO object detection algorithm and extracts tags from the video. These tags are compiled into a list that we compare against the search query, and the matching timestamps are sent back over a Flask backend.
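The subtitle-shortlisting step can be sketched roughly like this. This is a minimal illustration, not our production code: the stopword list, `extract_keywords`, and `shortlist_windows` are simplified stand-ins for the real NLP pipeline.

```python
import re

# Hypothetical subtitle format: a list of (start_seconds, end_seconds, text).
STOPWORDS = {"the", "a", "an", "in", "of", "when", "does"}

def extract_keywords(query):
    """Very simple NLP step: lowercase, tokenise, drop stopwords."""
    tokens = re.findall(r"[a-z']+", query.lower())
    return [t for t in tokens if t not in STOPWORDS]

def shortlist_windows(subtitles, query, top_k=5):
    """Score each subtitle entry by keyword density, keep the densest windows."""
    keywords = set(extract_keywords(query))
    scored = []
    for start, end, text in subtitles:
        words = re.findall(r"[a-z']+", text.lower())
        if not words:
            continue
        hits = sum(1 for w in words if w in keywords)
        scored.append((hits / len(words), start, end))
    scored.sort(reverse=True)
    # Only keep windows that actually mention a keyword.
    return [(s, e) for score, s, e in scored[:top_k] if score > 0]
```

Only the shortlisted windows then need to go through the (much slower) frame-level object detection.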

Challenges we ran into

Making this product was a problem of optimisation: we had to extract all the frames of a video, pull its subtitles, run object detection on every frame, parse all the output into an object, and then have the query search through that entire object in less than a minute.

To solve this we tried several approaches. First, we removed as many for loops from our code as possible in favour of a vectorised solution. Next, we realised that we didn't need to process at 24 fps, so we reduced the sampling rate to 1 fps, letting us run every second of the video through the YOLO object detection algorithm in time.
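The 24 fps to 1 fps reduction amounts to keeping every Nth frame. A decoder-agnostic sketch of that idea (in practice the frames would come from a reader such as OpenCV's `VideoCapture`; `sample_every_nth` here is an illustrative helper, not our actual code):

```python
def sample_every_nth(frames, native_fps=24.0, target_fps=1.0):
    """Keep roughly one frame per second of video.

    Downsampling from 24 fps to 1 fps cuts the object-detection
    workload by about 24x while still covering every second.
    Returns a list of (timestamp_seconds, frame) pairs.
    """
    step = max(int(round(native_fps / target_fps)), 1)
    return [(i / native_fps, f) for i, f in enumerate(frames) if i % step == 0]
```

Each kept frame carries its timestamp, so a detection hit maps straight back to a position in the video.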

Accomplishments that we're proud of

  1. We integrated a machine learning model that runs object detection on every second of a video
  2. Our product uses both NLP and object detection to determine which part of the video the user is searching for
  3. We deployed our model as a REST API, so it can be integrated with other services
  4. Our product is non-invasive and integrates directly into the user's browser as a plugin
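A minimal sketch of what the REST deployment (point 3) might look like with Flask. Here `find_timestamps` is a hypothetical placeholder standing in for the real NLP + YOLO pipeline:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def find_timestamps(video_id, query):
    """Placeholder for the real search pipeline; returns fixed
    example timestamps (in seconds) for illustration only."""
    return [12.0, 47.5]

@app.route("/search")
def search():
    """Return the matching timestamps for a query as JSON, e.g.
    GET /search?q=wrong&video=debate"""
    query = request.args.get("q", "")
    video_id = request.args.get("video", "")
    return jsonify({"query": query,
                    "timestamps": find_timestamps(video_id, query)})
```

Because the result is plain JSON over HTTP, the browser plugin (point 4) and any third-party service can consume the same endpoint.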

What we learned

  1. Training and integrating large AI models into production workflows
  2. Optimising ML models to run quickly
  3. Using NLP to search through sentences
  4. Deploying everything so that the user gets a response fast

What's next for CommandF

  1. Publish API for users
  2. Create "most interesting parts of video" section that crowdsources user queries to find the best parts of the video