Inspiration

We want to let users react to snips more easily. We also want them to discover new snips they would never have searched for.

What it does

The Python script takes an image URL as a parameter and analyzes the face in the given image, returning one of seven emotions. Based on this emotion, a snip is chosen: we search Dubsmash for snips that are tagged with the given emotion or feature it in their name, and analyze each snip's text. The chosen snip is returned to the user, who can express their feelings with this random but fitting snip.
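The emotion-to-snip selection described above can be sketched as follows. The snip records and field names are assumptions for illustration, not the actual Dubsmash API shape:

```python
import random

# Hypothetical snip records, roughly as a Dubsmash search might return
# them (field names are assumptions for illustration).
SNIPS = [
    {"name": "happy_dance", "tags": ["happiness", "party"], "url": "https://example.com/1.mp4"},
    {"name": "angry_cat",   "tags": ["anger"],              "url": "https://example.com/2.mp4"},
    {"name": "so_sad",      "tags": ["sadness"],            "url": "https://example.com/3.mp4"},
]

def candidate_snips(emotion, snips):
    """First filter step: keep snips tagged with the emotion
    or featuring it in their name."""
    return [s for s in snips
            if emotion in s["tags"] or emotion in s["name"]]

def pick_snip(emotion, snips):
    """Return a random but fitting snip for the detected emotion."""
    matches = candidate_snips(emotion, snips)
    return random.choice(matches) if matches else None
```

In the real pipeline the candidates are narrowed further by analyzing each snip's transcript, as described below.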

How we built it

Facial recognition and analysis of the facial expression are done with Microsoft Azure via its RESTful API. Because the tags and names of snips do not necessarily annotate them correctly, we decided to analyze the snips ourselves and use the tag/name only as a first filter step. We downloaded a set of 100 snips for every emotion we support, then converted the audio files to a format suitable for the speech-to-text service on the IBM Bluemix/Watson platform. The transcription runs in the cloud, and an emotion analysis is performed on the transcript of each snip, returning a sentiment score for every emotion. On our client we evaluate the returned sentiment scores for the given emotion and choose the snip with the highest value. The link to the respective audio file is returned, and the snip can be played in the browser.
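The final selection step (choosing the snip with the highest sentiment score for the target emotion) can be sketched like this. The structure of the per-snip analysis results is an assumption for illustration, not the exact Watson response format:

```python
def best_snip(emotion, analyses):
    """Pick the snip whose transcript scored highest for the target
    emotion; snips with no score for that emotion count as 0.0."""
    return max(analyses, key=lambda a: a["scores"].get(emotion, 0.0))["audio_url"]

# Hypothetical per-snip sentiment results (structure assumed):
analyses = [
    {"audio_url": "https://s3.example.com/snip1.wav",
     "scores": {"joy": 0.82, "sadness": 0.05}},
    {"audio_url": "https://s3.example.com/snip2.wav",
     "scores": {"joy": 0.31, "sadness": 0.64}},
]
```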

Please note that our application cannot be started from the given GitHub link because we had to remove several API keys in order to keep them private.

Challenges we ran into

It was not easy to get the data into a format suitable for proper analysis. The main problem was the AAC/M4A encoding of the audio files, while Watson only accepts WAV, FLAC, and OGG. Because no free conversion API was available, we had to convert the files locally, and in order to access the converted files we had to upload them to AWS S3 cloud storage. Another problem was the missing JSON structure of our computed data: at this point we had emotional annotations for every snip but were not able to process them properly. Our solution is a (somewhat hacky) string-manipulating Python script that simply gets the highest-ranked snip for the given emotion. Our plan to use the precomputed data as a training set for a local machine learning algorithm, in order to predict the emotions of trending snips, was not feasible because of the lack of time and the missing structure of our data.
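The "hacky" extraction step could look something like the sketch below, assuming the unstructured output is one `<snip_id> <emotion> <score>` triple per line (the format is an assumption for illustration):

```python
def highest_ranked(raw, emotion):
    """Parse line-based '<snip_id> <emotion> <score>' output and return
    the id of the snip with the highest score for the given emotion."""
    best_id, best_score = None, float("-inf")
    for line in raw.strip().splitlines():
        snip_id, emo, score = line.split()
        if emo == emotion and float(score) > best_score:
            best_id, best_score = snip_id, float(score)
    return best_id

# Hypothetical unstructured output from the analysis step:
raw = """\
snip1 joy 0.82
snip2 joy 0.31
snip2 sadness 0.64
"""
```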

Accomplishments that we're proud of

We were able to find matching snips for different emotions. While testing, we found that the emotion our system proposed matched the expressed emotion much better than the emotion given by the user in the tag. We also had to cope with many data-incompatibility problems but solved most of them, although we are data analytics/machine learning beginners.

What we learned

We learned that it is not easy to work with data received via a third-party API. The data was partly missing critical values, was tagged incorrectly, and the audio files were not encoded in a format suitable for our use case. But we also learned that most of the time there is a workaround to convert the data to fit our needs.

What's next for DubMotion

Analyze more snips for every emotion. Use the analyzed snips as a training set in order to predict the emotions of other snips. Discuss the acqui-hire offer from Google.
