- Language selection: use Speech (powered by Speech to Text) or Touch on the Spectacles to select the target language
- Adding recognition for different keywords for the Speech to Text component
- Wearing the Spectacles to test our product
- Scavenger hunt game mode!
- A sample of our translation of selected English words into 7 other target languages.
Inspiration
"If you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language." ― Douglas Adams, The Hitchhiker's Guide to the Galaxy
Learning languages as an adult is hard; learning them as a child is easier. The key is immersion, an approach popularized by the Rosetta Stone software. Research shows that immersion forces learners to draw connections between target words and their associated objects, rather than between target words and equivalent words in their native language. This creates a stronger association between the target word and its actual representation in the learner's mind. That idea is the inspiration behind Babel Fish!
What it does
Babel Fish takes advantage of the augmented reality provided by Snapchat's Spectacles to create the perfect space for immersion-based language learning. We begin by asking the user to select their target language. Currently, we support Spanish, French, Italian, German, Chinese (Mandarin), Japanese, Korean, Russian, Arabic, and English. Upon selection, we offer two modes: learn and play.
In Learn mode, the user can explore their surroundings. The Spectacles will label the item(s) in the user's view in the target language, thereby encouraging the user to draw an association between the item and the foreign word.
In Play mode, the user enters a 30-second scavenger hunt! A random common household item is selected, and the word for it is displayed in the target language. The user is challenged to find that item nearby. When the 30 seconds are up, the user sees their score.
Overall, these two modes encourage users to interact with their environment while enhancing it with augmented reality. Play mode in particular encourages hyper-focused bite-sized interactions.
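The Play-mode loop described above (random target, 30-second timer, score on a match) can be sketched as plain game-state logic. This is an illustrative sketch only, not our actual Lens Studio script: the function names, the tiny two-item dictionary, and the one-tick-per-second timer hookup are all assumptions for the example.

```javascript
const ROUND_SECONDS = 30;

// Tiny sample of the household-item table (assumed entries for illustration).
const DICTIONARY = {
  chair: { es: "silla", fr: "chaise", de: "Stuhl" },
  cup:   { es: "taza",  fr: "tasse",  de: "Tasse" },
};

// Pick a random item and its word in the target language.
function pickItem(lang) {
  const items = Object.keys(DICTIONARY);
  const item = items[Math.floor(Math.random() * items.length)];
  return { item, prompt: DICTIONARY[item][lang] };
}

function startRound(lang) {
  return { lang, ...pickItem(lang), score: 0, remaining: ROUND_SECONDS };
}

// Called each time the classifier reports a label for the current view.
function onScanResult(round, label) {
  if (round.remaining > 0 && label === round.item) {
    round.score += 1;
    Object.assign(round, pickItem(round.lang)); // next target, same round
  }
}

// Called once per second by a timer event; returns true when time is up.
function onTick(round) {
  round.remaining = Math.max(0, round.remaining - 1);
  return round.remaining === 0;
}
```

In the real Lens, `onScanResult` would be driven by the Scan API's classification callback and `onTick` by a Lens Studio update event; here they are plain functions so the round logic stands alone.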
How we built it
We used Lens Studio to build a Snap Lens for Spectacles. In Learn mode, we used the Scan API to classify objects within the user's field of view and display their labels. In Play mode, we used the Scan API to verify that the object the user picked up or approached is the item requested by the scavenger hunt. The user switches between modes with voice commands, which we implemented using Snap's Speech to Text API.
Challenges we ran into
We initially planned to use the Snap iTranslate API to translate the English words for viewed items into the target language. However, we encountered a conflict between the iTranslate API and the Speech to Text API: only one of the two could be used at a time. We worked around this by building our own translation table, using Google Translate and our personal language knowledge to translate common household items into each of our supported target languages.
Most of the existing Lens Studio templates were built for orthographic (2D) filters, which are not directly compatible with the Spectacles' 3D view. We had to manually reconstruct the API-interfacing code found in several templates to account for the 3D view, and heavily reconfigure many scripts.
The Spectacles sometimes would not register button presses or swipes along the side. They also tended to overheat quickly and needed a long time to cool down, which sometimes made testing difficult.
Accomplishments that we're proud of
Learning Lens Studio! None of us had experience building AR apps, or any experience with Unity, Unreal Engine, or Lens Studio before this hackathon, and we learned a lot this weekend about working in these cutting-edge 3D workspaces. We are so excited to continue building in this space in the future.
We are also very proud of shipping a working minimum viable product! Going into the final night of the hackathon, we had a lot of bugs and were worried we might not finish before the deadline. But we persevered through the night, spent hours debugging, and finally got Learn mode to work! This victory spurred us on and helped us focus on finishing the rest of the project.
What we learned
We learned how to use Lens Studio! None of us had experience building AR apps, or any experience with Unity, Unreal Engine, or Lens Studio before this hackathon, and we learned a lot this weekend about working in these cutting-edge 3D workspaces. We are so excited to continue building in this space in the future.
What's next for Babel Fish
We initially planned to allow the user to speak in the target language and fully interact with their Spectacles in the target language. We thought we could implement this with Snapchat's iTranslate API. However, this did not work out because Snapchat's Speech to Text Lens Studio feature only works for English. In addition, the Snapchat iTranslate API cannot currently connect with Snapchat's Speech to Text because of backend Snapchat code that we could not change.
One thing we really wanted to implement this weekend was Text to Speech so that the target word would be read to the user in both Learn and Play modes. Unfortunately, this feature also seemed to work only for English words.
We are excited for future updates from Snap that will allow us to better create the world we want to see!
Built With
- ar
- augmented-reality
- javascript
- lens-studio
- snapchat
- spectacles