Inspiration
Most generative AI tools require communication by text, usually in English -- but what if a user wants to speak with an AI, or converse in another language? They can use Amber -- a custom AI tool that sees your environment and responds to your queries in a language of your choice.
We originally wanted to make a project helping the visually impaired. Halfway through, we realized that our idea had widespread potential and could help almost everyone -- from a blind person unable to read ChatGPT's output to the average Joe simply looking to save a few minutes of their time.
What it does
When a user is talking with Amber, the software listens to the user's query while capturing their camera output and answers any questions they might have. Users can ask anything, from the sugar quantity in a Red Bull to the text written in their notebook, and Amber will seamlessly reply within five seconds. End-to-end multilingual communication is supported in 12 languages, including English, Spanish, Mandarin, Hindi, and Arabic, meaning you can talk to the AI in the language of your choice and it will respond in that same language.
How we built it
When the user presses shift+V, we start recording their speech. When they release the keys, the recording and a frame from their camera are both sent to our custom API, which converts this speech + image into a text description via multimodal fusion. This description is then read aloud in the browser.
Technologies: the frontend is built with React and Tailwind, and user accounts are stored in Firebase, with sign-in via Google OAuth. Our custom API is written in Python and uses Flask to accept requests; it also calls the OpenAI and ElevenLabs APIs.
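To give a feel for the multimodal-fusion step, here's a minimal Python sketch of how a backend like ours might combine the transcribed speech and a webcam frame into a single OpenAI-style vision chat message. The function name and payload assembly are illustrative, not our production code; the message shape follows OpenAI's documented vision chat format.

```python
import base64

def build_multimodal_message(transcript: str, image_bytes: bytes) -> list:
    """Combine the user's transcribed speech with a base64-encoded
    camera frame into one OpenAI-style vision chat message.
    (Hypothetical helper for illustration -- not our exact code.)"""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                # The spoken query, already converted to text
                {"type": "text", "text": transcript},
                # The webcam frame, inlined as a data URL
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ]
```

The returned list can be passed as the `messages` argument of a chat-completion call; the model's text reply would then be handed to a text-to-speech API (ElevenLabs, in our case) for playback.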
Challenges we ran into
The original plan was to get camera frames from an external VR headset, but after 14 hours of unsuccessful grappling with WebXR, we decided to pivot at 2AM. One of our teammates drove to his house at night to pick up a Raspberry Pi, and we spent the next few hours unsuccessfully trying to boot it up (who knew it couldn't handle pip install opencv-python?). In the end, we settled for image capture via laptop webcam.
Accomplishments that we're proud of
We're especially proud of being able to develop and integrate a custom frontend, backend, and API from scratch within a day. Being able to pivot so quickly also felt satisfying -- within an hour, we went from no progress to having a functional prototype. Most importantly, however, we're proud of discovering the future potential of this project.
What we learned
One of our teammates was new to React and Tailwind, and he was able to familiarize himself with these technologies during the event. The rest of us learned how to interface with the OpenAI and ElevenLabs APIs, how to create our own API with Flask & deploy it to Render, and how to implement speech translation in the browser.
What's next for amberAI
We'd love to hook up the app's camera feed to an external VR headset so the user can observe the world around them while communicating with Amber. When a user wants to talk to Amber, they'd simply hold a button on their headset and start speaking; after they let go, Amber would respond to their query through the headset's built-in speaker.
