GenieSense

Inspiration

At a school symposium, a blind speaker expressed regret at no longer being able to know what was right in front of them. We wanted to build an affordable device that gives blind and visually impaired people a sense of their surroundings using the latest AI technology.

What it does

GenieSense is a device that narrates the environment to a blind person. When the user points the device at something for a few seconds, it describes the scene in a synthesized voice. It can describe objects and scenes, and with a change to the software it can also handle text-processing tasks. It supports 7+ languages and regional accents for a diverse user base.

Our software is open source for now, so anyone can build the device on their own. We also have a website where users can upload images and generate MP3 descriptions of them. The site also explains how to best use the GenieSense device to experience your surroundings.

How we built it

For the hardware, we used a Raspberry Pi with a webcam in a custom 3D-printed enclosure. The website frontend was built with React, with Flask for the backend. To caption images, we used Transformers with GPT-2. We then used the Google Text-to-Speech API to generate MP3 files from the captions.
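The caption-then-speak flow described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Hugging Face `image-to-text` pipeline, the `nlpconnect/vit-gpt2-image-captioning` checkpoint, and the `gTTS` package are all assumptions chosen because they match the description, and the function names are hypothetical.

```python
from typing import Callable

Captioner = Callable[[str], str]          # image path -> caption text
Synthesizer = Callable[[str, str], None]  # (text, mp3 path) -> writes the file


def make_captioner(model_name: str = "nlpconnect/vit-gpt2-image-captioning") -> Captioner:
    """Build a captioner on a Hugging Face image-to-text pipeline.

    The checkpoint name is an assumption, not taken from the project.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("image-to-text", model=model_name)

    def caption(image_path: str) -> str:
        # The pipeline returns a list of dicts with a "generated_text" key.
        return pipe(image_path)[0]["generated_text"].strip()

    return caption


def make_synthesizer(lang: str = "en", tld: str = "com") -> Synthesizer:
    """Build a text-to-speech writer using gTTS (an assumed library choice).

    `lang` selects the language; `tld` selects a regional accent,
    for example "co.uk" for British English.
    """
    from gtts import gTTS  # imported lazily: needs network access at save time

    def synthesize(text: str, mp3_path: str) -> None:
        gTTS(text=text, lang=lang, tld=tld).save(mp3_path)

    return synthesize


def describe_image(image_path: str, captioner: Captioner,
                   synthesizer: Synthesizer,
                   mp3_path: str = "description.mp3") -> str:
    """Caption one image, write the spoken caption to an MP3, return the caption."""
    caption = captioner(image_path)
    synthesizer(caption, mp3_path)
    return caption
```

On the device this might be called as `describe_image("frame.jpg", make_captioner(), make_synthesizer(lang="en", tld="co.uk"))`, where `frame.jpg` is a hypothetical webcam capture. Passing the captioner and synthesizer in as arguments keeps the model loaded once and lets either piece be swapped out.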

Challenges we ran into

This was the first time our team used hardware components during a hackathon, so integrating them into a viable product was difficult. Printing a case that could house the components precisely within the time limit required careful engineering.

Creating fast but reliable image captioning software that converts images to text via the Flask backend was also difficult. We experimented with multiple encoder-decoder frameworks before settling on Transformers with GPT-2.
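As a rough illustration of the backend side, an upload handler might look like the sketch below. Everything here is an assumption for illustration: the `/describe` route, the `image` form field, the accepted file extensions, and the `describe(image_path, mp3_path)` callable (a hypothetical function that returns the caption and writes the audio file) are not taken from the project.

```python
import os
import uuid
from typing import Callable

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of accepted formats


def process_upload(data: bytes, filename: str, out_dir: str,
                   describe: Callable[[str, str], str]) -> dict:
    """Save an uploaded image, then caption it and write an MP3 next to it.

    `describe(image_path, mp3_path)` is a hypothetical callable that
    returns the caption text and writes the spoken caption to mp3_path.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or 'none'}")

    os.makedirs(out_dir, exist_ok=True)
    stem = uuid.uuid4().hex  # random name avoids trusting the client filename
    image_path = os.path.join(out_dir, stem + ext)
    mp3_path = os.path.join(out_dir, stem + ".mp3")

    with open(image_path, "wb") as f:
        f.write(data)

    caption = describe(image_path, mp3_path)
    return {"caption": caption, "mp3_path": mp3_path}


def create_app(describe: Callable[[str, str], str], out_dir: str = "uploads"):
    """Wire process_upload into a small Flask app (route name is an assumption)."""
    from flask import Flask, jsonify, request  # imported lazily

    app = Flask(__name__)

    @app.post("/describe")
    def describe_route():
        upload = request.files.get("image")
        if upload is None:
            return jsonify(error="missing 'image' file field"), 400
        try:
            result = process_upload(upload.read(), upload.filename or "",
                                    out_dir, describe)
        except ValueError as exc:
            return jsonify(error=str(exc)), 400
        return jsonify(caption=result["caption"],
                       audio=os.path.basename(result["mp3_path"]))

    return app
```

Keeping the file handling in `process_upload` separate from the Flask route means the slow part (captioning and speech synthesis) can be timed and tested without starting a web server.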

Accomplishments that we're proud of

We are proud of what the GenieSense device can do and the value it can bring to blind and visually impaired people. We’re happy it turned out well under the time constraint.

What we learned

We learned about the value of CAD in component design, how much care hardware design takes when affordability is a goal, and how to target a product at a specific market. We also developed the skills to apply the latest AI image captioning technology and multilingual capabilities in future social projects.

What's next for GenieSense

To better suit our market, we plan to add an input device for controlled audio. Since we will no longer be working under a time constraint, we can also move to a cheaper, custom-designed PCB with an integrated webcam. Even with a small speaker and other added features, the cost would drop from ~$60 to ~$47. That is far more affordable than devices currently on the market, which typically range from $200 to $2,000.

Submitted to

MEGA Hackathon 2023
- Winner 1st
- Winner Robotics/Hardware Category
BeeLikeCoders 2.0

Created by

I did the image captioning and OCR integration, and the audio generation in the Flask backend.

Siddharth Kancharapu
I designed the hardware and built the frontend website and backend communication.

Alexander Popescu