Inspiration

As passionate researchers in ML for healthcare, we are inspired to create a new system for those who are visually impaired. Knowing that more than 600 million people suffer from various forms of visual impairments, we strive to build a tool for all by restoring the visual cognition of the people with real-time interactive vision repair agent.

What it does

Visioneers is a generalist visual assistant for those with visual impairment that fits its functionalities to the need of the users. We developed modalities for those with colorblindness, glaucoma, cataracts, and various level of blindness by integrating a real-time interactive program that detects and creates modular

How we built it

We employed multiple tools to incorporate our method.

Multimodal Guidance

Using the latest and most powerful model GPT4-o, we exploit the strong spatial reasoning and image understanding of the VLM and give users accurate guidance in a variety of tasks such as captioning the scene, step-by-step action guidance in unfamiliar environments, and retrieval of objects in cluttered environments.

Emotion recognition

We used Hume to detect the emotion of the users' speech. If the user is distressed or nervous due to various factors such as unfamiliar environments, we use Hume's recognition to make the assistant more understanding of the situation. Additionally, if the user is under a lot of stress, we will allow the assistant to wait until the user is done with their tasks.

Interactive Speech Interface

We used Azure to recognize the text of the user, and we also used openai's Whisper to render realistic speech from texts coming from the output of GPT4. Using Depth Estimation, we calculate objects lengths between users so we can fly out how close they are.

Challenges we ran into

Making a complete pipeline of various components is inherently challenging, and we found it quite difficult to generalize into a less urban environment with the current bandwidth.

Accomplishments that we're proud of

We're really proud of making an interactive tool for all in such a short period of time.

What's next for Visioneers

We would certainly like to expand the scope of the project: it's not only those who need visual aid who are in need of accessible agents, but those with mobility issues, hearing loss, and all forms of disabilities. We strive to create a truly generalist agent for all the accessible needs in the future.

Built With

Share this project:

Updates