Inspiration
As a team composed of members with engineering backgrounds, our objective is to leverage robust and cutting-edge AI technology to improve the lives of individuals with visual disabilities. Taking inspiration from the recent release of Vision Pro, we aim to empower visually impaired individuals by providing them with the means to visualize and navigate the world around them. By harnessing advanced technology, such as visual inputs integrated with Google Gemini, we endeavor to enable blind individuals to perceive their surroundings akin to sighted individuals, thus facilitating their interpretation of the world and enhancing their independence and quality of life. This technology is also beneficial for less developed areas where traditional accessibility aids such as braille or crosswalk audio signals may not be readily available.
What it does
This project helps visually impaired individuals by enabling them to capture images of their surroundings through a web interface. Once the image is captured, the system provides an auditory description of the scene. Users can choose the level of detail in the description to suit their needs, ranging from basic outlines to detailed narratives involving colors and sizes. For instance, while navigating a familiar path with tactile paving, a simple description is enough. However, in unfamiliar areas or places without specialized pathways, more comprehensive details like road signs and building directories become essential for safe and effective navigation.
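The detail-level selection above could be implemented as a simple mapping from the user's chosen level to the prompt sent to the vision model. This is a hypothetical sketch; the names and prompt wording are illustrative, not the project's actual code.

```python
# Hypothetical mapping from the user's chosen detail level to the
# prompt given to the vision model (illustrative wording only).
DETAIL_PROMPTS = {
    "basic": "Briefly describe the main obstacles and the path ahead.",
    "standard": "Describe the scene, including nearby objects and any signage.",
    "detailed": (
        "Describe the scene in detail, including colors, sizes, "
        "road signs, and building directories."
    ),
}

def build_prompt(detail_level: str) -> str:
    """Return the description prompt for the chosen level of detail,
    falling back to the standard level for unknown inputs."""
    return DETAIL_PROMPTS.get(detail_level, DETAIL_PROMPTS["standard"])
```

A "basic" prompt suits a familiar path with tactile paving, while "detailed" covers the unfamiliar-area case described above.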
How we built it
The entire back end was coded in Python. We utilized the Google Gemini Pro Vision API to describe the content of the image captured by the camera in text. Then, we used the Google Cloud Text-to-Speech API to convert that text into audio files. The dynamic user interface was built using HTML, CSS, and JavaScript, while Flask and the Fetch API were used to handle HTTP requests. Pillow was also used to manage and process the images taken by the user.
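The image-to-audio flow described above can be sketched as a small orchestration function. Here the `describe_image` and `synthesize_speech` callables stand in for the Gemini Pro Vision and Cloud Text-to-Speech clients; the real project wires these to Google's SDKs behind a Flask endpoint, so treat this as a structural sketch rather than the actual implementation.

```python
# Sketch of the capture -> describe -> speak pipeline. The two
# callables are placeholders for the Gemini Pro Vision and Google
# Cloud Text-to-Speech API clients used in the real back end.
from typing import Callable, Tuple

def image_to_audio(
    image_bytes: bytes,
    describe_image: Callable[[bytes], str],
    synthesize_speech: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Describe an image as text, then render that text as audio."""
    description = describe_image(image_bytes)   # vision-model call
    audio = synthesize_speech(description)      # text-to-speech call
    return description, audio
```

In the running app, the browser sends the captured image to Flask via a Fetch API POST, and the resulting audio file is returned for playback.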
Challenges we ran into
We struggled with the complexity of setting up and securely integrating Google Cloud APIs, facing issues like handling authentication and managing API keys. Merging independently developed code into a cohesive system was also challenging due to differing coding styles and methodologies. In addition, our team had very little prior experience with web development, which proved to be a challenge as we attempted to learn several frameworks within the limited timeframe.
Accomplishments that we're proud of
Creating a product for a demographic we don't personally belong to, such as the visually impaired, presented a unique challenge. Our lack of direct experience made it difficult to authentically address their needs and safety concerns. Through brainstorming, we identified potential safety risks for users and developed an alarm feature to regularly check on the user's consciousness. Additionally, despite our team having nearly no prior experience in frontend design, we successfully integrated the user interface and its functionalities with the backend system, ensuring a seamless and effective operation.
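The check-in alarm mentioned above could be modeled as a timer that tracks the user's last interaction and decides when to prompt them. This is a minimal sketch under assumed names and a placeholder interval; the actual feature's timing and trigger are not specified in this writeup.

```python
# Minimal sketch of the consciousness check-in alarm: if the user has
# not interacted for longer than the interval, prompt them to touch
# the screen. Interval and names are illustrative assumptions.
import time
from typing import Optional

class CheckInAlarm:
    def __init__(self, interval_seconds: float = 120.0):
        self.interval = interval_seconds
        self.last_interaction = time.monotonic()

    def record_interaction(self) -> None:
        """Call whenever the user touches the screen."""
        self.last_interaction = time.monotonic()

    def should_prompt(self, now: Optional[float] = None) -> bool:
        """True when it is time to ask the user to confirm they are okay."""
        current = time.monotonic() if now is None else now
        return current - self.last_interaction >= self.interval
```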
What we learned
From this hackathon, our team acquired a diverse set of skills and insights. We successfully bridged the gap between frontend and backend development, mastering the synchronization process via Git and GitHub within Visual Studio Code. Delving into Gemini, we adapted the provided Python code, understanding its structure and functionality to fit our project's vision. Throughout the event, we encountered challenges that honed our problem-solving abilities, driving us to online resources for guidance. This experience deepened our comprehension of code frameworks and structure, empowering us to apply these newfound skills to future endeavors.
What's next for Vision Pro Max
Our future plan involves processing video in real-time to better understand our users' behavior. This will allow us to detect user movement and the speed of their motion, enhancing our alarm system for added safety. For instance, if a user remains stationary for an extended period, the system could prompt them to touch the screen, confirming their safety and consciousness.
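The planned stationarity check could work by estimating displacement across recent video frames and flagging the user as stationary when total movement stays under a threshold. The frame format, threshold value, and function name below are assumptions for illustration only.

```python
# Illustrative sketch of the planned stationarity check: sum the
# distance the user moves between consecutive frame positions and
# flag them as stationary below a threshold (values are assumptions).
import math
from typing import List, Tuple

def is_stationary(
    positions: List[Tuple[float, float]],
    threshold: float = 1.0,
) -> bool:
    """Return True when total movement across frames is below threshold."""
    total = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    return total < threshold
```

When this check fires for an extended period, the system could then issue the touch-the-screen prompt described above.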