Inspiration

Most of our team members have some degree of vision loss, and we often have trouble navigating our environment without our corrective lenses. This led us to wonder how much harder everyday navigation must be for people with visual impairments more severe than ours. A common workaround is zooming in on distant objects or text with a phone camera. We wanted to build on this idea and streamline the experience, using the devices already in everyone's pockets to aid day-to-day life.

What it does

depict. pairs AI-generated text with voice synthesis to help visually impaired users identify objects in their surroundings. The user uploads or takes a photo of an object of interest, and depict. reads out a description of the image.

How we built it

We focused on creating a cohesive prototype that would make the later programming phase easier. To achieve this, we wanted a tool that offered flexibility and easy-to-follow designs, so we went with Figma. We started by designing the app's core flow: the home page, the generated text for a given image, and the audio for that text. After designing a set of basic screens, we focused on how the app would behave in an iOS environment by introducing various transitions and interface elements. Finally, we added polish to bring the prototype to a presentable state.

Challenges we ran into

The first challenge we encountered was that we lacked experience designing screens and prototypes. As mentioned above, we chose Figma, but we were unsure how to use it fully, so we set aside time at the start of the hackathon to learn the tool and the ins and outs of prototype design.

The second, and considerably larger, challenge was inserting and interacting with audio files in our prototype. We first needed to create the TTS audio file and then insert it so it played in the correct sequence, but our team settings initially prevented us from adding it at all. After troubleshooting and adjusting those settings, we were finally ready to insert the audio file. We then hit yet another issue with the button meant to mute and unmute the audio. This is when we learned about variants in Figma and how icons can change state when interacted with. When we tried to apply this, we discovered that variant behavior wasn't compatible with how our buttons were designed, so we switched to an alternative button design that, we later realized, was also more accessible for the visually impaired. Although we lost a lot of time, going down this rabbit hole gave us invaluable experience, taught us new concepts, and pushed us to tailor the application to our target audience.

Accomplishments that we're proud of

We are proud that we were able to explore this niche topic that is relevant to our own lives and develop a well-rounded experience for the visually impaired. We are also proud of our perseverance in overcoming all of the challenges we faced, no matter how frustrating they were or how late it was at night.

What we learned

In the process of designing our prototype, we developed a strong working knowledge of Figma. By making informed design decisions along the way, we saved time while still producing a pleasing and engaging user experience. We also learned how to implement features such as audio files, variants, buttons, and screen interactions, all of which will be useful in future projects and designs.

What's next for depict.

depict. has a bright future ahead of it. The project is still in the early stages of development, and the interface is being actively iterated on. Once the design is finalized, we plan to use GPT-4 to generate the descriptions and Speechify to convert that text to speech (TTS). On the front end, we would build the designed elements in React.js; on the back end, the functionality would be handled in Python.
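As a rough illustration of how that back end could fit together, here is a minimal Python sketch that sends a photo to a GPT-4 vision-capable model through the OpenAI API and asks for a short, spoken-friendly description. The model name, prompt wording, file paths, and the describe_image helper are our own placeholders, not settings depict. has committed to, and the Speechify TTS step is left as a comment since its API isn't covered here.

```python
# Minimal sketch of the planned depict. back end (assumptions noted inline).
# Requires: pip install openai
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_image(image_path: str) -> str:
    """Ask a GPT-4 vision-capable model for a short description of a photo.

    The model name and prompt below are placeholders chosen for illustration.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable GPT-4 model would work here
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe this photo in two or three plain sentences "
                             "for someone who cannot see it."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    description = describe_image("example_photo.jpg")  # hypothetical input file
    print(description)
    # Next step (not shown): hand `description` to Speechify, or another
    # text-to-speech service, and stream the resulting audio back to the app.
```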

On a broader scale, we hope to add a variety of new capabilities to depict. The first is the ability to highlight specific objects or sections of an image so the user can get a more focused description of the part they care about. We also have a slew of other features in mind, including customizable TTS voices and color themes that make the application more accessible to users with different types of color blindness.
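One way the highlighting feature could work on the back end, as an assumption rather than a settled design, is to crop the user's selected region out of the photo before it is sent to the captioning model, so the description focuses only on that area. The sketch below uses Pillow; the crop_region helper, the rectangle coordinate format, and the file names are all hypothetical.

```python
# Sketch of the planned "highlight a region" feature (assumed approach):
# crop the user's selected rectangle and describe only that part of the photo.
# Requires: pip install pillow
from PIL import Image


def crop_region(image_path: str, box: tuple[int, int, int, int], out_path: str) -> str:
    """Crop the highlighted rectangle (left, upper, right, lower) to its own file.

    The rectangle format is an assumption; the app could just as easily send
    normalized coordinates or a free-form outline.
    """
    with Image.open(image_path) as img:
        img.crop(box).save(out_path)
    return out_path


if __name__ == "__main__":
    # Hypothetical example: the user highlighted the top-left 400x300 pixels.
    cropped = crop_region("example_photo.jpg", (0, 0, 400, 300), "highlighted.jpg")
    # The cropped file could then be fed to the same description pipeline
    # sketched above to produce a more focused read-out.
```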
