Inspiration
According to recent studies, approximately 1.5 billion people worldwide live with musculoskeletal conditions, with a significant portion affecting upper-body mobility. Conditions like arthritis, repetitive strain injuries, and neuromuscular disorders create substantial barriers to computer use, an increasingly essential tool for education, employment, and social connection. Our team was motivated to create steven.ai after witnessing members of our community struggle with conventional computer interfaces due to limited upper-body mobility. We recognized that existing accessibility solutions often fall short of providing truly intuitive and comprehensive support. Our ultimate goal is to help bridge the digital divide and ensure that technology remains accessible for everyone, regardless of physical limitations.
What it does
Steven.ai is an innovative accessibility tool designed to help people with musculoskeletal disabilities use computers more effectively. The software tracks eye and hand movements to recognize functional gestures, while also incorporating speech-to-text capabilities for command input. By combining these different interaction methods, steven.ai creates an adaptable interface that removes barriers to computer usage for individuals with limited mobility.
How we built it

Programming Language(s):
- Python
Object Tracking:
- OpenCV
- MediaPipe
- PyAutoGUI
- numpy
Voice Commands:
- ElevenLabs
- Gemini
- PyAudio
- SpeechRecognition
- pydub
- ffmpeg
Challenges we ran into
- Integrating ElevenLabs with Gemini to create voice-activated commands that would perform tasks on the computer
- Translating the coordinates of the eye tracker and hand tracker into coordinates for the mouse
- Emulating interactions using both the hand controls and the eye controls
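The coordinate-translation challenge above boils down to mapping MediaPipe's normalized [0, 1] landmark coordinates onto absolute screen pixels. A minimal sketch of one way to do it, assuming an illustrative 1920x1080 screen and a margin band so small movements near the frame edges can still reach the screen edges (the margin value and `landmark_to_screen` helper are our own assumptions, not the project's exact code):

```python
import numpy as np

# Assumed screen size and margin; in practice these would come from
# pyautogui.size() and tuning.
SCREEN_W, SCREEN_H = 1920, 1080
MARGIN = 0.15  # fraction of the camera frame treated as dead zone at each edge

def landmark_to_screen(norm_x: float, norm_y: float) -> tuple:
    """Map a normalized MediaPipe landmark coordinate to a screen pixel."""
    # np.interp also clamps values outside the margin band to the screen edges,
    # so the cursor never goes out of bounds.
    x = np.interp(norm_x, [MARGIN, 1 - MARGIN], [0, SCREEN_W - 1])
    y = np.interp(norm_y, [MARGIN, 1 - MARGIN], [0, SCREEN_H - 1])
    return int(x), int(y)
```

The resulting pixel pair would then be handed to `pyautogui.moveTo(x, y)` to drive the cursor.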
Accomplishments that we're proud of
We successfully implemented a system that captures and interprets users' head and eye movements to control cursor navigation.
We developed an alternative hand-based mode in which the left side of the camera frame moves the cursor while the right side performs interactions such as clicks.
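The hand-split design above can be sketched in a few lines: which half of the frame the hand occupies determines its role, and a pinch between thumb and index fingertips could serve as the click gesture. The role names and the 0.05 pinch threshold here are illustrative assumptions:

```python
import math

def hand_role(wrist_norm_x: float) -> str:
    """Assign a role based on which half of the camera frame the hand is in."""
    # Left half of the frame drives the cursor; right half triggers interactions.
    return "move_cursor" if wrist_norm_x < 0.5 else "interact"

def is_pinch(thumb_tip, index_tip, threshold: float = 0.05) -> bool:
    """Detect a click gesture: thumb and index fingertips close together
    (coordinates are normalized, so the threshold is frame-relative)."""
    return math.dist(thumb_tip, index_tip) < threshold
```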
We integrated speech recognition capabilities that allow users to execute commands and dictate text using their voice.
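A system like the speech layer above needs to decide whether a recognized phrase is a command to execute or text to dictate. The command set and `parse_command` helper below are hypothetical, shown only to illustrate the routing step:

```python
# Illustrative command table; steven.ai's actual command set may differ.
COMMANDS = {
    "click": "mouse_click",
    "scroll down": "scroll_down",
    "open browser": "launch_browser",
}

def parse_command(transcript: str):
    """Route a transcript: (action, None) for a known command,
    or ('dictate', text) so the words are typed out instead."""
    phrase = transcript.lower().strip()
    if phrase in COMMANDS:
        return COMMANDS[phrase], None
    return "dictate", transcript
```

In the full pipeline the transcript would come from the microphone, e.g. via the SpeechRecognition library's `Recognizer.listen` and a recognition backend, before being routed here.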
What we learned
- The complexity that comes with using computer vision to interact with the user's computer
- The evolution of speech-to-text technology through ElevenLabs' API
What's next for Steven.ai
A crucial next step would be developing functionality that enables users to fill out forms and compose emails entirely through voice commands. This includes dictating content, navigating between fields, and submitting forms, streamlining tasks that are often cumbersome for users with limited mobility.
To provide a more comprehensive hands-free experience, we're working on integrating a broader range of voice commands. This will allow users to perform various actions such as opening applications, controlling system settings, and navigating the web using natural language.
We're aiming to implement context-sensitive features that adapt to the user's current activity, offering relevant suggestions and automating routine tasks to enhance efficiency and user experience.