Inspiration
According to recent studies, approximately 1.5 billion people worldwide live with musculoskeletal conditions, with a significant portion affecting upper-body mobility. Conditions like arthritis, repetitive strain injuries, and neuromuscular disorders create substantial barriers to computer use, an increasingly essential tool for education, employment, and social connection. Our team was motivated to create steven.ai after witnessing members of our community struggle with conventional computer interfaces due to limited upper-body mobility. We recognized that existing accessibility solutions often fall short of providing truly intuitive and comprehensive support. Our ultimate goal is to help bridge the digital divide and ensure that technology remains accessible for everyone, regardless of physical limitations.
What it does
Steven.ai is an innovative accessibility tool designed to help people with musculoskeletal disabilities use computers more effectively. The software tracks eye and hand movements to recognize functional gestures, while also incorporating speech-to-text capabilities for command input. By combining these different interaction methods, steven.ai creates an adaptable interface that removes barriers to computer usage for individuals with limited mobility.
How we built it

Programming Language(s):
- Python
Object Tracking:
- OpenCV
- MediaPipe
- PyAutoGUI
- numpy
Voice Commands:
- ElevenLabs
- Gemini
- PyAudio
- SpeechRecognition
- pydub
- ffmpeg
Challenges we ran into
- Integrating ElevenLabs with Gemini to create voice-activated commands that would perform tasks on the computer
- Translating the coordinates of the eye tracker and hand tracker into coordinates for the mouse
- Emulating interactions using both the hand controls and the eye controls
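The coordinate-translation challenge above boils down to mapping MediaPipe's normalized [0, 1] landmark coordinates onto absolute screen pixels. A minimal sketch of one way to do it, assuming an illustrative 1920x1080 screen and a margin band so small movements near the frame edges can still reach the screen edges (the margin value and `landmark_to_screen` helper are our own assumptions, not the project's exact code):

```python
import numpy as np

# Assumed screen size and margin; in practice these would come from
# pyautogui.size() and tuning.
SCREEN_W, SCREEN_H = 1920, 1080
MARGIN = 0.15  # fraction of the camera frame treated as dead zone at each edge

def landmark_to_screen(norm_x: float, norm_y: float) -> tuple:
    """Map a normalized MediaPipe landmark coordinate to a screen pixel."""
    # np.interp also clamps values outside the margin band to the screen edges,
    # so the cursor never goes out of bounds.
    x = np.interp(norm_x, [MARGIN, 1 - MARGIN], [0, SCREEN_W - 1])
    y = np.interp(norm_y, [MARGIN, 1 - MARGIN], [0, SCREEN_H - 1])
    return int(x), int(y)
```

The resulting pixel pair would then be handed to `pyautogui.moveTo(x, y)` to drive the cursor.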
Accomplishments that we're proud of
We successfully implemented a system that captures and interprets users' head and eye movements to control cursor navigation.
We developed an alternative hand-based mode in which the left side of the camera frame moves the cursor while the right side performs interactions such as clicks.
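The hand-split design above can be sketched in a few lines: which half of the frame the hand occupies determines its role, and a pinch between thumb and index fingertips could serve as the click gesture. The role names and the 0.05 pinch threshold here are illustrative assumptions:

```python
import math

def hand_role(wrist_norm_x: float) -> str:
    """Assign a role based on which half of the camera frame the hand is in."""
    # Left half of the frame drives the cursor; right half triggers interactions.
    return "move_cursor" if wrist_norm_x < 0.5 else "interact"

def is_pinch(thumb_tip, index_tip, threshold: float = 0.05) -> bool:
    """Detect a click gesture: thumb and index fingertips close together
    (coordinates are normalized, so the threshold is frame-relative)."""
    return math.dist(thumb_tip, index_tip) < threshold
```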
We integrated speech recognition capabilities that allow users to execute commands and dictate text using their voice.
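A system like the speech layer above needs to decide whether a recognized phrase is a command to execute or text to dictate. The command set and `parse_command` helper below are hypothetical, shown only to illustrate the routing step:

```python
# Illustrative command table; steven.ai's actual command set may differ.
COMMANDS = {
    "click": "mouse_click",
    "scroll down": "scroll_down",
    "open browser": "launch_browser",
}

def parse_command(transcript: str):
    """Route a transcript: (action, None) for a known command,
    or ('dictate', text) so the words are typed out instead."""
    phrase = transcript.lower().strip()
    if phrase in COMMANDS:
        return COMMANDS[phrase], None
    return "dictate", transcript
```

In the full pipeline the transcript would come from the microphone, e.g. via the SpeechRecognition library's `Recognizer.listen` and a recognition backend, before being routed here.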
What we learned
- The complexity that comes with using computer vision to interact with the user's computer
- The evolution of speech-to-text technology through ElevenLabs' API
What's next for Steven.ai
A crucial next step would be developing functionality that enables users to fill out forms and compose emails entirely through voice commands. This includes dictating content, navigating between fields, and submitting forms, streamlining tasks that are often cumbersome for users with limited mobility.
To provide a more comprehensive hands-free experience, we're working on integrating a broader range of voice commands. This will allow users to perform various actions such as opening applications, controlling system settings, and navigating the web using natural language.
We're aiming to implement context-sensitive features that adapt to the user's current activity, offering relevant suggestions and automating routine tasks to enhance efficiency and user experience.