Inspiration

As touchscreens and voice assistants take over, something quietly gets left behind: tactile literacy. Braille isn’t obsolete; it’s personal, private, and empowering. But while technology has become very good at reading text aloud for visually impaired users, it’s surprisingly bad at helping them find that text in the first place.

That’s where the real frustration lives.

For a Braille reader, the challenge isn’t decoding the dots; it’s locating the sign in a big, unfamiliar space. You’re standing there, knowing the information exists, but not knowing where your hand should go. We didn’t want to build something that reads for people. We wanted to build something that respects the act of reading itself.

That’s when the idea clicked: what if we built a kind of tactile GPS, a system that guides your hand to the Braille, then gets out of the way?

What it does

See: A webcam scans the environment, detecting Braille signs with YOLO and Braille-recognition models and tracking hand position with MediaPipe.

Guide: It calculates the position of the sign relative to the user's hand.

Feedback: It provides real-time guidance through voice commands (e.g., "Move left") that steer the user's hand directly to the Braille text until the user can feel it.
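The See/Guide/Feedback loop can be sketched per frame as below. This is a minimal illustration, not the project's actual code: the detector and speaker are injected as plain callables so the structure is visible without a webcam or the real models, and the 25-pixel tolerance is an assumed constant.

```python
TOL = 25  # pixels; assumed alignment tolerance, not the tuned value


def guidance_step(detect_hand, detect_sign, speak):
    """One See -> Guide -> Feedback iteration.

    detect_hand / detect_sign return an (x, y) pixel position or None;
    speak is any callable that voices a command string.
    """
    hand = detect_hand()
    sign = detect_sign()
    if hand is None or sign is None:
        return None  # nothing to guide toward this frame

    # Guide: position of the sign relative to the hand.
    dx, dy = sign[0] - hand[0], sign[1] - hand[1]

    # Feedback: pick the dominant direction (image y grows downward).
    if abs(dx) <= TOL and abs(dy) <= TOL:
        command = "Stop, you are on it"
    elif abs(dx) > abs(dy):
        command = "Move right" if dx > 0 else "Move left"
    else:
        command = "Move down" if dy > 0 else "Move up"
    speak(command)
    return command
```

In the real system the two detector callables would wrap the MediaPipe and YOLO inference results for the current frame.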

How we built it

We started by tuning the webcam's resolution so the frame rate stayed high enough for real-time tracking without noticeable lag. Our code then runs MediaPipe to map the hand's landmarks and a YOLO model to draw a bounding box around the Braille, essentially turning both into sets of (x, y, z) coordinates on the screen. By calculating the distance and angle between these two points, the system drives a logic loop that selects the right voice command, such as "up" or "left", to bridge the gap until the coordinates overlap.
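The distance-and-angle calculation described above can be sketched with `math.hypot` and `math.atan2`. This is our illustrative version, not the project's source: the 30-pixel "overlap" radius and the four 90-degree sectors are assumptions.

```python
import math


def guidance(hand, sign, reach_px=30):
    """Choose a voice command from the hand and sign pixel positions.

    Returns "stop" once the two points effectively overlap, otherwise
    one of "up", "down", "left", "right" based on the angle between them.
    """
    dx = sign[0] - hand[0]
    dy = sign[1] - hand[1]
    if math.hypot(dx, dy) <= reach_px:
        return "stop"
    # Flip dy so the angle follows the usual convention (up = +90 deg),
    # since image coordinates grow downward.
    angle = math.degrees(math.atan2(-dy, dx))
    if -45 <= angle < 45:
        return "right"
    if 45 <= angle < 135:
        return "up"
    if angle >= 135 or angle < -135:
        return "left"
    return "down"
```

Run each frame, this yields the stream of commands the logic loop speaks until the coordinates overlap.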

Challenges we ran into

Getting Braille detection and hand detection to work simultaneously while computing the distance between their coordinates:

Individually, the Braille-detection file and the hand-detection file each had a few difficulties that were easily solved. Integrating the two, however, took many hours: we had to draw a line bridging the gap between the fingers and the Braille sign, and even once the integration worked, accurately computing and mapping the coordinates surfaced several more bugs.
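A large part of the mapping bugs came down to putting both detectors into one coordinate system: MediaPipe reports hand landmarks normalized to [0, 1], while YOLO boxes are typically in pixel xyxy format. A minimal sketch of the two conversions (helper names are ours):

```python
def bbox_centre(x1, y1, x2, y2):
    """Centre of a YOLO bounding box given in pixel xyxy format."""
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def landmark_to_pixels(lm_x, lm_y, frame_w, frame_h):
    """MediaPipe hand landmarks are normalised to [0, 1]; scale them by
    the frame size so they share a coordinate system with the YOLO box."""
    return (lm_x * frame_w, lm_y * frame_h)
```

Only after both points are expressed in pixels can the line between fingertip and sign be drawn and measured meaningfully.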

Significant delay with ElevenLabs:

ElevenLabs' cloud-based processing introduced too much network latency, causing a desync in which the audio feedback lagged significantly behind the real-time MediaPipe data. Prompts were spoken well after the user had already moved to a new position, making the system very inefficient.

Integrating TTS to guide the user's hand to the Braille sign:

Switching to a lighter TTS tool, pyttsx3, instead of ElevenLabs, we found that it would often process the text but fail to actually voice any output. This challenge took almost as much time as the first, with no obvious lines of code to explain the failure. After trial and error and switching between different engines, we were running short on time and ultimately adopted a manual TTS prompt every X seconds rather than the buggier smart TTS.
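The "manual TTS every X seconds" fallback amounts to rate-limiting the speaker. A sketch of that idea, with the speech backend (pyttsx3 in our case) and the clock injected so the logic can be shown and tested without audio hardware; the class name and default interval are ours:

```python
import time


class TimedSpeaker:
    """Voice the latest command at most once every `interval` seconds."""

    def __init__(self, speak_fn, interval=2.0, clock=time.monotonic):
        self.speak_fn = speak_fn    # e.g. wraps pyttsx3's say/runAndWait
        self.interval = interval
        self.clock = clock
        self._last = float("-inf")  # so the very first call always speaks

    def say(self, text):
        """Speak `text` if the interval has elapsed; return True if spoken."""
        now = self.clock()
        if now - self._last >= self.interval:
            self.speak_fn(text)
            self._last = now
            return True
        return False
```

Calling `say` on every frame then naturally throttles down to one spoken prompt per interval.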

Accomplishments that we're proud of

True Offline Capability: The entire system runs fully locally on-device, with no cloud API calls, zero network dependency, and complete data privacy. All perception, computation, and feedback happen in real time on the host machine.

The “Geiger” Audio Interface: Instead of intrusive voice prompts (“Left… Left… Stop”), we designed a variable-frequency audio feedback system inspired by a Geiger counter. The beeping rate dynamically increases as the user approaches the target, creating an intuitive, almost biological feedback loop that enables faster and more natural navigation.
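The Geiger-counter behaviour reduces to mapping distance onto the delay between beeps. A sketch of that mapping; every constant here (the near/far radii and the fastest/slowest delays) is illustrative, not the tuned value:

```python
def beep_interval(distance_px, near=30, far=400, fastest=0.05, slowest=0.6):
    """Delay in seconds before the next beep: the closer the hand is to
    the target, the shorter the delay, so the clicking speeds up like a
    Geiger counter approaching a source."""
    d = max(near, min(far, distance_px))       # clamp to the working range
    frac = (d - near) / (far - near)           # 0.0 touching, 1.0 far away
    return fastest + frac * (slowest - fastest)
```

The audio thread then sleeps for `beep_interval(...)` between beeps, re-reading the latest distance each cycle.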

Surviving the Merge: In the final hours, we successfully fused three independent subsystems (computer vision, mathematical inference, and real-time audio feedback) into a single pipeline without breaking concurrency or threading logic. Achieving stable synchronization under time pressure was one of the most technically challenging and rewarding parts of the project.

What we learned

- Learned how to manage compute resources by downscaling the video resolution
- Understood and became proficient with YOLO
- Learned how to map a live video feed onto a coordinate system by extracting the (x, y, z) coordinates from MediaPipe’s fingertip landmarks and the center of the YOLO detection box
- Discovered how to write a logic loop that monitors those changing coordinates and triggers specific text-to-speech commands once they hit a certain threshold
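One detail of that threshold-triggered logic loop worth showing: since coordinates update every frame, the loop should only re-announce a command when it actually changes, or the TTS gets flooded with identical prompts. A small sketch of that debouncing idea (the class name is ours):

```python
class CommandDebouncer:
    """Forward a direction command to the speaker only when it changes."""

    def __init__(self, speak_fn):
        self.speak_fn = speak_fn
        self._last = None

    def update(self, command):
        """Feed the command computed for the current frame.

        Returns True if the command was newly spoken, False if it was
        suppressed as a repeat (or cleared because no target is visible).
        """
        if command != self._last:
            self._last = command
            if command is not None:
                self.speak_fn(command)
                return True
        return False
```

Feeding `None` when no sign is visible resets the state, so the same direction is re-announced once the target reappears.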

What's next for H.A.N.D.S

Our current prototype leverages a laptop’s processing power for rapid development and live demonstration. The next phase focuses on miniaturization: porting our optimized ONNX models to embedded devices such as the Raspberry Pi 5. This transition will allow H.A.N.D.S. to be deployed as a fully wearable system, packaged into a chest-mounted camera or smart-glasses attachment, delivering true hands-free, real-world mobility.

While audio feedback is effective, it can be intrusive in quiet or public environments. To address this, we plan to integrate a lightweight haptic feedback system using a vibration motor driven by an Arduino or ESP32, worn on the user’s finger or wrist. This transforms our audio-based “Geiger counter” into a haptic compass whose vibration intensity increases as the user approaches the target text, enabling discreet, intuitive navigation without relying on sound.
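On the host side, the haptic compass mainly needs a function mapping distance to a motor drive level to send to the microcontroller. A sketch under assumed conventions: an 8-bit PWM duty (0-255, `analogWrite`-style) and a 400-pixel cut-off, both illustrative rather than decided.

```python
def vibration_duty(distance_px, max_px=400):
    """Map hand-to-target distance to an 8-bit PWM duty for a vibration
    motor: full buzz when touching, off once the target is out of range."""
    if distance_px >= max_px:
        return 0                            # out of range: motor off
    closeness = 1.0 - distance_px / max_px  # 1.0 touching, 0.0 at the edge
    return int(round(255 * closeness))
```

The resulting byte could then be streamed to the Arduino or ESP32 over serial each frame, mirroring how the beep interval drives the audio version today.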
