Inspiration
Over 25% of the human population has some sort of visual impairment, and for those unfortunate enough to suffer total vision loss, it can be incredibly difficult and dangerous just navigating daily life. While guide dogs are being increasingly accepted in many public spaces, these canine companions may be barred from accessing zoos, airports, hospitals, etc. ThirdEye is like a guide dog in your pocket, placing you just a voice command away from a vivid, narrated description of your surroundings in the current moment, or recalling a scene from weeks, months, or even years ago.
What it does
ThirdEye hooks onto your shirt pocket and requires only your voice to operate. With one spoken word, you can instantly capture the scene in front of you. Then, ThirdEye will play a detailed description, with an emphasis on obstacles or potential tripping hazards, and save that description to a cloud database. With another command, you can ask ThirdEye a question about what it has seen before, and it will tell you exactly where, when, and what it was.
How we built it
We built ThirdEye using Python on a Raspberry Pi 5. The user's speech, captured by a Bluetooth microphone, is processed by Python's SpeechRecognition library to tune out background noise and discern commands. If the user asks to take a snapshot, the Pi camera takes a photo which is then encoded and passed along with a customised prompt to Cohere, which returns a description that is read aloud with gTTS. This description is also saved to DynamoDB which triggers an AWS Lambda function which stores it in an OpenSearch vector database alongside its timestamp and location data. When the user asks ThirdEye to recall a certain scene by providing a few details, e.g. "Tell me where I saw a clock tower next to a river" it queries the OpenSearch database for possible candidates, which are passed to Cohere to select the most accurate entry. Finally, the description, time, and location are narrated back to the user.
Challenges we ran into
We had a lot of difficulty deciding on and acquiring the correct hardware. We initially tried to use an ESP-32 camera with a microphone and speaker module, but the high latency, lack of IO pins, and the provided microphone not being a microphone but instead a sound detector posed significant challenges. We experimented with an Arduino Uno, a regular ESP32, a QNX Raspberry Pi 4B before settling on our Pi 5 design. We found it challenging to navigate python dependencies between half of our team developing on MacOS, the others on Windows, and the deployment on a Linux-based PiOS. We also struggled with audio IO; the Bluetooth connections were extraordinarily finnicky so we ended up using a USB speaker for the demo. And last but certainly not least, as Minghao will testify, the little pushbutton was the bane of our existence for the most part of 5 hours. After it seemed to be finally working, it just decided to suddenly stop working after an innocuous git pull which didn't even change any relevant code. We eventually concluded that it was faulty after hours of fighting sunk cost fallacy. We initially intended to use this button to trigger snapshots or recalls, but we had to pivot deep into the project. Thankfully, we were able to adapt through trial and error, and accomplish our goal to build something cool and most importantly impactful at Hack the North 2025.
Accomplishments that we're proud of
- Getting the hardware working!
- Pivoting from using the button to voice commands
- Seamlessly integrating DynamoDB and OpenSearch
- Making full use of Cohere
- Managing multimodal IO
- Decreasing latency with multithreading
What we learned
- Hardware can be very unreliable (THE BUTTON)
- How your hardware behaves can almost seem non-deterministic
- The prompt design is crucial to getting consistent responses without hallucinations
What's next for ThirdEye
- Downsizing our hardware to make it more portable and all-in-one with a smaller microphone and speaker.
- More sensors for redundancy and extra safety such as lidar
Built With
- amazon-dynamodb
- cohere
- graphite
- gtts
- opensearch
- python
- raspberry-pi
- speechrecognition
- windsurf

Log in or sign up for Devpost to join the conversation.