Navigating urban spaces can be dangerous, especially for visually impaired individuals or in low-visibility environments. Our team wanted to create a real-time AI assistant that not only sees but also speaks, identifying hazards, traffic signals, and obstacles to help people move safely and confidently.
We were inspired by how autonomous vehicles use perception systems and wondered: “What if that same intelligence could fit in your pocket, accessible to everyone?”
Challenges We Faced
- Integrating multiple heavy AI models (YOLOv8 + DPT + LLM + TTS) in real time without GPU acceleration.
- Managing API concurrency limits (e.g., ElevenLabs 429 "Too Many Requests" responses).
- Handling asynchronous inference and avoiding request pileups.
- Balancing speed vs. accuracy for depth estimation while maintaining sub-second latency.
- Ensuring audio alerts remain concise and contextually relevant even in noisy environments.
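One way to handle 429 rate-limit responses like the ones above is exponential backoff. This is a minimal sketch, not the project's actual code: `withBackoff` and the simulated `fakeTTS` call are assumptions standing in for a real ElevenLabs request.

```javascript
// Sketch: retry a request with exponential backoff when it returns HTTP 429.
// `fn` is any async function returning an object with a `status` field.
async function withBackoff(fn, { retries = 3, baseMs = 250 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fn();
    if (res.status !== 429) return res; // success or a non-rate-limit error
    if (attempt >= retries) return res; // out of retries: surface the 429
    const delay = baseMs * 2 ** attempt; // 250ms, 500ms, 1000ms, ...
    await new Promise((r) => setTimeout(r, delay));
  }
}

// Simulated TTS endpoint (hypothetical): rate-limited twice, then succeeds.
let calls = 0;
const fakeTTS = async () => ({ status: ++calls <= 2 ? 429 : 200 });

withBackoff(fakeTTS).then((res) => {
  console.log(res.status, calls); // 200 3
});
```

A production version would also honor the `Retry-After` header when the API provides one, rather than relying on a fixed schedule.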
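For the request-pileup problem, a common pattern is to keep at most one inference in flight and, while busy, remember only the newest camera frame, dropping stale ones. The sketch below assumes this pattern; `makeLatestOnly` and the `slowModel` stand-in are illustrative names, not the project's real YOLOv8/DPT interface.

```javascript
// Sketch: single in-flight inference with "latest frame wins" queueing.
function makeLatestOnly(inferFn) {
  let busy = false;
  let pending = null; // newest frame that arrived while busy
  const results = [];

  async function submit(frame) {
    if (busy) {
      pending = frame; // overwrite any older queued frame
      return;
    }
    busy = true;
    results.push(await inferFn(frame));
    busy = false;
    if (pending !== null) {
      const next = pending;
      pending = null;
      await submit(next); // process only the newest queued frame
    }
  }
  return { submit, results };
}

// Demo with a slow fake model: frames arrive faster than inference finishes.
const slowModel = (f) => new Promise((r) => setTimeout(() => r(`det:${f}`), 20));
const runner = makeLatestOnly(slowModel);
(async () => {
  runner.submit(1); // starts immediately
  runner.submit(2); // queued
  runner.submit(3); // overwrites 2
  runner.submit(4); // overwrites 3
  await new Promise((r) => setTimeout(r, 100));
  console.log(runner.results); // [ 'det:1', 'det:4' ]
})();
```

Dropping intermediate frames is what keeps latency bounded: the user always hears an alert about the most recent scene rather than a backlog of outdated ones.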
Built With
- claude
- dpt
- elevenlabs
- gemini
- huggingface
- javascript
- midas
- react
- yolov8