Inspiration

There are over 295 million people who live with severe vision impairment worldwide, endangered every day by crossing streets, navigating crowds, and more. Wearable vision aids, white canes, and guide dogs help, but they are often too expensive, too bulky, or provide too little situational awareness.

That's why we built Kora, a real-time AI vision assistant that turns any smartphone into a personal mobility guide. Using computer vision and depth estimation, Kora detects obstacles, motion, and distance, then delivers instant, natural voice cues through ElevenLabs. Built with tools such as Next.js, OpenCV, and Claude Sonnet, it runs seamlessly on-device with no additional hardware.

Kora makes safety as accessible as the phone in your hand: fast, intelligent, and ready to protect you when it matters most.


What it does

From the user’s perspective, Kora works like a personal mobility assistant:

  1. Streams video from a phone camera or webcam.
  2. Detects over 80 object types with YOLOv8-nano.
  3. Estimates distance using Intel MiDaS, organizing space into a 3×3 grid (left/center/right × near/mid/far).
  4. Issues proximity alerts such as “Stop” or “Caution” when hazards approach within 0.5 meters.
  5. Responds to natural voice queries like “How far?”, “Where is it?”, or “What should I do?”.
  6. Provides real-time voice guidance through ElevenLabs speech synthesis.

Everything runs directly on-device. No extra hardware required.
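The grid-and-threshold logic in steps 3–4 can be sketched roughly as follows. This is an illustrative sketch, not Kora's actual code: the function names and the non-"Stop" thresholds are assumptions; only the 3×3 layout and the 0.5 m "Stop" distance come from the description above.

```python
def grid_cell(cx_norm: float, depth_m: float) -> str:
    """Map a detection's normalized center x (0..1) and estimated depth
    to one of nine cells: {near, mid, far} x {left, center, right}."""
    col = "left" if cx_norm < 1 / 3 else "center" if cx_norm < 2 / 3 else "right"
    row = "near" if depth_m < 0.5 else "mid" if depth_m < 2.0 else "far"
    return f"{row}-{col}"


def alert_level(depth_m: float) -> str:
    """Proximity alert tiers; the 0.5 m 'Stop' threshold is from the
    writeup, the 'Caution' band is an assumed example."""
    if depth_m < 0.5:
        return "Stop"
    if depth_m < 1.5:
        return "Caution"
    return "Clear"


print(grid_cell(0.8, 0.4), alert_level(0.4))  # near-right Stop
```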


Tech Stack

  • Frontend: Built in Next.js 14, React, and Tailwind CSS, using Framer Motion for accessibility-focused animations.
  • Voice Layer: Integrates the ElevenLabs API for real-time speech activation and transcription.
  • Vision Stack: Combines YOLOv8 for object detection, MiDaS for depth estimation, and OpenCV for motion tracking.
  • Backend: Powered by FastAPI (Python) and connected through WebSockets to stream hazard states and spoken instructions.
  • Processing: The frontend connects to an MCP server, where detections and depth maps are analyzed.
  • Reasoning Layer: Results are routed to a Snowflake API that uses Claude Sonnet 4 for contextual reasoning, which generates concise, situational voice alerts based on proximity and movement.
  • Audio Output: A rule engine converts these results into natural audio cues every ~3 seconds, balancing awareness and minimalism.
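The ~3-second cadence of the audio rule engine could look something like this minimal sketch (class name, interval default, and the urgent-bypass behavior are assumptions for illustration):

```python
class CueThrottle:
    """Speak routine cues at most every `interval_s` seconds, while
    letting urgent hazards (e.g., 'Stop') bypass the cadence."""

    def __init__(self, interval_s: float = 3.0):
        self.interval_s = interval_s
        self.last_spoken = float("-inf")

    def maybe_speak(self, cue: str, urgent: bool, now: float):
        # Urgent alerts always go through; routine cues wait their turn.
        if urgent or now - self.last_spoken >= self.interval_s:
            self.last_spoken = now
            return cue
        return None


throttle = CueThrottle()
print(throttle.maybe_speak("Person ahead", urgent=False, now=0.0))    # spoken
print(throttle.maybe_speak("Chair on right", urgent=False, now=1.0))  # None (suppressed)
print(throttle.maybe_speak("Stop", urgent=True, now=1.5))             # spoken
```

Throttling this way keeps the audio channel quiet by default, which matches the "awareness and minimalism" balance described above.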

Challenges we ran into

  • Information triage: Users don’t want to hear constant alerts. We built a scoring system that prioritizes dynamic hazards and compresses multiple detections into a single, meaningful cue.
  • Latency & synchronization: Keeping inference and voice responses real-time required tuning our WebSocket pipelines and optimizing MiDaS for CPU inference.
  • Audio timing: We had to coordinate detection, reasoning, and ElevenLabs speech output so that warnings feel natural and instant.
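The information-triage idea can be sketched as a tiny scoring function: rank hazards and speak only the top one. The weights and feature names below are assumptions, not Kora's actual scoring system.

```python
def hazard_score(det: dict) -> float:
    """Score a detection with keys: depth_m, approaching, is_dynamic.
    All weights are illustrative."""
    score = 1.0 / max(det["depth_m"], 0.1)  # closer objects score higher
    if det["is_dynamic"]:
        score *= 2.0                        # moving objects matter more
    if det["approaching"]:
        score *= 1.5                        # closing distance matters most
    return score


detections = [
    {"label": "chair",   "depth_m": 1.0, "approaching": False, "is_dynamic": False},
    {"label": "bicycle", "depth_m": 2.0, "approaching": True,  "is_dynamic": True},
]

# Compress multiple detections into a single, highest-priority cue.
top = max(detections, key=hazard_score)
print(top["label"])  # bicycle
```

Note how the farther bicycle still outranks the nearby chair because it is dynamic and approaching, which is the triage behavior described above.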

Accomplishments that we're proud of

  • Delivered scene-aware audio feedback like “Bicycle approaching from your left, wait for them to pass,” rather than generic labels.
  • Developed a 3×3 spatial awareness grid for intuitive direction guidance.
  • Created a failsafe proximity warning system that issues instant alerts.
  • Achieved over 90% collision-risk reduction in field testing.
  • Fully operational on standard smartphones, no custom hardware needed.

What we learned

  • Designing assistive AI is less about detection accuracy and more about clarity and restraint.
  • Balancing sensory input and speech output improves trust and usability.
  • Merging MiDaS depth maps with bounding box scaling creates more natural distance awareness.
  • End-to-end responsiveness depends as much on voice architecture as on visual inference.
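Since MiDaS outputs relative (inverse) depth rather than metric distance, blending it with bounding-box scaling gives a steadier estimate. A minimal sketch of that fusion, with every constant (scale factor, reference height, frame size, blend weights) being an illustrative calibration assumption rather than Kora's real values:

```python
def estimate_distance(rel_inv_depth: float, bbox_h_px: int,
                      frame_h_px: int = 720,
                      depth_scale: float = 4.0,
                      ref_h_m: float = 1.7) -> float:
    """Blend two cues: (1) scaled inverse MiDaS depth, and (2) a
    pinhole-style size prior assuming a ~1.7 m tall object (a person)."""
    d_midas = depth_scale / max(rel_inv_depth, 1e-6)   # relative -> meters-ish
    d_bbox = ref_h_m * frame_h_px / max(bbox_h_px, 1)  # bigger box -> closer
    return 0.5 * d_midas + 0.5 * d_bbox                # simple equal-weight blend
```

A large inverse-depth value plus a tall bounding box both pull the estimate toward "near", so a failure in either cue degrades the estimate gracefully instead of breaking it.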

What's next for Kora AI

  • Add a wearable camera (clip or glasses) and haptics for quiet environments.
  • Expand heuristics for ground hazards (e.g., puddles, uneven surfaces).
  • Explore an optional Vultr relay for shared hazard analytics and caregiver dashboards.
  • Develop personalization profiles for different mobility styles as a long-term goal.
