Inspiration

VISUAL911 was inspired by how stressful and chaotic emergency situations can be, especially when a caller can't clearly describe what's happening or where they are. We wanted to reduce misunderstandings and speed up response times by making it easier to show critical information instead of relying only on voice descriptions. We were also motivated by real-world accessibility needs: language barriers, hearing or speech impairments, panic, and shock. In those situations, a traditional phone call isn't always the best interface.

Beyond the individual caller, we noticed that emergencies don't happen in isolation. Two people can be calling about the same incident from opposite sides of a building, and the current 911 system treats them as completely separate events. We wanted to change that.

What it does

A caller taps SOS on their iPhone, and the app simultaneously:

  1. Runs a 15-second contactless vitals scan, measuring heart rate and breathing rate from the front camera alone (Presage SmartSpectra SDK; no wearable needed)
  2. Opens a live WebRTC video call directly to the dispatcher's browser
  3. Streams call audio to Gemini AI for real-time triage analysis every 10 seconds
  4. Sends GPS coordinates and vitals to the dispatcher dashboard
  5. Broadcasts a community alert to all nearby VISUAL911 users showing incident location, severity, and corroborating report count

The dispatcher sees: live video, biometric readings, an AI-generated situation summary with severity score, a location pin, and a real-time count of corroborating reports, all on a single screen.

Nearby community members see: an alert banner on their phone with incident count, report tally, and severity level.

Track alignment: this project directly addresses all four criteria:

  • Broadcasting alerts: Every SOS press immediately broadcasts a community_alert over a persistent WebSocket channel (/ws/alerts) to all idle VISUAL911 devices in the area.
  • Cross-checking submissions: When multiple SOS calls arrive, the server computes the haversine distance between them. Calls within 50 meters cluster into the same incident, with report_count incrementing per corroborating submission.
  • # users alerted per situation: alerted_count is tracked per incident and displayed live on the dispatcher dashboard under "Users Alerted."
  • # alerts raised per event: report_count per incident is shown as a badge on the Leaflet map pin and in the dashboard metrics bar, updating in real time.
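The broadcast step can be sketched as follows. This is illustrative, not the exact backend code: the subscriber registry and function names are assumptions, but the `community_alert` message type and the severity/report-count fields mirror the protocol described above.

```python
import asyncio

# Hypothetical in-memory registry of idle devices subscribed on /ws/alerts.
# In the real backend these would be aiohttp WebSocketResponse objects.
subscribers: set = set()

async def broadcast_alert(incident: dict) -> int:
    """Send a community_alert payload to every idle subscriber.

    Returns the number of devices reached, which would feed the
    per-incident alerted_count metric shown on the dashboard.
    """
    payload = {
        "type": "community_alert",
        "incident_id": incident["id"],
        "severity": incident["severity"],
        "report_count": incident["report_count"],
    }
    results = await asyncio.gather(
        *(ws.send_json(payload) for ws in subscribers),
        return_exceptions=True,  # one dead socket must not block the rest
    )
    return sum(1 for r in results if not isinstance(r, Exception))
```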

How we built it

  • iOS app: Swift/SwiftUI, Presage SmartSpectra SDK, stasel/WebRTC, AVAudioEngine, CoreLocation
  • Backend: Python 3.11+, asyncio, aiohttp, google-genai, python-dotenv (single-process, no microservices)
  • AI: Gemini 2.5 Flash via batch generateContent with cumulative context across analysis rounds
  • Infrastructure: Vultr Ubuntu VPS, coturn TURN server, Let's Encrypt SSL
  • Dispatcher UI: Plain HTML/CSS/JS, Leaflet.js, WebRTC browser API

Challenges we ran into

Camera conflict between Presage and WebRTC. Both the Presage SmartSpectra SDK and the WebRTC video capturer need exclusive access to AVCaptureSession on the front camera. Running two sessions on the same camera causes a hard crash (error -12785). We solved this by time-slicing: Presage runs for up to 15 seconds at SOS press for the pre-call vitals scan, then hands the camera to WebRTC for the live video call. This turned a crash into a feature where dispatchers get biometric context the moment the call connects.

GPS accuracy indoors. iOS indoor GPS can jitter 30-100 meters depending on the building. A naive 500-meter clustering radius would group unrelated incidents at any urban venue. We tuned the haversine clustering radius to 50 meters, tight enough to differentiate distinct events while still reliably clustering two phones standing next to each other reporting the same thing. The radius is a configurable constant for different deployment environments.
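A minimal sketch of the clustering logic described above, with the 50-meter radius as a configurable constant. The function and field names here are assumptions, but the rule follows the description: a new SOS within the haversine radius of an existing incident increments its report_count; otherwise it opens a new incident.

```python
import math

CLUSTER_RADIUS_M = 50.0  # tuned for indoor GPS jitter; configurable per deployment

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cluster_incident(incidents, lat, lon):
    """Attach a new SOS to an existing incident within the radius
    (bumping report_count), or open a new incident."""
    for inc in incidents:
        if haversine_m(inc["lat"], inc["lon"], lat, lon) <= CLUSTER_RADIUS_M:
            inc["report_count"] += 1
            return inc
    inc = {"lat": lat, "lon": lon, "report_count": 1}
    incidents.append(inc)
    return inc
```

With this rule, two phones standing ~30 m apart cluster into one incident, while reports ~100 m apart stay separate.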

Gemini Live API session limits. The Gemini Live API has a 15-minute session limit in audio-only mode and a 2-minute limit in video+audio mode, both too short for a real emergency call. We switched to batch generateContent every 10 seconds, wrapping accumulated PCM audio in a WAV container and including the latest JPEG video frame. To maintain continuity without a persistent session, each analysis round passes the situation_summary from the previous round as context, so Gemini builds a cumulative picture across rounds instead of re-analyzing from scratch.
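The PCM-to-WAV wrapping is the mechanical core of this workaround, and can be sketched with Python's standard wave module. The sample rate and helper name are assumptions; the google-genai call is outlined only in comments, since the exact prompt and response schema aren't shown here.

```python
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM audio in a WAV container so it can be
    sent as inline audio in a generateContent request."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()

# Each 10-second round then posts the WAV plus the latest JPEG frame,
# carrying the previous round's situation_summary forward as text context.
# Rough shape of the google-genai call (names from the SDK; details assumed):
#
#   client = genai.Client()
#   resp = client.models.generate_content(
#       model="gemini-2.5-flash",
#       contents=[prev_summary,
#                 types.Part.from_bytes(data=wav, mime_type="audio/wav"),
#                 types.Part.from_bytes(data=jpeg, mime_type="image/jpeg")])
```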

Vitals timing race condition. Vitals from the Presage scan can arrive at the server before the call_initiated message registers the call, especially on slow connections. We added a pending_vitals buffer that holds early vitals by call_id and replays them once the call is registered, ensuring no reading is lost regardless of message ordering.
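A minimal sketch of the buffer-and-replay logic, assuming dict-based call state; names like pending_vitals and register_call follow the description above but are illustrative.

```python
from collections import defaultdict

pending_vitals = defaultdict(list)  # early vitals readings, keyed by call_id
active_calls = {}                   # call_id -> call state

def handle_vitals(call_id, reading):
    """Attach a vitals reading to its call, or buffer it if the
    call_initiated message hasn't registered the call yet."""
    if call_id in active_calls:
        active_calls[call_id]["vitals"].append(reading)
    else:
        pending_vitals[call_id].append(reading)

def register_call(call_id):
    """On call_initiated: create the call state and replay any
    vitals that raced ahead of it, so no reading is lost."""
    call = active_calls.setdefault(call_id, {"vitals": []})
    call["vitals"].extend(pending_vitals.pop(call_id, []))
    return call
```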

Accomplishments that we're proud of

  • End-to-end community alert pipeline: SOS press → server incident clustering → WebSocket broadcast to all idle subscribers → per-incident alerted_count and report_count tracked and shown live on the dispatcher dashboard, fully functional.
  • Contactless vitals with clinical-grade accuracy: Presage SmartSpectra achieves <1.62% RMSD for heart rate versus hospital equipment. We integrated it into an iOS app with real-time SDK feedback (face position, lighting quality, signal stability) and a scan UI that finishes early once signals stabilize rather than always running the full 15 seconds.
  • Live WebRTC P2P video deployed on real infrastructure: WebRTC peer-to-peer video working over coturn TURN on a Vultr VPS with Let's Encrypt SSL, not just localhost.
  • AI triage that works on silence: Gemini detects can_speak: false, classifies emotional state as panicked or unresponsive, and recommends a response type even when the caller never says a word. The cumulative context mechanism means each analysis round is more accurate than the last.
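For illustration, here is an assumed shape of one analysis round's output and the context carry-forward. Only can_speak, the emotional state, the severity score, and situation_summary are named above; the exact field names and the carry_context helper are guesses at the schema, not the real one.

```python
# Hypothetical output of one 10-second triage round for a silent caller.
example_triage = {
    "can_speak": False,
    "emotional_state": "panicked",        # or "unresponsive"
    "severity": 8,                        # numeric severity score
    "recommended_response": "medical",    # recommended response type
    "situation_summary": "Caller is silent; labored breathing is audible.",
}

def carry_context(prev_round, new_round):
    """Feed the previous round's summary forward so each analysis
    builds on the last instead of re-analyzing from scratch."""
    new_round["prior_context"] = prev_round["situation_summary"]
    return new_round
```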

What we learned

  • Hardware constraints shape architecture: The camera conflict forced us to time-slice instead of running concurrently. That constraint turned into a better product: the pre-call scan gives dispatchers immediate biometric context that live vitals couldn't reliably provide during a call.
  • "Real-time" has a cost: The Gemini Live API seemed like the obvious choice until we hit the 15-minute session limit and audio reliability issues. Batch analysis every 10 seconds with carried-forward context is more reliable, cheaper, and produces better results than a persistent streaming session.
  • The community layer changes the problem: We started building a 1:1 emergency call system. Adding the broadcast layer changed the fundamental value proposition: every phone becomes a sensor, and corroborating reports change dispatcher confidence, not just raw information volume.
  • GPS is hard indoors: Indoor location accuracy is much worse than you expect. Any feature depending on proximity needs tunable thresholds and a fallback strategy.

What's next for VISUAL911

  • Richer community context: Currently, the alert includes location, severity, and report count. Future versions could include the type of incident (medical, fire, police) and whether help is already dispatched, so community members know whether to call themselves or stand by.
  • Responder-side integration: The dispatcher dashboard is a browser prototype. The next step is integration with actual CAD (computer-aided dispatch) systems used by 911 centers.
  • Stronger abuse prevention: False reports and location spoofing are real risks. Rate limiting per device, community reputation signals, and dispatcher review queues are all mitigation paths to explore.
  • Low-bandwidth mode: In areas with poor connectivity, the WebRTC video could degrade gracefully to audio-only while continuing to send vitals and GPS — the most critical data for dispatch decisions.
  • Multi-language support: Emergency callers often can't communicate in English. Gemini's multilingual capability could support triage across languages with minimal changes to the analysis pipeline.
