MedRover: Bridging the Language Gap in Emergency Care

Inspiration

Imagine being sick, really sick, and the one person who can help you simply cannot understand what you're saying. No shared words. No way to describe your pain, your symptoms, your fear. You watch them move on to the next patient.

This is the reality for millions of people in refugee camps, field clinics, and underserved communities around the world. Language shouldn't be a death sentence. We built MedRover because we believed it didn't have to be.

What We Built

MedRover is an autonomous ground vehicle that manages the entire patient-doctor workflow across language barriers. It detects a patient via camera, conducts a nurse-style intake interview in Spanish powered by GPT-4o, assigns a clinical severity score, and hands the doctor a clean English summary before they step in. During the consultation it relays speech in real time: doctor speaks English, patient hears Spanish, and vice versa. It can also read prescriptions via OCR and look up local medicine availability, spoken back to the patient in their language.
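The workflow above moves through distinct stages: detect, intake, triage, consult, and pharmacy lookup. A minimal sketch of that session flow as a state machine (the stage names and allowed transitions here are our illustration, not the team's actual code):

```python
from enum import Enum, auto

class Stage(Enum):
    """Stages of a MedRover patient session, per the description above."""
    IDLE = auto()      # roving / waiting for the camera to detect a patient
    INTAKE = auto()    # nurse-style interview in the patient's language
    TRIAGE = auto()    # assign severity score, produce English summary
    CONSULT = auto()   # live doctor-patient speech relay
    PHARMACY = auto()  # OCR prescription, look up local availability

# Allowed transitions (assumed ordering; the real system may differ).
TRANSITIONS = {
    Stage.IDLE: {Stage.INTAKE},
    Stage.INTAKE: {Stage.TRIAGE},
    Stage.TRIAGE: {Stage.CONSULT},
    Stage.CONSULT: {Stage.PHARMACY, Stage.IDLE},
    Stage.PHARMACY: {Stage.IDLE},
}

def advance(current: Stage, nxt: Stage) -> Stage:
    """Move to the next stage, rejecting out-of-order jumps."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {nxt.name}")
    return nxt
```

Making the transitions explicit keeps the doctor summary from being requested before intake has finished.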

How We Built It

GPT-4o drives the nurse persona with dynamic follow-ups. Smallest.ai handles multilingual TTS and STT. FastAPI serves as the backend, MongoDB stores every session, and OpenCV handles face detection. We split into three parallel workstreams: hardware, AI pipeline, and frontend, with clean interfaces between each so the whole team could build and test independently.
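Raw frame-by-frame OpenCV detections are noisy, so a rover like this needs to debounce before starting an intake. A small sketch of that trigger logic (the window and threshold values are assumptions, not MedRover's actual tuning):

```python
from collections import deque

class FaceTrigger:
    """Start intake only after a face is seen in most recent frames.

    Each camera frame's detection result (True/False) is pushed in;
    the trigger fires once `threshold` of the last `window` frames
    contained a face, smoothing over single-frame misses.
    """
    def __init__(self, window: int = 10, threshold: int = 7):
        self.frames = deque(maxlen=window)
        self.threshold = threshold

    def update(self, face_detected: bool) -> bool:
        self.frames.append(face_detected)
        return sum(self.frames) >= self.threshold
```

In practice the booleans would come from something like OpenCV's Haar-cascade `detectMultiScale` returning a non-empty result.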

The full speech pipeline for each patient turn follows:

$$\text{Audio} \xrightarrow[\text{STT}]{\text{Smallest.ai}} \text{Text} \xrightarrow[\text{Translate}]{\text{GPT-4o}} \text{Text} \xrightarrow[\text{TTS}]{\text{Smallest.ai}} \text{Audio}$$
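One turn of that pipeline can be sketched as a single function. The `stt`, `translate`, and `tts` callables stand in for the real Smallest.ai and GPT-4o clients, whose signatures we are assuming here; returning both transcripts is one way to keep the English and Spanish logs in lockstep:

```python
from typing import Callable

def relay_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],           # Smallest.ai speech-to-text
    translate: Callable[[str, str], str],  # GPT-4o translation (text, target lang)
    tts: Callable[[str], bytes],           # Smallest.ai text-to-speech
    target_lang: str,
) -> tuple[bytes, str, str]:
    """Run one speaker turn: audio in one language, audio out in the other.

    Returns (audio_out, source_transcript, translated_transcript) so the
    caller can append the same turn to both language logs.
    """
    source_text = stt(audio_in)
    translated = translate(source_text, target_lang)
    return tts(translated), source_text, translated
```

The same function serves both directions; only `target_lang` flips between doctor and patient turns.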

Challenges We Faced

Stateful conversations broke on serverless infrastructure, forcing a move to Railway. Keeping GPT-4o's intake to $n \in [3, 5]$ turns, without relying on a long context, while still covering the core diagnostic questions required careful prompt engineering. Hardest of all was keeping the parallel English and Spanish logs in sync on every turn while holding the full pipeline latency to $\Delta t \ll 1\text{s}$ per turn.
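One way to enforce a hard turn budget is to count the model's question turns and inject the remaining count into each request. This is an illustrative sketch, not the team's actual prompt; the wording, JSON schema, and turn accounting are all assumptions:

```python
INTAKE_SYSTEM_PROMPT = """\
You are a triage nurse conducting an intake interview in Spanish.
Ask ONE question per turn. You have at most {max_turns} questions total.
Cover, in priority order: chief complaint, pain location and severity,
symptom duration, relevant history. After your final question, output
only a JSON object: {{"severity": 1-5, "summary_en": "..."}}.
"""

def build_messages(history: list[dict], max_turns: int = 5) -> list[dict]:
    """Assemble the chat message list for the next intake request,
    reminding the model how many question turns remain."""
    asked = sum(1 for m in history if m["role"] == "assistant")
    remaining = max_turns - asked
    system = INTAKE_SYSTEM_PROMPT.format(max_turns=max_turns)
    reminder = {"role": "system",
                "content": f"{remaining} question turn(s) remaining."}
    return [{"role": "system", "content": system}, *history, reminder]
```

Restating the budget every turn keeps the constraint salient without carrying a long context window.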

What We Learned

Prompt engineering for clinical contexts demands precision that general prompting doesn't. Separating concerns across hardware, AI, and API layers and delegating fully independent modules helped us build without many interdependencies.

Built With

GPT-4o, Smallest.ai, FastAPI, MongoDB, OpenCV, Railway