This project, GrandBuddy AI, bridges the digital divide by transforming how seniors interact with modern technology. By combining real-time screen awareness with compassionate, voice-first guidance, we provide a safety net for grandparents navigating a digital world that often leaves them behind.

Inspiration

Technology moves fast, but our elders shouldn’t have to. We watched our own grandparents struggle with "simple" tasks—setting up a video call, checking an online bank statement, or even just sending a photo—only to feel frustrated and isolated. Traditional tech support is often impatient or overly technical; we wanted to build something that feels like a patient grandchild sitting right beside them. Our goal is to empower a generation to reclaim their digital independence.

What it does

GrandBuddy is a contextual AI agent that "sees" what you see.

  • Screen Context: It captures the user’s screen in real time to understand exactly where they are stuck.
  • Voice Conversation: Powered by ElevenLabs, it provides ultra-realistic, soothing voice guidance so users don't have to navigate complex menus or type.
  • AI Vision: It uses Google Gemini to analyze screen frames, identifying buttons, login fields, or confusing UI elements to provide step-by-step instructions.
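Those three pieces form a simple capture → analyze → speak loop. A minimal sketch of that loop, with the integrations injected as callbacks (all names here are illustrative, not our actual module layout):

```typescript
// Minimal sketch of GrandBuddy's assist loop: capture → analyze → speak.
// The three dependencies are injected so each integration (screen capture,
// Gemini vision, ElevenLabs voice) can be swapped or mocked independently.

interface AssistDeps {
  captureFrame: () => Promise<string>;            // base64 PNG of the screen
  analyze: (framePng: string) => Promise<string>; // e.g. a Gemini vision call
  speak: (guidance: string) => Promise<void>;     // e.g. ElevenLabs speech
}

async function assistOnce(deps: AssistDeps): Promise<string> {
  const frame = await deps.captureFrame();
  const guidance = await deps.analyze(frame);
  await deps.speak(guidance);
  return guidance;
}
```

Keeping the loop this small is what lets the same core run whether the user is stuck in a browser, an email client, or a banking app.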

How we built it

We built the desktop application using Electron and Vite for a lightweight, cross-platform experience. The frontend is crafted with React and Tailwind CSS, focusing on high contrast and large, accessible typography.
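One way to centralize those high-contrast, large-type defaults is in the Tailwind theme; the sketch below shows the idea (the token names and exact values are illustrative, not our shipped config):

```javascript
// tailwind.config.js — illustrative accessibility-first tokens (names are ours).
module.exports = {
  content: ["./src/**/*.{ts,tsx}"],
  theme: {
    extend: {
      fontSize: {
        // Larger base sizes so body text stays readable for low-vision users.
        "senior-base": ["1.375rem", { lineHeight: "2rem" }],
        "senior-lg": ["1.75rem", { lineHeight: "2.5rem" }],
      },
      colors: {
        // Near-black on warm white: a high-contrast pairing for body text.
        ink: "#111111",
        paper: "#FFFDF7",
      },
    },
  },
};
```

Defining these as theme tokens keeps every component on the same accessible baseline instead of scattering one-off size and color overrides.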

For the "brain," we leveraged Google's Gemini 1.5 Pro (via the Vertex AI/Google AI SDK) to perform multimodal analysis on screen captures. To make the interaction feel human, we integrated ElevenLabs’ Conversational AI API, allowing for low-latency, emotionally resonant voice interaction. We used Bun as our runtime for high-throughput processing of the screen data and API calls.
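Each multimodal request pairs a coaching prompt with the current screen frame. A minimal sketch of building that payload, assuming the part shape accepted by Google's `@google/generative-ai` SDK (the prompt wording and function name are ours):

```typescript
// Builds the parts array for a Gemini generateContent call:
// one text part (the coaching prompt) and one inline image part (the frame).
type GeminiPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

function buildGeminiParts(question: string, framePngBase64: string): GeminiPart[] {
  const prompt =
    "You are a patient tech helper for a senior. Look at this screenshot and " +
    "answer in short, numbered, jargon-free steps. Question: " + question;
  return [
    { text: prompt },
    { inlineData: { mimeType: "image/png", data: framePngBase64 } },
  ];
}

// With the real SDK, this array would be passed to something like:
//   genAI.getGenerativeModel({ model: "gemini-1.5-pro" }).generateContent(parts)
```

Baking the persona into the prompt at this layer means every frame is analyzed with the same "patient grandchild" framing.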

Challenges

Our biggest hurdles were latency and privacy. Analyzing screen data in real time requires significant bandwidth and processing. We optimized the screen capture pipeline to send only essential context to the LLM, ensuring GrandBuddy responds in seconds rather than minutes. We also spent hours fine-tuning the AI's "persona" to ensure it remains encouraging and avoids technical jargon, focusing on simple, actionable steps.
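The capture-pipeline optimization can be sketched as a simple gate: skip a frame unless enough time has passed and enough of it has actually changed. The thresholds and names below are illustrative, not our production tuning:

```typescript
// Decides whether a captured frame is worth sending for analysis.
// A frame passes only if (a) the minimum interval has elapsed and
// (b) enough sampled bytes differ from the last frame we sent.

const MIN_INTERVAL_MS = 2000;   // at most one analysis every 2 s
const CHANGE_THRESHOLD = 0.02;  // more than 2% of sampled bytes must differ

function shouldSendFrame(
  prev: Uint8Array | null,
  next: Uint8Array,
  lastSentMs: number,
  nowMs: number,
): boolean {
  if (nowMs - lastSentMs < MIN_INTERVAL_MS) return false;
  if (prev === null || prev.length !== next.length) return true;
  let changed = 0;
  // Sample every 16th byte to keep the diff cheap on large frames.
  for (let i = 0; i < next.length; i += 16) {
    if (prev[i] !== next[i]) changed++;
  }
  const sampled = Math.ceil(next.length / 16);
  return changed / sampled > CHANGE_THRESHOLD;
}
```

Gating at the capture side also helps privacy: frames that never pass the gate never leave the user's machine.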

What's next for GrandBuddy?

The elderly population is growing, and the digital gap is widening. We plan to implement "remote hand-holding," where family members can pre-record specific guides for GrandBuddy to follow. We also want to integrate with OS-level accessibility APIs to physically highlight parts of the screen the AI is talking about. GrandBuddy isn't just a tool; it's a bridge to keeping families connected in a digital age.

Built with:

  • Electron & Vite
  • Google Gemini (Multimodal AI)
  • ElevenLabs (Conversational Voice AI)
  • React & TypeScript
  • Tailwind CSS
  • Bun
