Inspiration

Our journey began while volunteering at local non-profit organizations that support children with speech impediments. We witnessed how therapy costs and limited access to qualified therapists often prevented families from giving their children the help they needed at the most critical stages of development.

Beyond language delays, these children face increased risks of behavioral and emotional challenges, stemming from factors such as maternal mental health, strained parent-child relationships, and developmental delays. Research shows that enhancing communication and social expression plays a key role in improving their overall well-being.

We were inspired to create an ethical, affordable, and engaging virtual companion - a tool that could extend therapy beyond the clinic. Our goal was to encourage children to explore their imagination through language, without the stigma or limitations that often come with traditional therapy settings.

What it does

Most virtual therapy bots focus on pronunciation issues like articulation disorders. We set out to solve a different challenge: helping children organize fragmented, jumpy speech into coherent dialogue. Our companion actively remembers context and responds in a way that encourages storytelling and exploratory expression. For example:

Child: “I saw a bird… I want ice cream.”
Companion: “The bird is flying high in the sky. What flavor of ice cream would you like?”

By continuing and structuring the child’s thoughts, the bot supports both speech practice and cognitive organization.
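To illustrate how a companion might "remember context", here is a minimal sketch of a rolling conversation memory. It is not our production code: the `ConversationMemory` class, its method names, and the `max_turns` limit are all hypothetical and chosen only for illustration.

```python
from collections import deque


class ConversationMemory:
    """Hypothetical helper: keeps the most recent turns of a conversation."""

    def __init__(self, max_turns: int = 6) -> None:
        # A deque with maxlen drops the oldest turn once the limit is reached.
        self.turns = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        """Remember one utterance together with who said it."""
        self.turns.append((speaker, text.strip()))

    def as_prompt_context(self) -> str:
        """Render the remembered turns as plain text for a language-model prompt."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


memory = ConversationMemory(max_turns=4)
memory.add("Child", "I saw a bird...")
memory.add("Child", "I want ice cream.")
context = memory.as_prompt_context()
print(context)
```

Passing `context` along with each new utterance would let a reply mention both the bird and the ice cream, as in the example above. Capping the number of turns is one simple way to keep prompts short.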

Other features include:

  • Visual storyboards: Gemini image generation creates dynamic panels that reflect the child’s words, fueling creativity.
  • Friendly voice interaction: ElevenLabs provides a comforting, human-like voice to foster trust and warmth.
  • Progress tracking: A lightweight SQLite database stores anonymized interaction data to show improvements over time.
  • Parent-friendly insights: Provides constructive feedback to help families encourage their child’s communication journey.
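As a rough idea of what anonymized progress tracking could look like, the sketch below uses Python's built-in `sqlite3` module. The table layout, the `child_token` column, and the words-per-utterance metric are assumptions made for this example, not our actual schema. The key idea is that only a random token and aggregate numbers are stored, never a name or a raw transcript.

```python
import sqlite3
import uuid


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) sessions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sessions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            child_token TEXT NOT NULL,  -- random token, not a name
            started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            utterance_count INTEGER NOT NULL,
            avg_words_per_utterance REAL NOT NULL
        )
        """
    )
    return conn


def record_session(conn: sqlite3.Connection, child_token: str, utterances: list[str]) -> None:
    """Store aggregate statistics for one session; the raw text is discarded."""
    word_counts = [len(u.split()) for u in utterances]
    avg = sum(word_counts) / len(word_counts) if word_counts else 0.0
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sessions (child_token, utterance_count, avg_words_per_utterance) "
            "VALUES (?, ?, ?)",
            (child_token, len(utterances), avg),
        )


def progress(conn: sqlite3.Connection, child_token: str) -> list[float]:
    """Return the average utterance length per session, oldest first."""
    rows = conn.execute(
        "SELECT avg_words_per_utterance FROM sessions WHERE child_token = ? ORDER BY id",
        (child_token,),
    ).fetchall()
    return [row[0] for row in rows]


conn = open_db()
token = uuid.uuid4().hex  # pseudonymous identifier generated once per child
record_session(conn, token, ["I saw", "a bird"])
record_session(conn, token, ["I saw a bird", "it was flying high"])
trend = progress(conn, token)
print(trend)
```

A rising trend in a simple measure like this is one possible signal to surface in parent-facing feedback; a real deployment would likely need richer measures chosen with a therapist.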

How we built it

We combined speech-to-text (STT), natural language processing (NLP), and text-to-speech (TTS) to power real-time conversations.

  • Gemini API: Handles STT, TTS, and language generation.
  • Image Generation: Produces endless child-safe visuals to match stories.
  • FaceTime-style UI: Integrates captions, animations, and picture panels.
  • SQLite3 Database: Securely stores anonymized session data for progress tracking.
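The flow of a single conversational turn can be sketched as three stages chained together. The example below is a simplified illustration, not our actual integration: `run_turn` and the three stand-in functions are hypothetical, and no real Gemini or ElevenLabs calls are made. In a real build, each stand-in would be replaced by a call to the corresponding service.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> tuple[str, str, bytes]:
    """Run one turn: speech-to-text, then language generation, then text-to-speech."""
    heard = transcribe(audio)
    reply = generate_reply(heard)
    spoken = synthesize(reply)
    return heard, reply, spoken


# Stand-in stages so the sketch runs without any network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(text: str) -> str:
    return f"Tell me more about: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


heard, reply, spoken = run_turn(
    b"I saw a bird", fake_transcribe, fake_reply, fake_synthesize
)
print(heard, "->", reply)
```

Passing the stages in as functions keeps the turn logic independent of any one provider, which makes it easier to swap models or test the flow offline.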

Our prompts were carefully designed to keep the companion patient, encouraging, and bias-free, allowing the child to lead conversations while the companion gently guides and extends their ideas.

Challenges we ran into

As first-time hackathon participants, we found building a full-stack project to be an ambitious leap. Some of the key hurdles included:

  • Ensuring real-time responsiveness - STT + TTS latency sometimes disrupted the conversational flow.
  • Designing child-appropriate, safe prompts with minimal bias and no harmful content.
  • Achieving reliable Gemini API calls that could handle fragmented speech inputs.
  • Generating engaging, non-repetitive images that aligned with children’s storytelling.
  • Managing ethical data practices, including parental consent and strict anonymity.
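One way to investigate the latency hurdle listed above is to time each stage of a turn separately and see which one dominates. The sketch below is an illustration under assumptions: `StageTimer` is a hypothetical helper, and the `time.sleep` calls merely stand in for real STT and TTS requests.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time per named stage."""

    def __init__(self) -> None:
        self.durations: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises an error.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Name of the stage with the largest accumulated time."""
        return max(self.durations, key=self.durations.get)


timer = StageTimer()
with timer.stage("stt"):
    time.sleep(0.01)  # placeholder for a speech-to-text request
with timer.stage("tts"):
    time.sleep(0.05)  # placeholder for a text-to-speech request

print(timer.slowest())
```

Knowing which stage is slowest helps decide where effort is best spent, for example on streaming audio output or on shortening prompts.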

Despite these challenges, we persisted - learning to debug under pressure, delegate tasks effectively, and iterate quickly.

Accomplishments that we're proud of

  • Built an interactive prototype featuring live speech recognition, picture panels, and animated captions.
  • Created a proof-of-concept system that aligns with our vision of an unbiased, open-ended creative outlet for kids.
  • Successfully tackled complex front-end/back-end integration for the first time as a team.
  • Established a clear roadmap for scaling the companion into a therapy-focused, child-friendly platform.

What we learned

This project was a crash course in both technology and empathy:

  • Prompt engineering: We discovered that concise examples of ideal dialogues produced better responses than lengthy instructions.
  • Model selection: Different Gemini models varied in tone: some were overly formal or robotic, while others were more imaginative and kid-friendly.
  • Conversation design: Balancing structure (to guide speech) with flexibility (to spark creativity) is essential for therapeutic dialogue.
  • Collaboration: We learned version control, task delegation, and the nuances of building a full-stack product as a cohesive team.
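The prompt-engineering lesson above, that short example dialogues worked better than long instructions, can be illustrated with a small few-shot prompt builder. This is a sketch rather than our actual prompt: the instruction sentence and the `build_prompt` function are assumptions, and the example pair reuses the bird and ice cream exchange from earlier in this writeup.

```python
# Example dialogues that show the model the desired style of reply.
EXAMPLES = [
    (
        "I saw a bird... I want ice cream.",
        "The bird is flying high in the sky. What flavor of ice cream would you like?",
    ),
]


def build_prompt(child_utterance: str, examples=EXAMPLES) -> str:
    """Assemble a few-shot prompt: one short instruction, examples, then the new turn."""
    lines = ["You are a patient, encouraging companion for a young child."]
    for child, companion in examples:
        lines.append(f"Child: {child}")
        lines.append(f"Companion: {companion}")
    lines.append(f"Child: {child_utterance}")
    lines.append("Companion:")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_prompt("I have a dog. The moon is big.")
print(prompt)
```

Adding or swapping example pairs changes the tone of the replies without rewriting the instruction, which is what made this approach quick to iterate on.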

What's next for

We envision expanding the companion into a reliable, safe, and fun platform that families can trust:

  • Improving speech recognition accuracy and reducing latency.
  • Adding mini-games and creative activities to make sessions more engaging.
  • Implementing authentication and progress dashboards for parents and therapists.
  • Refining the UI for a child-centric, playful experience.
  • Training on specialized therapy datasets to further enhance dialogue quality.

Finally...

We started this project with a shared belief: every child deserves a chance to express themselves fully, regardless of resources or circumstances. By blending accessible technology with empathy, we hope our virtual companion becomes not just a therapy tool, but also a trustworthy friend and creative partner in a child’s growth.
