Lirra: The Empathetic Storytelling Buddy for Kids 🧸
Inspiration 🌱
Lirra was inspired by the emotional challenges children face when coping with stress, anxiety, or medical difficulties. We wanted to create a gentle, interactive way for kids to understand and express their feelings through storytelling — turning therapy into a safe, creative, and comforting experience.
Unlike existing AI story apps that often lack emotional depth or show bias, Lirra was designed as a responsible, empathetic AI companion that listens, adapts, and connects with each child on a deeply human level.
What It Does ✨
Lirra listens and connects with each child through a multimodal AI pipeline that blends speech, text, and emotional intelligence. Using ElevenLabs Speech-to-Text (STT), the child’s spoken input is first transcribed into text. Both the voice signal and the transcribed text are then processed through two complementary emotion recognition networks: a Bidirectional LSTM for text-based emotion understanding, which captures contextual and linguistic cues, and a Parallel 2D CNN with a Transformer Encoder for voice-based emotion analysis, which learns tone, pitch, and rhythm patterns to interpret emotional states.
The detected emotions are fused with the child’s age, gender, and preference data securely stored in the database, forming a comprehensive emotional-context profile. This profile then guides LLM-based story generation powered by Anthropic Claude or OpenAI GPT-4, which crafts personalized therapeutic stories designed to comfort, motivate, or calm the child depending on their emotional state. Each story is paired with comic-style visuals generated by Gemini 2.5 Flash Image, transforming the narrative into an engaging visual experience. Finally, Lirra narrates the story using Fish Audio TTS, which recreates the parent’s cloned voice to deliver the story in a warm, familiar tone, making the experience emotionally resonant, safe, and deeply personal.
Workflow
Child Speech → ElevenLabs STT → Text + Voice Emotion Detection → Child Profile Retrieval → LLM Story Generation → Gemini Images → Fish Audio Narration
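The workflow above can be sketched as a minimal, dependency-injected pipeline. This is an illustrative skeleton only: the function names, stages, and `ChildProfile` fields are assumptions for demonstration, not the production code, and the image-generation and narration stages would follow the same pattern.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ChildProfile:
    """Hypothetical profile shape; the real schema lives in the database."""
    age: int
    gender: str
    preferences: list[str]

@dataclass
class StorySession:
    transcript: str
    emotion: str
    story: str

def run_pipeline(
    audio: bytes,
    profile: ChildProfile,
    transcribe: Callable[[bytes], str],                      # e.g. ElevenLabs STT
    detect_emotion: Callable[[bytes, str], str],             # fused voice + text models
    generate_story: Callable[[str, str, ChildProfile], str], # Claude / GPT-4
) -> StorySession:
    """Mirror the workflow: STT -> emotion detection -> story generation."""
    transcript = transcribe(audio)
    emotion = detect_emotion(audio, transcript)
    story = generate_story(transcript, emotion, profile)
    return StorySession(transcript, emotion, story)
```

Injecting each stage as a callable keeps the orchestration testable with stubs before wiring in the real API clients.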
How We Built It 🧠
Lirra combines advanced AI architectures, multimodal emotion recognition, and creative storytelling to deliver a personalized therapeutic experience for children. The system is designed to integrate voice, text, and child profile data into adaptive, emotionally resonant narratives.
🖥️ Frontend: Built with Next.js (React) for a fast, responsive web interface.
💬 Speech & Text Processing:
- Speech-to-Text (STT): ElevenLabs STT transcribes the child’s spoken input into text.
- Text-based Emotion Detection: a Bidirectional LSTM (BiLSTM) captures contextual and linguistic cues to identify subtle emotional undertones.
- Voice-based Emotion Detection: a Parallel 2D CNN + Transformer Encoder analyzes the spectrogram of the voice, extracting pitch, tone, and rhythm patterns to infer emotional states.
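A minimal PyTorch sketch of the two architectures named above. Layer sizes, the number of emotion classes, and the pooling choices are illustrative assumptions, not the trained production models:

```python
import torch
import torch.nn as nn

class TextEmotionBiLSTM(nn.Module):
    """BiLSTM over token embeddings -> emotion logits."""
    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden: int = 128, n_emotions: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:  # (B, T)
        x = self.embed(token_ids)        # (B, T, embed_dim)
        out, _ = self.lstm(x)            # (B, T, 2*hidden)
        return self.head(out.mean(dim=1))  # mean-pool over time -> (B, n_emotions)

class VoiceEmotionCNNTransformer(nn.Module):
    """Two parallel 2D CNN branches over a mel spectrogram, then a
    Transformer encoder over the time axis -> emotion logits."""
    def __init__(self, n_mels: int = 64, d_model: int = 64, n_emotions: int = 6):
        super().__init__()
        # parallel branches with different receptive fields (3x3 vs 7x7)
        self.branch_a = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Conv2d(1, 4, 7, padding=3), nn.ReLU())
        self.proj = nn.Linear(8 * n_mels, d_model)  # per-frame features -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_emotions)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:  # (B, 1, n_mels, T)
        feats = torch.cat([self.branch_a(spec), self.branch_b(spec)], dim=1)
        B, C, M, T = feats.shape
        seq = feats.permute(0, 3, 1, 2).reshape(B, T, C * M)  # frames as sequence
        z = self.encoder(self.proj(seq))                      # (B, T, d_model)
        return self.head(z.mean(dim=1))                       # (B, n_emotions)
```

Both models emit logits over the same emotion label set, which makes their outputs straightforward to fuse downstream.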
🧩 Data Integration: The child’s age, gender, and personal preferences are securely stored in a Supabase SQL database. Emotional cues from text and voice are combined with the child’s profile to form a comprehensive emotional-context representation for personalized storytelling.
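One simple way to combine the two modalities is a weighted average of their emotion distributions. This is a hedged sketch of such a fusion step; the actual fusion strategy and weighting used in Lirra may differ:

```python
def fuse_emotions(text_probs: dict[str, float],
                  voice_probs: dict[str, float],
                  text_weight: float = 0.5) -> str:
    """Fuse per-modality emotion probabilities with a weighted average
    and return the top emotion label. text_weight=0.5 (an assumption)
    treats both modalities equally."""
    emotions = set(text_probs) | set(voice_probs)
    fused = {
        e: text_weight * text_probs.get(e, 0.0)
           + (1.0 - text_weight) * voice_probs.get(e, 0.0)
        for e in emotions
    }
    return max(fused, key=fused.get)
```

The fused label, together with the stored profile fields, forms the emotional-context representation handed to the story generator.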
📖 Story Generation: Stories are generated with LLMs (Anthropic Claude or OpenAI GPT-4), dynamically adapting the narrative, tone, and pacing based on the child’s emotional state and profile.
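The emotional state and profile can steer the LLM through the prompt itself. Below is an illustrative prompt-builder; the emotion-to-tone mapping and wording are assumptions, not Lirra’s actual prompts:

```python
def build_story_prompt(emotion: str, age: int, preferences: list[str]) -> str:
    """Map the detected emotion to a narrative tone (illustrative mapping)
    and assemble a story prompt for the LLM."""
    tone = {
        "sad": "comforting",
        "anxious": "calming",
        "happy": "celebratory",
    }.get(emotion, "gentle")
    return (
        f"Write a short {tone} therapeutic story for a {age}-year-old child "
        f"who currently feels {emotion}. Weave in their favourite themes: "
        f"{', '.join(preferences)}. Keep the language simple and reassuring."
    )
```

The resulting string would be sent as the user message to Claude or GPT-4 via their respective client libraries.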
🎨 Illustrations: Gemini 2.5 Flash Image produces comic-style visuals for each story scene, enhancing engagement and emotional connection.
🔊 Story Narration: Fish Audio TTS delivers the story in the parent’s cloned voice, providing warmth, familiarity, and emotional comfort.
Challenges We Faced 🧩
- Integrating multiple deep learning modules (STT, text and voice emotion models, and text-to-image generation) with the LLM.
- Connecting the backend and frontend without introducing too much latency.
Accomplishments We’re Proud Of 🏆
- Built an end-to-end AI platform combining emotion recognition, adaptive storytelling, comic-style visuals, and custom voice narration.
- Successfully implemented emotion-driven story personalization using BiLSTM and CNN-Transformer architectures along with children's personality traits extracted from the database.
- Enabled child profile integration (age, gender, preferences) for tailored stories.
- Used Arnold Schwarzenegger's voice for storytelling, which can be swapped for parent voice cloning to provide warmth, familiarity, and emotional comfort.
What We Learned 📚
- How to effectively fuse multimodal AI (voice + text) with LLM storytelling for therapeutic applications.
- A deeper understanding of emotion recognition architectures, voice processing, and narrative generation.
What’s Next for Lirra 🚀
- Expand multilingual and culturally adaptive storytelling to reach children globally.
- Integrate additional therapy modules such as mindfulness, journaling, and guided breathing exercises.
- Deploy on-device emotion recognition models for enhanced privacy and offline functionality.
- Collaborate with child psychologists, therapists, and pediatric hospitals for clinical validation.
- Launch mobile apps and AR/VR immersive storytelling experiences to deepen engagement.
Lirra’s Mission 💖
To help every child feel heard, understood, and empowered — one story at a time.
Built With
- anthropic-claude
- applovin-query-planner
- assemblyai
- baseten-stable-diffusion-xl
- bright-data
- javascript
- chromadb
- composio-toolrouter
- conway
- elastic-cloud
- elevenlabs
- fetch.ai-agentverse
- fish-audio
- lava-ai-gateway
- livekit
- next.js
- node.js
- openai-gpt-4
- postgresql
- prisma
- python
- pytorch
- react
- reka-ai
- stability-ai-safety-checker
- sui-blockchain
- tensorflow
- typescript
- vercel
- warp
- xrp-ledger
