Commute-Zen


Inspiration

The modern commute is often filled with dead time: sitting in traffic, waiting for transit, or riding the bus. At the same time, people are increasingly eager to learn and to stay informed about current events.

The inspiration for Commute Zen came from a simple observation: what if we could transform that commute time into a personalized news experience?

We envisioned a solution that would not require users to read news articles or keep checking their phones while driving. Instead, we wanted to create something calm, hands-free, and conversational. It should feel like having a personal news anchor briefing you during your commute.

With recent advances in real-time AI, particularly the Gemini Live API, we saw an opportunity to build something new: a voice-first news assistant that understands your interests and delivers exactly what you want to hear.

What it does

Commute Zen is a voice-first news assistant that generates personalized, podcast-style audio summaries of current news, tailored to your interests.

Core Features

Voice First Interaction
Start a live voice conversation with Gemini and tell the app which news topics you care about, such as technology, sports, politics, or entertainment.

Intelligent News Fetching
The app uses Gemini with its Google Search tool to fetch real-time, topic-specific news from across the web.

Smart Summarization
The AI generates concise, podcast-style transcripts sized for commute listening, typically five to ten minutes long.
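One way to hit a five-to-ten-minute target is to turn the desired listening time into a word budget inside the summarization prompt. This is a minimal sketch, assuming a narration speed of about 150 words per minute (a common rule of thumb, not a measured property of Gemini's voices); `targetWordCount`, `buildSummaryPrompt`, and the prompt wording are hypothetical.

```typescript
// Assumed average narration speed; tune this against the actual TTS voice.
const WORDS_PER_MINUTE = 150;

// Convert a listening time in minutes into an approximate transcript length.
function targetWordCount(minutes: number, wpm: number = WORDS_PER_MINUTE): number {
  if (!Number.isFinite(minutes) || minutes <= 0) {
    throw new RangeError("minutes must be a positive number");
  }
  return Math.round(minutes * wpm);
}

// Hypothetical prompt builder: asks for a calm, spoken-style script whose
// length falls inside the requested minute range.
function buildSummaryPrompt(topics: string[], minMinutes = 5, maxMinutes = 10): string {
  const low = targetWordCount(minMinutes);
  const high = targetWordCount(maxMinutes);
  return [
    "You are a calm news anchor writing a script to be read aloud during a commute.",
    `Cover these topics: ${topics.join(", ")}.`,
    `Write between ${low} and ${high} words of plain spoken prose with no headings or bullet points.`,
  ].join("\n");
}
```

Under that assumption, a five-to-ten-minute briefing maps to 750 to 1,500 words. Stating the budget in words rather than minutes gives the model a target it can actually count.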

Natural Audio Conversion
Text-to-speech converts the summaries into natural-sounding audio with proper pacing and emphasis.

Persistent History
Your generated summaries and audio are saved in the cloud using Firestore when you are signed in, or locally using IndexedDB when browsing anonymously.
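The choice between cloud and local storage can sit behind one small interface, so the rest of the app never checks sign-in state directly. The following is a hypothetical sketch: `SummaryStore`, `BriefingRecord`, `MemoryStore`, and `pickStore` are illustrative names, and an in-memory map stands in for both backends so the example runs anywhere. In the real app, one implementation would wrap Firestore calls keyed by the user's ID and the other would wrap idb-keyval.

```typescript
interface BriefingRecord {
  id: string;
  topics: string[];
  transcript: string;
  createdAt: number; // epoch milliseconds
}

// Minimal storage contract shared by the cloud and local backends.
interface SummaryStore {
  readonly kind: "cloud" | "local";
  save(record: BriefingRecord): Promise<void>;
  list(): Promise<BriefingRecord[]>;
}

// In-memory stand-in; a real backend would call Firestore or idb-keyval instead.
class MemoryStore implements SummaryStore {
  readonly kind: "cloud" | "local";
  private records = new Map<string, BriefingRecord>();

  constructor(kind: "cloud" | "local") {
    this.kind = kind;
  }

  async save(record: BriefingRecord): Promise<void> {
    this.records.set(record.id, record);
  }

  async list(): Promise<BriefingRecord[]> {
    // Newest first, as a history view would show them.
    return [...this.records.values()].sort((a, b) => b.createdAt - a.createdAt);
  }
}

// Signed-in users get cloud storage; anonymous users fall back to local storage.
function pickStore(userId: string | null, cloud: SummaryStore, local: SummaryStore): SummaryStore {
  return userId ? cloud : local;
}
```

Because callers depend only on `SummaryStore`, the history page and the player can stay unaware of which backend is active.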

Custom Audio Player
Stream and listen to your briefings with an elegant in-app player.

User Journey

User taps the microphone button
Gemini Live initiates a conversation asking for preferred news topics
The app fetches current news matching those topics through Gemini with Google Search integration
AI creates a calm, engaging transcript suitable for voice playback
The transcript is converted to audio and played immediately
Everything is saved to the user's history for future reference
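The journey above is essentially a linear async pipeline. The sketch below expresses it with injected dependencies so each stage can be backed by Gemini in production or stubbed in tests. The `PipelineDeps` shape and the stage names are hypothetical rather than the app's actual module layout. The sketch also assumes that the Gemini Live conversation has already produced a list of topics.

```typescript
interface NewsItem {
  title: string;
  summary: string;
}

// Each stage is injected so it can be swapped or mocked independently.
interface PipelineDeps {
  fetchNews(topics: string[]): Promise<NewsItem[]>;
  writeTranscript(items: NewsItem[]): Promise<string>;
  synthesize(transcript: string): Promise<Uint8Array>;
  play(audio: Uint8Array): Promise<void>;
  saveHistory(entry: { topics: string[]; transcript: string; audio: Uint8Array }): Promise<void>;
}

// Runs the journey after topics are collected: fetch, write, synthesize, then play and save.
async function runBriefing(topics: string[], deps: PipelineDeps): Promise<string> {
  if (topics.length === 0) throw new Error("No topics collected from the conversation");
  const items = await deps.fetchNews(topics);
  const transcript = await deps.writeTranscript(items);
  const audio = await deps.synthesize(transcript);
  // Start playback and the history write together so saving does not delay listening.
  await Promise.all([deps.play(audio), deps.saveHistory({ topics, transcript, audio })]);
  return transcript;
}
```

Running playback and the history write side by side is one design choice among several. It fits the goal of playing audio immediately while still recording the briefing.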

How we built it

Commute Zen is built on a modern, production-ready tech stack centered on Google's latest Gemini models.

Frontend Architecture

Next.js 15 with the App Router for server-side rendering.
React 19 for the component-based UI and state management.
TypeScript for type safety across the entire codebase.

AI and Language Models

Google Gemini Live API, via the @google/genai SDK, for real-time conversational AI.
Gemini models for multi-step processing, including search, summarization, and text-to-speech.
Google Search integration as a tool function for fetching live news data.
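In the @google/genai SDK, search grounding is enabled by passing `tools: [{ googleSearch: {} }]` in the request config of `ai.models.generateContent`. To keep the sketch runnable without an API key, it only builds the request object, with the SDK call shown in comments. The model name `gemini-2.5-flash`, the prompt text, the helper `buildNewsRequest`, and the `GEMINI_API_KEY` variable name are assumptions for illustration.

```typescript
// Local mirror of the request shape accepted by ai.models.generateContent(),
// declared here so the sketch compiles without the SDK installed.
interface GroundedRequest {
  model: string;
  contents: string;
  config: { tools: Array<{ googleSearch: Record<string, never> }> };
}

// Assumed model name; use whichever Gemini model the project is configured for.
const NEWS_MODEL = "gemini-2.5-flash";

// Build a request that lets Gemini run Google Search before answering,
// so the returned stories are grounded in current web results.
function buildNewsRequest(topics: string[], now: Date = new Date()): GroundedRequest {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD
  return {
    model: NEWS_MODEL,
    contents:
      `Find the most important news from the last 24 hours (today is ${day}) ` +
      `about: ${topics.join(", ")}. For each story give a headline and a two-sentence summary.`,
    config: { tools: [{ googleSearch: {} }] },
  };
}

// With the SDK installed, the request would be sent roughly like this:
//   import { GoogleGenAI } from "@google/genai";
//   const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
//   const response = await ai.models.generateContent(buildNewsRequest(["technology"]));
//   console.log(response.text);
```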

Backend and Data

Firebase Authentication with Google Sign-In for secure user accounts.
Firestore as the cloud database for storing user profiles, summaries, transcripts, and chunked audio.
Custom Firestore security rules to enforce authentication and owner-only access.

Storage and Persistence

Cloud storage uses Firestore when authenticated.
Local fallback uses IndexedDB with the idb-keyval wrapper for anonymous users.
Large audio files are split into base64-encoded chunks so that each piece stays within Firestore's 1 MiB per-document size limit.
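A minimal sketch of the chunking step, assuming the audio is already a base64 string. The 900,000-character chunk size is an assumed value that leaves headroom for other fields in each document. `AudioChunk`, `chunkBase64`, and `joinChunks` are hypothetical names.

```typescript
// Firestore caps each document at 1 MiB. Base64 is ASCII, so one character is
// one byte; 900,000 characters per chunk leaves room for other fields.
const MAX_CHUNK_CHARS = 900_000;

interface AudioChunk {
  index: number; // position used to reassemble in order
  data: string; // slice of the base64 payload
}

// Split a base64 string into ordered chunks no longer than maxChars.
function chunkBase64(base64: string, maxChars: number = MAX_CHUNK_CHARS): AudioChunk[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const chunks: AudioChunk[] = [];
  for (let start = 0, index = 0; start < base64.length; start += maxChars, index++) {
    chunks.push({ index, data: base64.slice(start, start + maxChars) });
  }
  return chunks;
}

// Rebuild the original string, tolerating chunks that were read back out of order.
function joinChunks(chunks: AudioChunk[]): string {
  return [...chunks].sort((a, b) => a.index - b.index).map((c) => c.data).join("");
}
```

Each chunk could be written as its own document in a subcollection, for example `summaries/{id}/audioChunks/{index}`, which is a hypothetical path. Splitting at arbitrary character positions is safe because the chunks are concatenated back into one string before any base64 decoding happens.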

Styling and UX

Tailwind CSS v4 for utility-first styling.
Framer Motion for smooth animations and transitions.
Responsive design that works from mobile to desktop.

Infrastructure

Vercel for deployment and hosting.
Firebase App Hosting configuration for backend scaling.
Environment based configuration for API keys and app URLs.

Challenges we ran into

Building Commute Zen presented several technical and design challenges.

Real Time Voice Streaming
Getting Gemini Live to work smoothly in the browser required careful audio capture, buffer management, and handling of browser permission quirks across devices.
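One concrete part of the capture problem is format conversion. Google's Live API documentation specifies input audio as raw 16-bit little-endian PCM at 16 kHz, while browser microphones typically deliver Float32 samples at 44.1 or 48 kHz. The sketch below shows that conversion, assuming the samples have already been pulled from the Web Audio graph. The helper names are hypothetical, and linear interpolation is a simple stand-in for a proper resampler.

```typescript
// Live API input format: 16-bit little-endian PCM at 16 kHz.
const LIVE_INPUT_RATE = 16_000;

// Resample with linear interpolation (adequate for speech; no anti-alias filter).
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input;
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const ratio = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}

// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM bytes, clamping overshoot.
function floatToPcm16(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return bytes;
}

// Convenience wrapper for one captured buffer.
function toLiveInput(samples: Float32Array, sampleRate: number): Uint8Array {
  return floatToPcm16(resampleLinear(samples, sampleRate, LIVE_INPUT_RATE));
}
```

The resulting bytes would then be base64-encoded and streamed to the session with a PCM MIME type such as `audio/pcm;rate=16000`. Another option is to request a 16 kHz `AudioContext` directly in browsers that honor the `sampleRate` option, which removes the manual resampling step.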

Audio Format Compatibility
Converting Gemini's text-to-speech output into multiple audio formats while maintaining quality and managing file sizes was difficult. Large audio files also had to be chunked to be stored reliably in Firestore.
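One common piece of this conversion is wrapping raw PCM in a WAV container so that a standard `<audio>` element can play it. The sketch assumes the TTS output is 24 kHz, 16-bit, mono PCM, which matches the format used in Google's TTS examples. The function `pcmToWav` is hypothetical, and the 44-byte header layout follows the standard RIFF/WAVE format.

```typescript
// Prepend a 44-byte RIFF/WAVE header to raw PCM so browsers can play it.
// Defaults assume 24 kHz, 16-bit, mono PCM.
function pcmToWav(pcm: Uint8Array, sampleRate = 24_000, channels = 1, bitsPerSample = 16): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) out[offset + i] = text.charCodeAt(i);
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  out.set(pcm, 44);
  return out;
}
```

In the browser, the result can be wrapped in a `Blob` with type `audio/wav` and passed to `URL.createObjectURL` for playback. WAV is uncompressed, so at these settings each second of audio costs about 48 KB. That size is one reason chunked storage matters.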

Tool Function Integration
Integrating Google Search as a tool function within the Gemini Live conversation required careful prompt engineering so the AI would use it correctly and at the right moments.
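If news fetching is exposed to the Live session as a custom function (instead of, or alongside, the built-in `googleSearch` tool), the function description is where much of the "when to call" guidance lives. The app then answers each call the model makes. This is a hypothetical sketch: the `fetch_news` name, its description, and `handleToolCalls` are illustrative. The local types mirror the SDK's function-call shapes so the code runs standalone.

```typescript
// Local mirrors of the SDK shapes so the sketch compiles standalone. With
// @google/genai you would use its FunctionDeclaration type and the Type enum
// (Type.OBJECT, Type.ARRAY, Type.STRING), which use these same string values.
interface FunctionCall { id?: string; name?: string; args?: Record<string, unknown> }
interface FunctionResponse { id?: string; name?: string; response: Record<string, unknown> }

// Hypothetical tool the Live session may call once the user has named topics.
const fetchNewsDeclaration = {
  name: "fetch_news",
  description:
    "Call this only after the user has clearly named the news topics they want. " +
    "Do not call it while still asking clarifying questions.",
  parameters: {
    type: "OBJECT",
    properties: {
      topics: { type: "ARRAY", items: { type: "STRING" }, description: "Topics such as technology or sports" },
    },
    required: ["topics"],
  },
};

// Route each function call from the model to a local handler and build the
// responses to send back, e.g. with session.sendToolResponse({ functionResponses }).
async function handleToolCalls(
  calls: FunctionCall[],
  fetchNews: (topics: string[]) => Promise<string[]>,
): Promise<FunctionResponse[]> {
  return Promise.all(
    calls.map(async (call): Promise<FunctionResponse> => {
      if (call.name !== "fetch_news") {
        return { id: call.id, name: call.name, response: { error: `Unknown function: ${call.name}` } };
      }
      // Defensively keep only string topics from the model-supplied arguments.
      const raw = call.args?.topics;
      const topics = Array.isArray(raw) ? raw.filter((t): t is string => typeof t === "string") : [];
      const headlines = await fetchNews(topics);
      return { id: call.id, name: call.name, response: { headlines } };
    }),
  );
}
```

Echoing each call's `id` in the response lets the model match answers to requests when it issues several calls at once.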

Latency Optimization
Fetching fresh news, summarizing it, generating audio, and starting playback all had to happen quickly to keep the experience fluid, which meant optimizing our network requests and API calls.
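One common latency technique, offered here as an illustration rather than the app's confirmed approach, is to fetch topics concurrently with a per-request timeout. With this approach, one slow or failed topic does not stall the whole briefing. `fetchAllTopics`, `withTimeout`, the hypothetical per-topic fetcher, and the 8-second default are all assumptions.

```typescript
// Reject if a promise takes longer than ms, so one slow topic cannot stall the briefing.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Fetch every topic in parallel; keep successes and report failures separately.
async function fetchAllTopics<T>(
  topics: string[],
  fetchTopic: (topic: string) => Promise<T>,
  timeoutMs = 8_000,
): Promise<{ results: Map<string, T>; failed: string[] }> {
  const settled = await Promise.allSettled(topics.map((t) => withTimeout(fetchTopic(t), timeoutMs)));
  const results = new Map<string, T>();
  const failed: string[] = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") results.set(topics[i], outcome.value);
    else failed.push(topics[i]);
  });
  return { results, failed };
}
```

With this approach, total wait time tracks the slowest topic that finishes within the timeout, not the sum of all topics. The failed list can be mentioned briefly in the transcript instead of aborting.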

Authentication Flow
Managing both authenticated storage with Firestore and anonymous storage with IndexedDB required careful architecture to ensure data persistence without forcing users to sign in immediately.

Firestore Security
Writing comprehensive security rules that protect user data while allowing necessary operations was complex, especially with the nested audio chunk storage structure.

Error Handling
We needed a robust error boundary that surfaces meaningful diagnostics when Firestore permissions fail, API rate limits are hit, or microphone access is denied.
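An error boundary is easier to keep helpful when the mapping from raw errors to user-facing messages lives in a small pure function that it calls. The checks below rely on documented conventions. Firestore errors carry the code `permission-denied`, and `getUserMedia` rejects with a DOMException named `NotAllowedError` when microphone access is blocked. Google APIs report rate limiting as HTTP 429 (`RESOURCE_EXHAUSTED`), but the field that carries it depends on the client library, so the `status` check is an assumption. `describeError` and the message wording are hypothetical.

```typescript
interface Diagnostic {
  title: string;
  detail: string;
  retryable: boolean;
}

// Map low-level failures to messages an error boundary can render.
function describeError(err: unknown): Diagnostic {
  const e = (typeof err === "object" && err !== null ? err : {}) as {
    code?: unknown; name?: unknown; status?: unknown; message?: unknown;
  };
  if (e.code === "permission-denied") {
    return { title: "Cannot access your history", detail: "Firestore rejected the request. Try signing in again.", retryable: false };
  }
  if (e.name === "NotAllowedError") {
    return { title: "Microphone blocked", detail: "Allow microphone access in your browser settings to start a briefing.", retryable: false };
  }
  // Assumed field: rate limits surfaced as an HTTP or gRPC-style status.
  if (e.status === 429 || e.status === "RESOURCE_EXHAUSTED") {
    return { title: "Too many requests", detail: "The news service is busy. Please wait a moment and retry.", retryable: true };
  }
  const message = typeof e.message === "string" ? e.message : String(err);
  return { title: "Something went wrong", detail: message, retryable: true };
}
```

Keeping this logic outside the React component means it can be unit tested without rendering anything. The boundary then only has to display the returned `Diagnostic`.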

Mobile Responsiveness
The voice interface, player controls, and history UI all had to work smoothly on both large screens and small mobile displays.

Accomplishments that we're proud of

We are genuinely excited about what we built.

Seamless Voice First Experience
We created a truly hands-free interface that feels natural and conversational. Users can get a complete news briefing without ever looking at text.

Multi Domain Intelligence
The AI adapts dynamically to different topic domains such as tech, sports, politics, and entertainment, and adjusts the tone and depth of coverage accordingly.

Real Time News Integration
Unlike static news apps, Commute Zen fetches and summarizes current news every time, ensuring users always get the latest information.

Robust Data Architecture
We designed a sophisticated data model that handles large audio files through chunking, implements strict security rules, and gracefully falls back from cloud storage to local storage.

Production Ready Code
The codebase is clean, type safe, well structured, and follows modern React and Next.js best practices. It is ready to scale.

Error Resilience
Our custom Error Boundary catches runtime and Firestore errors and surfaces them to users in a helpful way instead of failing silently.

Multimodal AI
We orchestrated multiple Gemini capabilities, including Live conversation, search, summarization, and text-to-speech, into a cohesive user experience.

What we learned

This project deepened our understanding of several critical areas.

Real Time AI is Powerful
Gemini Live enabled a conversational flow that would be difficult to achieve with traditional request-response APIs, and its latency is low enough for natural interaction.

Audio Processing is Complex
Managing audio capture, chunking, encoding, and playback requires careful buffer management and format handling.

Firebase Scales Elegantly
Firestore's real-time capabilities and security rules made it straightforward to build a multi-user application with strong data isolation.

Tool Functions Change How AI Works
Giving AI models access to search tools makes it possible to ground responses in real and current data rather than relying only on training data.

User Experience Matters More Than Features
A simple and elegant voice interface is often better than a feature rich text based one. Constraints force better design.

TypeScript Prevents Bugs
Type safety across the project caught many errors before runtime and made refactoring safer.

Mobile First Thinking
Building a voice interface for mobile reinforced that not every interface needs a screen. Sometimes a button and a speaker are enough.

Graceful Degradation
Supporting both cloud and local storage ensures that the app still provides value whether users sign in or not.

What's next for Commute Zen

We have an exciting roadmap ahead.

Multi Voice Options
Allow users to choose different voice personalities and accents for their briefings.

Topic Customization Dashboard
A visual interface where users can save favorite topic combinations such as "Tech and Science" or "Global Business".

Briefing Length Control
Let users choose their preferred briefing length, such as five, ten, or fifteen minutes, to match their commute.

Sharing and Social Features
Allow users to share their favorite briefings or generate briefings for specific situations such as before meetings or flights.

Advanced Analytics
Track which topics users engage with most, when they listen, and how their preferences evolve.

Offline Mode
Download briefings over Wi-Fi so they can be played during commutes without an internet connection.

Third Party Integrations
Calendar integration to generate briefings related to upcoming events, along with weather and traffic updates for contextual commute information.

Multi Language Support
Generate briefings in multiple languages for a global audience.

Podcast Export
Allow users to export their briefings as podcast-style episodes that can be synced to their phones.

AI Personalization
Over time the AI learns a user's communication style and news preferences, automatically tailoring tone and content selection.