A Progressive Web App that provides real-time, audible descriptions of surroundings for visually impaired individuals, now enhanced with Google Gemini's advanced native audio capabilities.
- Tap-to-Describe: Tap anywhere on the screen to capture and analyze what the camera sees
- Advanced AI TTS: Powered by Gemini 2.5's native audio capabilities with controllable voice styles
- Real-time Camera Feed: Full-screen camera interface optimized for mobile devices
- Intelligent Audio Feedback: Advanced TTS with natural expressivity and prosody
- Voice Control Options: Multiple voice styles, speeds, and customizable delivery
- Smart Fallback: Automatically falls back to Web Speech API when Gemini TTS is unavailable
- Visual + Audio: Descriptions are both displayed and spoken, with history tracking
- Offline-Ready: PWA with a service worker for app installation and caching (AI analysis and Gemini TTS still require an internet connection)
- Accessibility: Designed with screen readers and keyboard navigation in mind
- Frontend: Next.js 15, React, TypeScript, Tailwind CSS
- AI Vision: Google Gemini API (gemini-2.5-flash)
- Advanced TTS: Gemini 2.5 Native Audio (with Web Speech API fallback)
- PWA: Service worker for offline functionality
Based on Google's latest Gemini 2.5 native audio features:
- Natural Conversation: High-quality voice interactions with appropriate expressivity
- Style Control: Adaptable delivery with specific tones, accents, and expressions
- Enhanced Pace Control: Precise control over delivery speed and pronunciation
- Dynamic Performance: Expressive readings optimized for accessibility
- Multilinguality: Support for 24+ languages
- Voice Styles: Default, Calm, Warm, Professional
- Speed Options: Slow, Normal, Fast
- Automatic Fallback: Seamlessly switches to browser TTS if needed
- Smart Detection: Automatically detects Gemini TTS availability
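
The voice style and speed options above could be modeled roughly as follows. This is a minimal sketch, not the app's actual code: the type names, the style hint wording, and the numeric rates are all illustrative assumptions. The only external fact relied on is that the Web Speech API treats a rate of 1 as normal speed.

```typescript
// Hypothetical settings model; names and values are assumptions for illustration.
type VoiceStyle = "default" | "calm" | "warm" | "professional";
type SpeechSpeed = "slow" | "normal" | "fast";

interface VoiceSettings {
  style: VoiceStyle;
  speed: SpeechSpeed;
}

// Playback rate to use with the browser fallback (1 is normal speed).
const FALLBACK_RATE: Record<SpeechSpeed, number> = {
  slow: 0.75,
  normal: 1,
  fast: 1.25,
};

// Natural-language style hints that could be prepended to the text
// sent to a promptable TTS model. The wording is an assumption.
const STYLE_HINT: Record<VoiceStyle, string> = {
  default: "",
  calm: "Speak in a calm, steady tone.",
  warm: "Speak in a warm, friendly tone.",
  professional: "Speak in a clear, professional tone.",
};

// Combine a description with the user's settings into one request object.
function buildSpeechRequest(text: string, settings: VoiceSettings) {
  const hint = STYLE_HINT[settings.style];
  return {
    prompt: hint ? `${hint} ${text}` : text,
    fallbackRate: FALLBACK_RATE[settings.speed],
  };
}
```

Keeping the settings as plain data means the same object can drive both the Gemini request and the browser fallback.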
- Install Dependencies: npm install
- Configure API Key
  - Get a Google AI API key from Google AI Studio
  - Create a .env.local file in the root directory: GOOGLE_API_KEY=your_api_key_here
- Run Development Server: npm run dev
- Access the App
  - Open http://localhost:3000
  - Allow camera permissions when prompted
  - Tap anywhere on the screen to capture and describe
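
A missing or placeholder key is a common setup mistake, so server code may want to fail early with a clear message. The helper below is a hypothetical sketch (the function name and error text are assumptions, not part of this project). It takes the environment as a parameter so it can be checked without a running server.

```typescript
// Hypothetical helper: validates that GOOGLE_API_KEY has been configured.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env): string {
  const key = env["GOOGLE_API_KEY"]?.trim();
  // Reject both a missing key and the unedited placeholder from the setup step.
  if (!key || key === "your_api_key_here") {
    throw new Error(
      "GOOGLE_API_KEY is not set. Add it to .env.local and restart the dev server."
    );
  }
  return key;
}
```

In a Next.js server route this would typically be called with process.env, which Next.js populates from .env.local at startup.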
- Camera Access: Grant camera permissions when prompted
- Voice Settings: Tap the settings icon (top-left) to configure TTS preferences
- Tap to Capture: Tap anywhere on the screen to take a photo
- Listen & See: The app will both speak and display descriptions
- Repeat: Use the "Repeat" button to hear descriptions again
- History: View previous descriptions with the history button
- Install as PWA: Use your browser's "Add to Home Screen" option
- Enhanced Quality: Natural, expressive speech optimized for accessibility
- Voice Control: Multiple voice styles and delivery options
- Smart Processing: AI-optimized pronunciation and pacing
- Availability: Automatically detected and enabled when supported
- Broad Support: Web Speech API speech synthesis is available in most modern browsers
- Reliable Backup: Ensures functionality when Gemini TTS is unavailable
- Seamless Transition: Automatic fallback without user intervention
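
The automatic fallback described above can be sketched as a small wrapper that tries one speaker and switches to another on failure. This is an illustrative sketch under stated assumptions: the Speaker type and speakWithFallback name are hypothetical, and the real app may detect availability differently. The two speakers are injected, so the Gemini call and the Web Speech API call stay outside this function.

```typescript
// A speaker is any async function that reads text aloud.
// In the app, one would wrap Gemini TTS and the other the Web Speech API.
type Speaker = (text: string) => Promise<void>;

// Try the primary speaker; if it throws (quota, network, unsupported),
// fall back to the secondary one. Returns which engine was used.
async function speakWithFallback(
  text: string,
  primary: Speaker,
  fallback: Speaker
): Promise<"primary" | "fallback"> {
  try {
    await primary(text);
    return "primary";
  } catch {
    await fallback(text);
    return "fallback";
  }
}
```

Returning the engine name lets the UI indicate which voice is active without the caller having to handle errors itself.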
- Open the app in your mobile browser
- Look for "Add to Home Screen" or "Install" option
- Follow the prompts to install
- Look for the install icon in your browser's address bar
- Click to install as a desktop app
- Screen Reader Support: ARIA labels and semantic HTML
- Keyboard Navigation: Use Space or Enter to capture
- Advanced Audio: Gemini TTS optimized for visually impaired users
- Visual Feedback: Large, high-contrast text displays
- Repeat Functions: Easy access to replay descriptions
- Voice Customization: Adjustable speed and style preferences
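
The Space/Enter capture shortcut listed above could be checked with a small predicate like the one below. It is a sketch, not the app's actual handler: the KeyLike interface and isCaptureKey name are assumptions. It relies only on the standard KeyboardEvent.key values, where the space bar reports " " and the Enter key reports "Enter".

```typescript
// Minimal shape of the fields used from a keyboard event, so the
// function can be exercised without a browser.
interface KeyLike {
  key: string;
  repeat?: boolean;
}

// True when the key press should trigger a capture.
// Auto-repeat events (key held down) are ignored to avoid repeated captures.
function isCaptureKey(event: KeyLike): boolean {
  if (event.repeat) return false;
  return event.key === " " || event.key === "Enter";
}
```

A keydown listener would call isCaptureKey(event) and start a capture only when it returns true.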
- Camera Access: Required for image capture
- Internet Connection: Required for AI analysis and Gemini TTS
- Audio Support: HTML5 audio for Gemini TTS playback
- Speech Synthesis: Built-in browser support for fallback TTS
- PWA Support: Installation is available in most modern browsers; the install flow varies by browser
- No Data Storage: Images are not stored on device or server
- Secure Processing: All data processed via Google's secure APIs
- API Key Protection: Server-side only API key storage
- Temporary Audio: Generated audio files are automatically cleaned up
- Build: npm run build
- Lint: npm run lint
- Type Check: Built-in TypeScript checking
- Free Tier: Limited requests per minute for both vision and TTS
- Smart Fallbacks: Automatic fallback systems for quota management
- Rate Limiting: Built-in retry logic with user feedback
- Upgrade Path: Billing account enables higher quotas
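
Retry logic with user feedback, as mentioned above, is often built on exponential backoff. The sketch below shows one way this might look. The function names, the attempt count, and the delay values are assumptions, not this project's actual settings.

```typescript
// Delay before the next retry: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

interface RetryOptions {
  maxAttempts?: number;
  // Called before each wait so the UI can tell the user a retry is coming.
  onRetry?: (nextAttempt: number, delayMs: number) => void;
  // Injectable sleep, which makes the function easy to test.
  sleep?: (ms: number) => Promise<void>;
}

// Run a task, retrying on failure with increasing delays.
// Rethrows the last error once all attempts are used up.
async function withRetry<T>(
  task: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const maxAttempts = options.maxAttempts ?? 3;
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break;
      const delay = backoffDelayMs(attempt);
      options.onRetry?.(attempt + 2, delay);
      await sleep(delay);
    }
  }
  throw lastError;
}
```

The onRetry callback is where a spoken or visual notice such as "Retrying in 2 seconds" could be triggered.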
NaraNetra means "eye of knowledge" - now enhanced with Google's most advanced AI voice technology for natural, expressive audio assistance.