Inspiration

The inspiration for Voxi came when I attended a seminar where I saw a project that could scan and identify coins. This sparked my curiosity about computer vision and AI applications. I thought, "If technology can identify coins, why not create something that can scan and read documents and objects?" But I wanted to go beyond scanning: I envisioned a voice companion that could greet users, understand their questions, and answer through natural conversation. This would address the daily challenges faced by the roughly 285 million visually impaired people worldwide who struggle with simple tasks like reading menus, identifying objects, or navigating their surroundings.

What it does

Voxi is a voice-controlled AI companion that transforms any Android smartphone into an intelligent assistive device. The core focus is on reliable text reading and voice interaction:

  • Greets users warmly with multilingual welcome messages
  • Reads any text with high accuracy using Google ML Kit
  • Maintains natural conversation throughout interactions
  • Operates bilingually (Hindi & English) with seamless switching
  • Provides continuous voice guidance for navigation
  • Includes experimental object detection as a developing research feature
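The multilingual greeting behavior above can be illustrated with a small sketch. Everything here (the phrases, the time-of-day rule, the names `GreetLang` and `welcomeMessage`) is an assumption for illustration, not Voxi's actual copy or code:

```kotlin
// Illustrative sketch only: how a multilingual welcome message might be chosen.
// The phrases and the time-of-day rule are assumptions, not Voxi's actual copy.
enum class GreetLang { HINDI, ENGLISH }

fun welcomeMessage(lang: GreetLang, hourOfDay: Int): String {
    val morning = hourOfDay in 5..11
    return when (lang) {
        GreetLang.ENGLISH -> if (morning) "Good morning! I'm Voxi. How can I help?"
                             else "Hello! I'm Voxi. How can I help?"
        GreetLang.HINDI -> if (morning) "सुप्रभात! मैं Voxi हूँ। मैं आपकी कैसे मदद करूँ?"
                           else "नमस्ते! मैं Voxi हूँ। मैं आपकी कैसे मदद करूँ?"
    }
}
```

In the real app the chosen string would be handed to the Android Text-to-Speech engine rather than displayed, so the greeting works with zero visual interaction.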

How we built it

Technology Stack:

  • Frontend: Android Kotlin with Jetpack Compose for modern UI
  • AI/ML: TensorFlow Lite (COCO SSD MobileNet v2) for object detection
  • Text Recognition: Google ML Kit for accurate document reading
  • Speech Processing: Android Speech-to-Text and Text-to-Speech APIs with conversational design
  • Camera: CameraX (built on the Camera2 API) for optimal image capture

Voice Companion Architecture:

  • Conversational flow design with greeting protocols and response patterns
  • Context-aware responses that remember previous interactions
  • Natural language processing for understanding user intent beyond basic commands
  • Continuous listening mode with smart activation for seamless conversation
  • Bilingual conversation switching maintaining context across languages
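The conversational flow described above can be sketched as a small state machine that remembers prior turns and switches language without losing dialogue state. The class name, intents, and phrases below are simplified assumptions, not the production dialogue engine:

```kotlin
// Hypothetical sketch of the conversational flow: greeting, keyword-based
// intent matching, context memory, and bilingual switching. All names and
// phrases are illustrative assumptions.
enum class Lang { EN, HI }

class VoiceCompanion {
    var lang = Lang.EN
        private set
    private val history = mutableListOf<String>()   // context from prior turns

    fun respond(utterance: String): String {
        history += utterance
        val text = utterance.lowercase()
        return when {
            // Switching language keeps the dialogue state; only the locale changes.
            "hindi" in text -> { lang = Lang.HI; "ठीक है, अब हम हिंदी में बात करेंगे।" }
            "english" in text -> { lang = Lang.EN; "Okay, switching back to English." }
            "hello" in text || "नमस्ते" in text ->
                if (lang == Lang.EN) "Hello! How can I help you today?"
                else "नमस्ते! मैं आपकी कैसे मदद करूँ?"
            "read" in text ->
                if (lang == Lang.EN) "Sure, point the camera at the text."
                else "ज़रूर, कैमरा टेक्स्ट की ओर रखें।"
            else ->
                if (lang == Lang.EN) "I didn't catch that, could you repeat?"
                else "मैं समझ नहीं पाई, कृपया दोहराएँ।"
        }
    }

    // Context awareness: the companion can refer back to what was said earlier.
    fun lastUtterance(): String? = history.lastOrNull()
}
```

In the actual app, the input would come from the Android Speech-to-Text API in continuous listening mode, and the real intent matching goes beyond keywords; this sketch only shows the shape of the turn-taking loop.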

Challenges we ran into

Technical Challenges:

  • Model optimization: Balancing AI model size with inference speed on mobile devices
  • Memory management: Handling large TensorFlow Lite models efficiently
  • Voice recognition: Ensuring reliable speech processing in noisy environments
  • Conversational design: Creating natural dialogue flows that feel like genuine companionship
  • Performance optimization: Achieving real-time processing for smooth user experience

Solutions Implemented:

  • Optimized inference pipeline: Reduced detection latency for real-time processing
  • Multi-preprocessing approach: Applied different image enhancement techniques for better detection
  • Conversational response system: Designed natural greeting protocols and contextual responses
  • Bilingual conversation management: Seamless language switching while maintaining dialogue flow
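The multi-preprocessing idea above can be sketched as follows: run the same detector over several enhanced versions of a frame and keep the most confident result. The `Frame`, `Enhancer`, and `Detector` types are stand-ins for illustration; Voxi's real pipeline runs TensorFlow Lite over camera frames:

```kotlin
// Sketch of the multi-preprocessing approach: apply several image-enhancement
// variants to the same frame, detect on each, and keep the highest-confidence
// result. Types are simplified stand-ins, not the real TensorFlow Lite pipeline.
data class Detection(val label: String, val confidence: Float)

typealias Frame = FloatArray              // stand-in for pixel data
typealias Enhancer = (Frame) -> Frame     // e.g. brighten, boost contrast
typealias Detector = (Frame) -> Detection

fun bestDetection(frame: Frame, enhancers: List<Enhancer>, detect: Detector): Detection {
    require(enhancers.isNotEmpty()) { "need at least one enhancement variant" }
    return enhancers
        .map { enhance -> detect(enhance(frame)) }
        .maxByOrNull { it.confidence }!!
}
```

The trade-off is extra inference passes per frame, which is why this is paired with the optimized inference pipeline above to keep processing real-time.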

Accomplishments that we're proud of

  • Reliable text reading system that works consistently for daily use
  • Natural voice companion that feels like genuine interaction
  • Seamless bilingual operation supporting Hindi and English
  • Accessibility-first design requiring zero visual interaction
  • Research foundation for future AI enhancements
  • Cost-effective solution providing affordable assistive technology

What we learned

Technical Learning:

  • Deep understanding of mobile AI optimization and TensorFlow Lite deployment
  • Conversational design principles for creating natural human-AI interactions
  • Accessibility principles and inclusive design methodologies
  • Computer vision integration and contextual AI problem-solving
  • Speech processing integration for seamless voice interaction

Impact Learning:

  • The importance of emotional design in assistive technology—users need companionship, not just functionality
  • How conversational AI can reduce isolation for visually impaired users
  • The power of voice interfaces that feel natural and conversational
  • How AI can democratize accessibility when thoughtfully implemented
  • The significance of affordable solutions that provide both functionality and emotional support

What's next for Voxi

Immediate Priorities:

  • Enhanced text reading with more language support
  • Advanced voice interaction capabilities
  • Research into improved object detection using next-generation AI models
  • Custom model development specifically for accessibility applications

Technical Roadmap:

  • Model retraining with accessibility-focused datasets for higher accuracy
  • Custom object recognition trained specifically for visually impaired user needs
  • Enhanced AI accuracy through user feedback and improved training methodologies
  • GPS navigation integration with conversational guidance for outdoor mobility
  • Currency recognition for financial independence (inspired by the original coin scanning project)

Long-term Vision:

  • Specialized AI models trained specifically for assistive technology applications
  • Community-driven training where user feedback improves model accuracy
  • Expanded language support: Adding 10+ regional Indian languages
  • Enterprise solutions: Workplace accessibility tools with professional conversation modes
  • Global deployment: Reaching 10,000+ users in the first year

Research Goals:

  • Advanced model training techniques for better object recognition accuracy
  • Accessibility-specific datasets development for improved training outcomes
  • Cross-platform expansion to iOS and web applications
  • Open-source contribution to the global accessibility and AI community

Voxi represents a digital companion that provides both practical assistance and emotional support. While the text reading functionality is already robust and reliable, we are actively training better models to improve object detection accuracy, so that Voxi can keep empowering visually impaired users to interact confidently with their world.

Built With

  • android
  • android-speech-to-text-api
  • android-studio
  • android-text-to-speech-api
  • bytebuffer
  • camera2-api
  • camerax
  • coco-ssd-mobilenet-v2
  • google-ml-kit
  • jetpack-compose
  • kotlin
  • opencv
  • tensorflow-lite