Inspiration

The inspiration for Voxi came when I attended a seminar where I saw a project that could scan and identify coins. This sparked my curiosity about computer vision and AI applications. I thought, "If technology can identify coins, why not create something that can scan and read documents and objects?" But I wanted to go beyond scanning: I envisioned a voice companion that could greet users, understand their questions, and answer through natural conversation. This would address the daily challenges faced by the roughly 285 million visually impaired people worldwide who struggle with simple tasks like reading menus, identifying objects, or navigating their surroundings.

What it does

Voxi is a voice-controlled AI companion that transforms any Android smartphone into an intelligent assistive device. The core focus is on reliable text reading and voice interaction:

  • Greets users warmly with multilingual welcome messages
  • Reads any text with high accuracy using Google ML Kit
  • Maintains natural conversation throughout interactions
  • Operates bilingually (Hindi & English) with seamless switching
  • Provides continuous voice guidance for navigation
  • Includes experimental object detection as a developing research feature
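The multilingual greeting behavior above can be illustrated with a small sketch. Everything here (the phrases, the time-of-day rule, the names `GreetLang` and `welcomeMessage`) is an assumption for illustration, not Voxi's actual copy or code:

```kotlin
// Illustrative sketch only: how a multilingual welcome message might be chosen.
// The phrases and the time-of-day rule are assumptions, not Voxi's actual copy.
enum class GreetLang { HINDI, ENGLISH }

fun welcomeMessage(lang: GreetLang, hourOfDay: Int): String {
    val morning = hourOfDay in 5..11
    return when (lang) {
        GreetLang.ENGLISH -> if (morning) "Good morning! I'm Voxi. How can I help?"
                             else "Hello! I'm Voxi. How can I help?"
        GreetLang.HINDI -> if (morning) "सुप्रभात! मैं Voxi हूँ। मैं आपकी कैसे मदद करूँ?"
                           else "नमस्ते! मैं Voxi हूँ। मैं आपकी कैसे मदद करूँ?"
    }
}
```

In the real app the chosen string would be handed to the Android Text-to-Speech engine rather than displayed, so the greeting works with zero visual interaction.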

How we built it

Technology Stack:

  • Frontend: Android Kotlin with Jetpack Compose for modern UI
  • AI/ML: TensorFlow Lite (COCO SSD MobileNet v2) for object detection
  • Text Recognition: Google ML Kit for accurate document reading
  • Speech Processing: Android Speech-to-Text and Text-to-Speech APIs with conversational design
  • Camera: CameraX (built on the Camera2 API) for optimal image capture

Voice Companion Architecture:

  • Conversational flow design with greeting protocols and response patterns
  • Context-aware responses that remember previous interactions
  • Natural language processing for understanding user intent beyond basic commands
  • Continuous listening mode with smart activation for seamless conversation
  • Bilingual conversation switching maintaining context across languages
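The conversational flow described above can be sketched as a small state machine that remembers prior turns and switches language without losing dialogue state. The class name, intents, and phrases below are simplified assumptions, not the production dialogue engine:

```kotlin
// Hypothetical sketch of the conversational flow: greeting, keyword-based
// intent matching, context memory, and bilingual switching. All names and
// phrases are illustrative assumptions.
enum class Lang { EN, HI }

class VoiceCompanion {
    var lang = Lang.EN
        private set
    private val history = mutableListOf<String>()   // context from prior turns

    fun respond(utterance: String): String {
        history += utterance
        val text = utterance.lowercase()
        return when {
            // Switching language keeps the dialogue state; only the locale changes.
            "hindi" in text -> { lang = Lang.HI; "ठीक है, अब हम हिंदी में बात करेंगे।" }
            "english" in text -> { lang = Lang.EN; "Okay, switching back to English." }
            "hello" in text || "नमस्ते" in text ->
                if (lang == Lang.EN) "Hello! How can I help you today?"
                else "नमस्ते! मैं आपकी कैसे मदद करूँ?"
            "read" in text ->
                if (lang == Lang.EN) "Sure, point the camera at the text."
                else "ज़रूर, कैमरा टेक्स्ट की ओर रखें।"
            else ->
                if (lang == Lang.EN) "I didn't catch that, could you repeat?"
                else "मैं समझ नहीं पाई, कृपया दोहराएँ।"
        }
    }

    // Context awareness: the companion can refer back to what was said earlier.
    fun lastUtterance(): String? = history.lastOrNull()
}
```

In the actual app, the input would come from the Android Speech-to-Text API in continuous listening mode, and the real intent matching goes beyond keywords; this sketch only shows the shape of the turn-taking loop.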

Challenges we ran into

Technical Challenges:

  • Model optimization: Balancing AI model size with inference speed on mobile devices
  • Memory management: Handling large TensorFlow Lite models efficiently
  • Voice recognition: Ensuring reliable speech processing in noisy environments
  • Conversational design: Creating natural dialogue flows that feel like genuine companionship
  • Performance optimization: Achieving real-time processing for smooth user experience

Solutions Implemented:

  • Optimized inference pipeline: Reduced detection latency for real-time processing
  • Multi-preprocessing approach: Applied different image enhancement techniques for better detection
  • Conversational response system: Designed natural greeting protocols and contextual responses
  • Bilingual conversation management: Seamless language switching while maintaining dialogue flow
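The multi-preprocessing idea above can be sketched as follows: run the same detector over several enhanced versions of a frame and keep the most confident result. The `Frame`, `Enhancer`, and `Detector` types are stand-ins for illustration; Voxi's real pipeline runs TensorFlow Lite over camera frames:

```kotlin
// Sketch of the multi-preprocessing approach: apply several image-enhancement
// variants to the same frame, detect on each, and keep the highest-confidence
// result. Types are simplified stand-ins, not the real TensorFlow Lite pipeline.
data class Detection(val label: String, val confidence: Float)

typealias Frame = FloatArray              // stand-in for pixel data
typealias Enhancer = (Frame) -> Frame     // e.g. brighten, boost contrast
typealias Detector = (Frame) -> Detection

fun bestDetection(frame: Frame, enhancers: List<Enhancer>, detect: Detector): Detection {
    require(enhancers.isNotEmpty()) { "need at least one enhancement variant" }
    return enhancers
        .map { enhance -> detect(enhance(frame)) }
        .maxByOrNull { it.confidence }!!
}
```

The trade-off is extra inference passes per frame, which is why this is paired with the optimized inference pipeline above to keep processing real-time.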

Accomplishments that we're proud of

  • Reliable text reading system that works consistently for daily use
  • Natural voice companion that feels like genuine interaction
  • Seamless bilingual operation supporting Hindi and English
  • Accessibility-first design requiring zero visual interaction
  • Research foundation for future AI enhancements
  • Cost-effective solution providing affordable assistive technology

What we learned

Technical Learning:

  • Deep understanding of mobile AI optimization and TensorFlow Lite deployment
  • Conversational design principles for creating natural human-AI interactions
  • Accessibility principles and inclusive design methodologies
  • Computer vision integration and contextual AI problem-solving
  • Speech processing integration for seamless voice interaction

Impact Learning:

  • The importance of emotional design in assistive technology—users need companionship, not just functionality
  • How conversational AI can reduce isolation for visually impaired users
  • The power of voice interfaces that feel natural and conversational
  • How AI can democratize accessibility when thoughtfully implemented
  • The significance of affordable solutions that provide both functionality and emotional support

What's next for Voxi

Immediate Priorities:

  • Enhanced text reading with more language support
  • Advanced voice interaction capabilities
  • Research into improved object detection using next-generation AI models
  • Custom model development specifically for accessibility applications

Technical Roadmap:

  • Model retraining with accessibility-focused datasets for higher accuracy
  • Custom object recognition trained specifically for visually impaired user needs
  • Enhanced AI accuracy through user feedback and improved training methodologies
  • GPS navigation integration with conversational guidance for outdoor mobility
  • Currency recognition for financial independence (inspired by the original coin scanning project)

Long-term Vision:

  • Specialized AI models trained specifically for assistive technology applications
  • Community-driven training where user feedback improves model accuracy
  • Expanded language support: Adding 10+ regional Indian languages
  • Enterprise solutions: Workplace accessibility tools with professional conversation modes
  • Global deployment: Reaching 10,000+ users in the first year

Research Goals:

  • Advanced model training techniques for better object recognition accuracy
  • Accessibility-specific datasets development for improved training outcomes
  • Cross-platform expansion to iOS and web applications
  • Open-source contribution to the global accessibility and AI community

Voxi represents a digital companion that provides both practical assistance and emotional support. While the text reading functionality is already robust and reliable, we are actively training better models to improve object detection accuracy, so that Voxi can keep empowering visually impaired users to interact confidently with their world.

Built With

  • android
  • android-speech-to-text-api
  • android-studio
  • android-text-to-speech-api
  • bytebuffer
  • camera2-api
  • camerax
  • coco-ssd-mobilenet-v2
  • google-ml-kit
  • jetpack-compose
  • kotlin
  • opencv
  • tensorflow-lite