Oviya - Simply talk to Oviya and send out emails and meeting invites without friction.
Inspiration
Oviya was born from a desire to simplify everyday productivity tasks through voice commands. Professionals face constant friction in communication and scheduling; composing emails and coordinating meetings eats up time that could go to real work. Oviya is a personalized assistant that streamlines these workflows through natural language.
What it does
Oviya is a sophisticated voice assistant that:
- Processes natural language voice commands to send emails and schedule meetings
- Extracts specific intents from spoken language (send_email, create_meeting, or both)
- Automatically resolves contact names to email addresses
- Creates Google Calendar events with Google Meet links
- Sends professionally formatted emails with meeting details
- Operates in a hybrid mode, using either a cloud-based MCP (Model Context Protocol) server or local processing
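The structured result of a parsed command can be sketched as follows. This is a minimal illustration: the field names and the `ParsedCommand` class are assumptions for the example, not Oviya's actual schema, though the three intent values come straight from the design above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Intent(Enum):
    SEND_EMAIL = "send_email"
    CREATE_MEETING = "create_meeting"
    BOTH = "both"


@dataclass
class ParsedCommand:
    """Structured data extracted from a spoken command (illustrative schema)."""
    intent: Intent
    recipients: list[str] = field(default_factory=list)  # spoken names, resolved later
    subject: Optional[str] = None
    meeting_time: Optional[str] = None  # e.g. "tomorrow at 3pm", normalized downstream


# "Schedule a meeting with Priya tomorrow at 3pm and email her the details"
cmd = ParsedCommand(
    intent=Intent.BOTH,
    recipients=["Priya"],
    subject="Project sync",
    meeting_time="tomorrow at 3pm",
)
```

Keeping recipients as spoken names at this stage lets the contact-resolution step run separately, after intent extraction.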
How we built it
The architecture consists of several interconnected components:
Frontend (React/Vite):
- Voice recording interface that captures audio from the user
- Uses ElevenLabs API for high-quality speech-to-text conversion
- Displays real-time feedback and results to the user
Backend (Python/FastAPI):
- Natural language processing with Perplexity AI to extract structured data from voice commands
- Intent classification system that determines if the user wants to send an email, schedule a meeting, or both
- Integration with Google APIs (Gmail, Calendar, Contacts) for email and calendar functionality
- MCP server integration for enhanced processing capabilities with fallback to local processing
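To illustrate the intent-classification step, here is a minimal keyword-based classifier of the kind a local fallback might use. It is a sketch under our own assumptions, not the Perplexity-backed extractor described above, which handles far more ambiguity than keyword matching can.

```python
def classify_intent(transcript: str) -> str:
    """Return "send_email", "create_meeting", or "both" for a transcript.

    Minimal keyword heuristic; the production path delegates this to an LLM.
    """
    text = transcript.lower()
    wants_email = any(kw in text for kw in ("email", "mail", "send a message"))
    wants_meeting = any(kw in text for kw in ("meeting", "schedule", "invite", "calendar"))
    if wants_email and wants_meeting:
        return "both"
    if wants_meeting:
        return "create_meeting"
    if wants_email:
        return "send_email"
    return "unknown"
```

A classifier like this is what makes graceful degradation possible: when the LLM is unreachable, simple commands still work.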
Email Formatting System:
- Structured email format with professional design
- Automatic generation of meeting details and Google Meet links
- Context-aware email body generation
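A simplified version of the meeting-email formatter might look like the sketch below. The layout and parameter names are illustrative, not Oviya's exact template.

```python
def format_meeting_email(recipient_name: str, title: str, when: str, meet_link: str) -> str:
    """Render a consistently formatted meeting-invite email body."""
    return (
        f"Hi {recipient_name},\n\n"
        f'You\'re invited to "{title}".\n\n'
        f"When: {when}\n"
        f"Where: Google Meet - {meet_link}\n\n"
        "Sent via Oviya.\n"
    )


body = format_meeting_email(
    "Priya", "Project sync", "Tue, Jun 4, 3:00 PM", "https://meet.google.com/abc-defg-hij"
)
```

Centralizing the template in one function is what keeps every outgoing message consistent, per the design goal above.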
Hybrid Processing Architecture:
- Ability to process commands via MCP server or locally
- Automatic fallback mechanism when MCP is unavailable
- Configuration utilities to toggle between processing modes
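The fallback mechanism can be sketched as a simple wrapper: try the MCP path first, and fall back to the local pipeline on any failure. The function names here are hypothetical placeholders, not Oviya's actual API.

```python
import logging

logger = logging.getLogger("oviya.hybrid")


def process_command(transcript: str, mcp_handler, local_handler) -> dict:
    """Try the MCP server first; fall back to local processing on failure.

    `mcp_handler` and `local_handler` are callables taking the transcript
    and returning a result dict; both names are illustrative.
    """
    try:
        return mcp_handler(transcript)
    except Exception as exc:  # network errors, timeouts, MCP unavailable, ...
        logger.warning("MCP processing failed (%s); falling back to local", exc)
        return local_handler(transcript)


# Simulate an unreachable MCP server:
def broken_mcp(_: str) -> dict:
    raise ConnectionError("MCP server unreachable")


result = process_command("email Bob", broken_mcp, lambda t: {"mode": "local", "text": t})
```

Because the fallback decision is made per command, a flaky MCP connection degrades to local processing without any user-visible error.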
Challenges we ran into
We ran into several challenges along the way:
Intent Recognition Accuracy:
- Extracting precise intents, subjects, recipients, and meeting times from natural language
- Handling ambiguity in verbal commands
API Integration Complexity:
- Managing authentication flows for multiple third-party services (Google, ElevenLabs, Perplexity)
- Coordinating between speech-to-text, NLP, and email/calendar services
Robust Error Handling:
- Implementing fallback mechanisms when services fail
- Designing a system that gracefully degrades when certain components aren't available
MCP Server Integration:
- Creating a hybrid architecture that works with or without MCP
- Building proper diagnostics and configuration tools for MCP connectivity
Voice Recognition Quality:
- Ensuring accurate transcription of various speech patterns and accents
- Processing potentially noisy audio input
Accomplishments that we're proud of
A few things stand out:
Seamless Voice Interface:
- Created a natural interface for email and meeting scheduling that feels like talking to a human assistant
- Successfully integrated speech recognition with a complex processing pipeline
Sophisticated Intent Extraction:
- Built a system that can understand complex voice commands and extract structured data
- Implemented processing for combined intents (e.g., "schedule a meeting and email the details")
Hybrid Architecture:
- Developed a system that can use advanced MCP capabilities when available but doesn't depend on them
- Created diagnostic tools and configuration utilities for smooth operation
Professional Email Formatting:
- Implemented a consistent, professional email format for all messages
- Generated contextually appropriate email content based on voice commands
Robust Name Resolution:
- Built a system that can match spoken names to email addresses using multiple sources
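Matching a spoken name against known contacts can be sketched with fuzzy matching, here via the standard library's `difflib` over a flat name-to-address dict. This is a minimal single-source sketch; as noted above, the real system draws on multiple contact sources.

```python
from difflib import get_close_matches
from typing import Optional


def resolve_email(spoken_name: str, contacts: dict[str, str]) -> Optional[str]:
    """Fuzzy-match a spoken name to a contact's email address.

    `contacts` maps lowercase display names to addresses; returns None
    when no sufficiently close match exists.
    """
    matches = get_close_matches(spoken_name.lower(), contacts.keys(), n=1, cutoff=0.6)
    return contacts[matches[0]] if matches else None


contacts = {"priya sharma": "priya@example.com", "bob lee": "bob@example.com"}
```

Fuzzy matching absorbs small transcription errors (e.g. a dropped or doubled letter) that exact lookup would miss; the `cutoff` threshold trades recall against the risk of emailing the wrong contact.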
What we learned
Building Oviya taught us a great deal in several areas:
- Natural Language Processing: Techniques for extracting structured information from unstructured voice input
- API Integration: Working with multiple external APIs and handling their authentication flows
- Hybrid Architecture Design: Creating systems that can operate in different modes depending on available resources
- Error Handling and Resilience: Implementing fallback mechanisms and graceful degradation
- Voice Interface Design: Designing interfaces that work well with spoken commands
- Context-Aware Response Generation: Creating responses that maintain context and sound natural
What's next for Oviya
We see several next steps for Oviya:
Expanded Voice Capabilities:
- Adding more intents beyond email and meetings (task creation, reminders, note-taking)
- Supporting more complex meeting scenarios (recurring meetings, room booking)
Enhanced Natural Language Understanding:
- Improving context awareness and memory across multiple commands
- Adding support for corrections and modifications to previous commands
Multi-User Support:
- Enabling personalized experiences for different users
- Supporting team workflows and collaboration
Integration with More Services:
- Adding support for additional productivity tools (Slack, Asana, Trello)
- Expanding beyond Google services to support Microsoft 365, Zoom, etc.
Mobile Application:
- Creating a mobile version for on-the-go productivity
- Implementing notification systems for upcoming meetings and responses
Advanced MCP Capabilities:
- Leveraging more sophisticated MCP features for enhanced privacy and security
- Implementing distributed processing for improved performance
Voice Authentication:
- Adding voice biometrics for secure, password-less authentication
- Implementing permission models based on voice recognition