Oviya - Simply talk to Oviya and send out emails and meeting invites without friction.

Inspiration

Oviya grew out of a desire to remove friction from everyday productivity tasks. Sending an email or scheduling a meeting usually means switching apps, filling out forms, and hunting for contact details. We wanted a personalized assistant you could simply talk to — one that handles these workflows through natural language.

What it does

Oviya is a sophisticated voice assistant that:

  • Processes natural language voice commands to send emails and schedule meetings
  • Extracts specific intents from spoken language (send_email, create_meeting, or both)
  • Automatically resolves contact names to email addresses
  • Creates Google Calendar events with Google Meet links
  • Sends professionally formatted emails with meeting details
  • Operates in hybrid mode, using either a cloud-hosted Model Context Protocol (MCP) server or local processing

How we built it

The architecture consists of several interconnected components:

  1. Frontend (React/Vite):

    • Voice recording interface that captures audio from the user
    • Uses ElevenLabs API for high-quality speech-to-text conversion
    • Displays real-time feedback and results to the user
  2. Backend (Python/FastAPI):

    • Natural language processing with Perplexity AI to extract structured data from voice commands
    • Intent classification system that determines if the user wants to send an email, schedule a meeting, or both
    • Integration with Google APIs (Gmail, Calendar, Contacts) for email and calendar functionality
    • MCP server integration for enhanced processing capabilities with fallback to local processing
  3. Email Formatting System:

    • Structured email format with professional design
    • Automatic generation of meeting details and Google Meet links
    • Context-aware email body generation
  4. Hybrid Processing Architecture:

    • Ability to process commands via MCP server or locally
    • Automatic fallback mechanism when MCP is unavailable
    • Configuration utilities to toggle between processing modes
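The hybrid fallback described above can be sketched roughly as follows. The function names and the `USE_MCP` flag are hypothetical stand-ins for the project's real client and configuration utilities:

```python
# Hypothetical sketch of the hybrid processing path: try the MCP server
# first, fall back to local processing when it is unreachable.

USE_MCP = True  # illustrative config toggle


def process_via_mcp(command: str) -> dict:
    # Stand-in for a real MCP client call; here we simulate an outage.
    raise ConnectionError("MCP server unavailable")


def process_locally(command: str) -> dict:
    # Stand-in for the local NLP pipeline.
    return {"intent": "send_email", "source": "local"}


def process_command(command: str) -> dict:
    if USE_MCP:
        try:
            return process_via_mcp(command)
        except ConnectionError:
            pass  # graceful degradation: fall through to local mode
    return process_locally(command)


result = process_command("email Priya the meeting notes")
```

The key design choice is that the MCP path is an enhancement, not a dependency: every command has a local execution path.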

Challenges we ran into

The main hurdles we hit:

  1. Intent Recognition Accuracy:

    • Extracting precise intents, subjects, recipients, and meeting times from natural language
    • Handling ambiguity in verbal commands
  2. API Integration Complexity:

    • Managing authentication flows for multiple third-party services (Google, ElevenLabs, Perplexity)
    • Coordinating between speech-to-text, NLP, and email/calendar services
  3. Robust Error Handling:

    • Implementing fallback mechanisms when services fail
    • Designing a system that gracefully degrades when certain components aren't available
  4. MCP Server Integration:

    • Creating a hybrid architecture that works with or without MCP
    • Building proper diagnostics and configuration tools for MCP connectivity
  5. Voice Recognition Quality:

    • Ensuring accurate transcription of various speech patterns and accents
    • Processing potentially noisy audio input

Accomplishments that we're proud of

  1. Seamless Voice Interface:

    • Created a natural interface for email and meeting scheduling that feels like talking to a human assistant
    • Successfully integrated speech recognition with a complex processing pipeline
  2. Sophisticated Intent Extraction:

    • Built a system that can understand complex voice commands and extract structured data
    • Implemented processing for combined intents (e.g., "schedule a meeting and email the details")
  3. Hybrid Architecture:

    • Developed a system that can use advanced MCP capabilities when available but doesn't depend on them
    • Created diagnostic tools and configuration utilities for smooth operation
  4. Professional Email Formatting:

    • Implemented a consistent, professional email format for all messages
    • Generated contextually appropriate email content based on voice commands
  5. Robust Name Resolution:

    • Built a system that can match spoken names to email addresses using multiple sources

What we learned

The project demonstrates learning in several areas:

  1. Natural Language Processing: Techniques for extracting structured information from unstructured voice input
  2. API Integration: Working with multiple external APIs and handling their authentication flows
  3. Hybrid Architecture Design: Creating systems that can operate in different modes depending on available resources
  4. Error Handling and Resilience: Implementing fallback mechanisms and graceful degradation
  5. Voice Interface Design: Designing interfaces that work well with spoken commands
  6. Context-Aware Response Generation: Creating responses that maintain context and sound natural

What's next for Oviya

Potential next steps include:

  1. Expanded Voice Capabilities:

    • Adding more intents beyond email and meetings (task creation, reminders, note-taking)
    • Supporting more complex meeting scenarios (recurring meetings, room booking)
  2. Enhanced Natural Language Understanding:

    • Improving context awareness and memory across multiple commands
    • Adding support for corrections and modifications to previous commands
  3. Multi-User Support:

    • Enabling personalized experiences for different users
    • Supporting team workflows and collaboration
  4. Integration with More Services:

    • Adding support for additional productivity tools (Slack, Asana, Trello)
    • Expanding beyond Google services to support Microsoft 365, Zoom, etc.
  5. Mobile Application:

    • Creating a mobile version for on-the-go productivity
    • Implementing notification systems for upcoming meetings and responses
  6. Advanced MCP Capabilities:

    • Leveraging richer MCP server tooling for more capable command processing
    • Implementing distributed processing for improved performance
  7. Voice Authentication:

    • Adding voice biometrics for secure, password-less authentication
    • Implementing permission models based on voice recognition

Built With

  • elevenlabsapi
  • fastapi
  • gmailapi
  • googleapis
  • googlecalendarapi
  • googlecontactsapi
  • javascript/jsx
  • mcp
  • perplexityapi
  • python
  • react
  • vite