Oviya - Simply talk to Oviya and send out emails and meeting invites without friction.

Inspiration

Oviya grew out of a desire to remove friction from everyday productivity tasks. Sending an email or scheduling a meeting usually means switching apps, filling out forms, and hunting for contact details. We wanted a personalized assistant you could simply talk to — one that handles these workflows through natural language.

What it does

Oviya is a sophisticated voice assistant that:

  • Processes natural language voice commands to send emails and schedule meetings
  • Extracts specific intents from spoken language (send_email, create_meeting, or both)
  • Automatically resolves contact names to email addresses
  • Creates Google Calendar events with Google Meet links
  • Sends professionally formatted emails with meeting details
  • Operates in hybrid mode, using either a cloud-hosted Model Context Protocol (MCP) server or local processing

How we built it

The architecture consists of several interconnected components:

  1. Frontend (React/Vite):

    • Voice recording interface that captures audio from the user
    • Uses ElevenLabs API for high-quality speech-to-text conversion
    • Displays real-time feedback and results to the user
  2. Backend (Python/FastAPI):

    • Natural language processing with Perplexity AI to extract structured data from voice commands
    • Intent classification system that determines if the user wants to send an email, schedule a meeting, or both
    • Integration with Google APIs (Gmail, Calendar, Contacts) for email and calendar functionality
    • MCP server integration for enhanced processing capabilities with fallback to local processing
  3. Email Formatting System:

    • Structured email format with professional design
    • Automatic generation of meeting details and Google Meet links
    • Context-aware email body generation
  4. Hybrid Processing Architecture:

    • Ability to process commands via MCP server or locally
    • Automatic fallback mechanism when MCP is unavailable
    • Configuration utilities to toggle between processing modes
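The hybrid fallback described above can be sketched roughly as follows. The function names and the `USE_MCP` flag are hypothetical stand-ins for the project's real client and configuration utilities:

```python
# Hypothetical sketch of the hybrid processing path: try the MCP server
# first, fall back to local processing when it is unreachable.

USE_MCP = True  # illustrative config toggle


def process_via_mcp(command: str) -> dict:
    # Stand-in for a real MCP client call; here we simulate an outage.
    raise ConnectionError("MCP server unavailable")


def process_locally(command: str) -> dict:
    # Stand-in for the local NLP pipeline.
    return {"intent": "send_email", "source": "local"}


def process_command(command: str) -> dict:
    if USE_MCP:
        try:
            return process_via_mcp(command)
        except ConnectionError:
            pass  # graceful degradation: fall through to local mode
    return process_locally(command)


result = process_command("email Priya the meeting notes")
```

The key design choice is that the MCP path is an enhancement, not a dependency: every command has a local execution path.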

Challenges we ran into

The main hurdles we hit:

  1. Intent Recognition Accuracy:

    • Extracting precise intents, subjects, recipients, and meeting times from natural language
    • Handling ambiguity in verbal commands
  2. API Integration Complexity:

    • Managing authentication flows for multiple third-party services (Google, ElevenLabs, Perplexity)
    • Coordinating between speech-to-text, NLP, and email/calendar services
  3. Robust Error Handling:

    • Implementing fallback mechanisms when services fail
    • Designing a system that gracefully degrades when certain components aren't available
  4. MCP Server Integration:

    • Creating a hybrid architecture that works with or without MCP
    • Building proper diagnostics and configuration tools for MCP connectivity
  5. Voice Recognition Quality:

    • Ensuring accurate transcription of various speech patterns and accents
    • Processing potentially noisy audio input

Accomplishments that we're proud of

  1. Seamless Voice Interface:

    • Created a natural interface for email and meeting scheduling that feels like talking to a human assistant
    • Successfully integrated speech recognition with a complex processing pipeline
  2. Sophisticated Intent Extraction:

    • Built a system that can understand complex voice commands and extract structured data
    • Implemented processing for combined intents (e.g., "schedule a meeting and email the details")
  3. Hybrid Architecture:

    • Developed a system that can use advanced MCP capabilities when available but doesn't depend on them
    • Created diagnostic tools and configuration utilities for smooth operation
  4. Professional Email Formatting:

    • Implemented a consistent, professional email format for all messages
    • Generated contextually appropriate email content based on voice commands
  5. Robust Name Resolution:

    • Built a system that can match spoken names to email addresses using multiple sources

What we learned

The project demonstrates learning in several areas:

  1. Natural Language Processing: Techniques for extracting structured information from unstructured voice input
  2. API Integration: Working with multiple external APIs and handling their authentication flows
  3. Hybrid Architecture Design: Creating systems that can operate in different modes depending on available resources
  4. Error Handling and Resilience: Implementing fallback mechanisms and graceful degradation
  5. Voice Interface Design: Designing interfaces that work well with spoken commands
  6. Context-Aware Response Generation: Creating responses that maintain context and sound natural

What's next for Oviya

Potential next steps include:

  1. Expanded Voice Capabilities:

    • Adding more intents beyond email and meetings (task creation, reminders, note-taking)
    • Supporting more complex meeting scenarios (recurring meetings, room booking)
  2. Enhanced Natural Language Understanding:

    • Improving context awareness and memory across multiple commands
    • Adding support for corrections and modifications to previous commands
  3. Multi-User Support:

    • Enabling personalized experiences for different users
    • Supporting team workflows and collaboration
  4. Integration with More Services:

    • Adding support for additional productivity tools (Slack, Asana, Trello)
    • Expanding beyond Google services to support Microsoft 365, Zoom, etc.
  5. Mobile Application:

    • Creating a mobile version for on-the-go productivity
    • Implementing notification systems for upcoming meetings and responses
  6. Advanced MCP Capabilities:

    • Leveraging richer MCP server tooling for more capable command processing
    • Implementing distributed processing for improved performance
  7. Voice Authentication:

    • Adding voice biometrics for secure, password-less authentication
    • Implementing permission models based on voice recognition

Built With

  • elevenlabsapi
  • fastapi
  • gmailapi
  • googleapis
  • googlecalendarapi
  • googlecontactsapi
  • javascript/jsx
  • mcp
  • perplexityapi
  • python
  • react
  • vite