VirtualPilot

The agent highlights the YouTube icon to show that it should be clicked.
The agent highlights the search bar to show that it should be clicked.

Inspiration

I've witnessed countless elderly people and newcomers struggling to learn basic computer skills, often with no one available to guide them. This digital divide inspired me to create VirtualPilot, an AI-powered solution that ensures no one gets left behind in our increasingly digital world.

What it does

VirtualPilot is a dual-AI system that both teaches and automates computer tasks. The Learning Agent is a voice-based AI teacher that breaks complex tasks down into simple, interactive steps: users speak their goal, and the agent responds with multilingual voice instructions, precise visual highlighting, and step-by-step guidance. The Automation Agent is a Gemini-powered GUI agent that executes tasks automatically from natural-language commands, handling everything from single clicks to multi-step workflows. The system supports English, Spanish, Hindi, and French, making computer literacy accessible to more people worldwide.
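To make the split between the two agents concrete, here is a minimal sketch of how a task could be represented as a list of steps that either mode consumes: in teaching mode each step is spoken and its target highlighted so the user can act, while in automation mode the agent performs the step itself. The `Step` and `run_task` names, the action vocabulary, and the callback signatures are hypothetical illustrations, not VirtualPilot's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Literal, Tuple

# A bounding box in screen coordinates: (left, top, right, bottom).
Box = Tuple[int, int, int, int]

# Hypothetical names for the two operating modes.
Mode = Literal["teach", "automate"]


@dataclass
class Step:
    """One atomic action in a task, e.g. 'Click the YouTube icon'."""
    instruction: str       # human-readable text, spoken to the user in teaching mode
    target: Box            # on-screen element the step refers to
    action: str = "click"  # hypothetical action vocabulary: "click", "type", ...


def run_task(
    steps: List[Step],
    mode: Mode,
    speak: Callable[[str], None],
    highlight: Callable[[Box], None],
    perform: Callable[[Step], None],
) -> List[str]:
    """Walk through the steps in either mode and return a log of what happened.

    In "teach" mode the agent only narrates and highlights; the user does the action.
    In "automate" mode the agent performs each action itself.
    """
    log: List[str] = []
    for i, step in enumerate(steps, start=1):
        if mode == "teach":
            speak(f"Step {i}: {step.instruction}")  # e.g. routed to a TTS engine
            highlight(step.target)                  # e.g. routed to a screen overlay
            log.append(f"taught: {step.instruction}")
        else:
            perform(step)                           # e.g. routed to mouse/keyboard control
            log.append(f"performed: {step.action} {step.target}")
    return log
```

Keeping speech, highlighting, and input control behind callbacks means the same step list can drive both agents, which is one way to support switching between teaching and automation without re-planning the task.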

How we built it

The Learning Agent uses Claude for intelligent reasoning, OmniParser for open-source screen element detection, PyQt for interactive highlighting with boxes and arrows, and LMNT for custom voice interactions with native language support. It combines visual, audio, and text inputs in a truly multimodal approach. The Automation Agent uses OmniParser v2 for enhanced screen parsing, PyAutoGUI for precise mouse and keyboard control, LangChain as the agent framework for complex reasoning, and Gemini for visual understanding and task execution. The entire system was built from scratch with custom integrations, real-time screen analysis, and seamless switching between teaching and automation modes.
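A minimal sketch of the hand-off from screen parsing to PyAutoGUI follows. It assumes each parsed element is a dict with a `content` label and a `bbox` given as `[x1, y1, x2, y2]` fractions of the screenshot size; that layout is an assumption here, so check what your OmniParser version actually returns. The helper names `element_center`, `find_element`, and `click_element` are hypothetical. `pyautogui.size()` and `pyautogui.click(x, y)` are standard PyAutoGUI calls, and the import is deferred so the coordinate logic can be tested without a display.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def element_center(bbox: Sequence[float], screen_w: int, screen_h: int) -> Tuple[int, int]:
    """Convert a normalized [x1, y1, x2, y2] box into the pixel center of the element."""
    x1, y1, x2, y2 = bbox
    cx = (x1 + x2) / 2 * screen_w
    cy = (y1 + y2) / 2 * screen_h
    return round(cx), round(cy)


def find_element(elements: List[Dict], label: str) -> Optional[Dict]:
    """Return the first parsed element whose content contains `label` (case-insensitive)."""
    needle = label.lower()
    for el in elements:
        if needle in str(el.get("content", "")).lower():
            return el
    return None


def click_element(elements: List[Dict], label: str) -> bool:
    """Locate an element by label and click its center with PyAutoGUI.

    Returns False if nothing matched. pyautogui is imported only when a click is
    actually needed, so the pure helpers above work without a desktop session.
    """
    el = find_element(elements, label)
    if el is None:
        return False
    import pyautogui  # on macOS, the host app needs Accessibility permission to click

    w, h = pyautogui.size()          # screen size as PyAutoGUI reports it
    x, y = element_center(el["bbox"], w, h)
    pyautogui.click(x, y)
    return True
```

Working with normalized boxes keeps the lookup independent of screenshot resolution: assuming the screenshot covers the full screen, the same element maps to the right point whatever size PyAutoGUI reports. The substring match is deliberately naive and stands in for whatever element selection the agent actually performs.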

Challenges we ran into

Working alone meant prototyping every component independently with AI assistance. I faced significant difficulty creating PyQt highlighting code to display precise, real-time visual overlays for screen elements. Building the agent system from scratch required extensive testing and iteration to achieve consistent performance. Coordinating multimodal systems that synchronize voice, visual, and interaction components proved complex. Ensuring consistent behavior across different Mac configurations added another layer of complexity.
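One plausible source of inconsistency across Macs, offered as an assumption rather than a diagnosis of this project, is display scaling: on a Retina screen a screenshot is captured in physical pixels, while Qt windows such as a PyQt overlay are laid out in device-independent (logical) pixels, typically half as many in each direction. The sketch below shows the conversion a highlight overlay would need; `scale_factor` and `to_overlay_box` are hypothetical helpers, and in a real overlay the ratio could instead be read from Qt's `QScreen.devicePixelRatio()`.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def scale_factor(screenshot_size: Tuple[int, int], logical_size: Tuple[int, int]) -> float:
    """Estimate the physical-to-logical pixel ratio from the two widths.

    This is usually 2.0 on Retina Macs and 1.0 on standard displays.
    """
    return screenshot_size[0] / logical_size[0]


def to_overlay_box(box_px: Box, ratio: float, pad: int = 4) -> Box:
    """Map a box found in screenshot pixels to overlay (logical) coordinates.

    A small padding keeps the highlight outline from covering the element itself;
    the top-left corner is clamped so the box never starts off-screen.
    """
    left, top, right, bottom = box_px
    return (
        max(0, int(left / ratio) - pad),
        max(0, int(top / ratio) - pad),
        int(right / ratio) + pad,
        int(bottom / ratio) + pad,
    )
```

Computing the ratio at runtime, rather than hard-coding it, lets the same overlay code run unchanged on Retina and non-Retina displays.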

Accomplishments that we're proud of

I built the entire application from concept to working prototype in just 24 hours. The solution genuinely addresses the needs of underserved communities struggling with technology access. I successfully integrated multiple AI models including Claude, Gemini, and LMNT into a cohesive, functional system. The multilingual support ensures global accessibility, breaking down language barriers to technology education. The scalable architecture provides a solid foundation ready for enterprise and educational deployment.

What we learned

I mastered advanced AI integration techniques including LMNT voice synthesis, Claude reasoning capabilities, and Gemini visual understanding. I developed efficient rapid prototyping and testing workflows that allowed for quick iteration. Working on this project gave me deep insights into accessibility needs and inclusive technology design principles. I gained extensive experience with multi-agent systems and coordination between different AI components. Most importantly, I learned how to build AI applications that solve real human problems rather than just demonstrating technical capabilities.

What's next for VirtualPilot

In the immediate term, I plan to enhance the learning capabilities with a research mode and with voice cloning that can imitate a family member's voice for more personalized instruction. I also want to extend context handling to support longer, more complex workflows and to bring VirtualPilot to Windows and Linux. Long-term, I envision partnering with educational institutions and senior centers for global deployment, developing enterprise solutions for corporate training and onboarding, and building a sustainable startup around digital inclusion. The ultimate mission is to make VirtualPilot available to millions worldwide, so that everyone can navigate the digital age with confidence regardless of their background or technical experience. VirtualPilot is more than software: it is a bridge to digital empowerment for all.

For the Sponsors

Groq - research agent that finds similar computer tasks online; Whisper for fast speech-to-text
Anthropic - Claude agent that teaches users how to use a PC
Google - Gemini for GUI automation
Letta - memory for the research agent and the GUI automation agent
LMNT - text-to-speech and continuous audio communication
Unify - multimodal agent

Built With

anthropic
gemini
google
groq
letta
lmnt
nobelera
unify

Updates

Jayavibhav Niranjan Kogundi started this project — Jun 22, 2025 01:39 PM EDT
