Inspiration
In a world that's more mobile and busier than ever, our most important workflows, files, and applications remain inaccessible when we're away from our computers. Carrying a laptop everywhere is inconvenient for everyone, and for many, a desktop workstation remains geographically jailed at the desk.
With the advent of frontier models that can reason and take action across modalities, we envision a future where you're not just liberated from the physical constraints of your computer but empowered by an intelligent agent that helps you get work done conversationally, anywhere in the world. Through our tools, we transform existing AI models from chatbots into assistants that understand personal context, marking a step towards truly agentic AI: Agent Evan can fetch that file on your hard drive miles away, create presentations from your own data and content, and even start training jobs for you.
What Evan does
Agent Evan is an AI agent that lives on your desktop and lets you do anything, from anywhere:
- Execute complex multi-step workflows through natural language commands from your mobile devices
- Run code, manage files, and control applications remotely with voice commands
- Monitor long-running processes and provide real-time updates while you're mobile
- Handle routine manual tasks like file organization, system maintenance, and process monitoring
- Integrate with every application on your desktop through intelligent tool selection
- Maintain conversation context and execute autonomous workflows without constant supervision
The Underlying Technologies behind Evan
Agent Evan is powered by a mobile app, desktop agent, and server all working in tandem.
Our mobile app is written natively in Swift with SwiftUI, adopting the latest Liquid Glass design so it feels right at home on iOS. We use Apple Intelligence's Foundation Models framework for on-device AI, powering lightweight intelligence features like organizing your tasks, and Cartesia's Sonic model for near-real-time speech-to-text.
Our mobile app communicates with the desktop agent through a server built on Cloudflare Workers, with Supabase's S3-compatible storage holding larger agent-generated content. The server uses WebSockets for instant, bidirectional communication between the mobile client and the agent.
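To illustrate the relay, here is a minimal sketch of a message envelope that a WebSocket layer like this might pass between phone and desktop. The field names (`kind`, `payload`, `task_id`) are assumptions for illustration, not our actual wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from uuid import uuid4

@dataclass
class AgentMessage:
    """One WebSocket frame relayed between the mobile client and the desktop agent."""
    kind: str       # e.g. "command", "status", "artifact" (illustrative values)
    payload: dict   # free-form body for that message kind
    task_id: str = field(default_factory=lambda: uuid4().hex)  # correlates replies

    def to_frame(self) -> str:
        """Serialize to the JSON text frame sent over the socket."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_frame(frame: str) -> "AgentMessage":
        """Parse a received text frame back into a message."""
        return AgentMessage(**json.loads(frame))

# Round-trip a command frame the way the relay would see it.
msg = AgentMessage(kind="command", payload={"text": "summarize ~/notes"})
decoded = AgentMessage.from_frame(msg.to_frame())
```

Tagging every frame with a task ID lets status updates stream back to the phone while the agent keeps working, without the client blocking on a single request/response pair.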
Our desktop app is written in Python. Each Claude-based agent is sandboxed for better privacy and security, achieved through a Docker container plus safeguards in the tools it uses to interact with the outside world. This extensive sandboxing also enables safe concurrency, letting complex tasks complete significantly faster.
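As a sketch of this kind of container isolation, the helper below builds a `docker run` invocation for one agent task. The specific flags, image name, and resource limits are illustrative defaults, not our exact configuration.

```python
from typing import List

def sandbox_command(image: str, workdir: str, task_cmd: List[str]) -> List[str]:
    """Build a `docker run` invocation that isolates a single agent task.

    Illustrative only: the flags and image name are assumptions, not the
    project's actual sandbox configuration.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",               # no direct network access from inside
        "--memory", "2g",                  # cap resources so tasks can run concurrently
        "--read-only",                     # immutable root filesystem
        "-v", f"{workdir}:/workspace:rw",  # only the task's workspace is writable
        "-w", "/workspace",
        image,
        *task_cmd,
    ]

cmd = sandbox_command("evan-agent:ubuntu", "/tmp/task-1", ["python3", "job.py"])
```

Because each task gets its own throwaway container with a private workspace, several agent tasks can run side by side without touching each other's files.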
Our Agents feature multiple tools, designed together to form a set of primitives that allow Evan to perform virtually any task imaginable. Our tools include:
- General agentic tools: web fetch, web search, text-file reading and editing
- Core system: system shell (zsh), file copy, write, symlinking, and directory reading
- Additional system access: Mail and Calendar (through App Intents), Photos (through native system frameworks), and browser control through Playwright
- Agent system: Bash, document submission tools
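Each of these primitives is described to the model as a JSON-schema tool definition, the shape the Anthropic Messages API expects. The tool name and fields below are illustrative, not our exact schema.

```python
# A minimal tool definition in the JSON-schema shape the Anthropic Messages API
# accepts for tool use. "read_text_file" and its fields are hypothetical here.
read_file_tool = {
    "name": "read_text_file",
    "description": "Read a UTF-8 text file from the sandboxed filesystem.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Absolute path inside the sandbox.",
            },
        },
        "required": ["path"],
    },
}
```

Keeping each tool this small and well-described is what lets the model compose them into larger workflows on its own.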
We carefully designed the Ubuntu system our agent lives in to integrate Jupyter notebook execution, OCR with PyTesseract, image processing with OpenCV, document generation with LaTeX and Pandoc, LibreOffice automation, Node.js, and more. This lets an otherwise text-based AI system create posters, financial models, and dynamic charts.
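For example, a document-generation step in this environment can be as simple as shelling out to Pandoc. The helper below builds a plausible Markdown-to-PDF invocation; the flags are a reasonable default, not our exact command.

```python
from typing import List

def pandoc_command(src_md: str, out_pdf: str) -> List[str]:
    """Build a Markdown-to-PDF Pandoc invocation using a LaTeX engine.

    Illustrative: --pdf-engine=xelatex is a common choice when LaTeX is
    installed in the image, not necessarily the agent's actual setting.
    """
    return ["pandoc", src_md, "-o", out_pdf, "--pdf-engine=xelatex"]

cmd = pandoc_command("report.md", "report.pdf")
# The agent would then run this inside the sandbox, e.g. via subprocess.run(cmd).
```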
Leveraging the intelligence of modern LLMs, our core philosophy is that the model knows best: we give the model ample context and every tool at all times. The model is never forced to make a tool choice, nor do we dynamically add or remove tools based on context. The result is an intelligent, adaptive system that chains simple tools together to accomplish complex tasks fully autonomously. During development, when some tools occasionally became unavailable, the agent often surprised us with its adaptability, quickly recognizing the problem and using other tools to accomplish the same task.
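The "model knows best" loop can be sketched in a few lines: the full tool registry is offered on every turn, and whatever tool the model names is executed without any filtering. The `model` here is a stand-in callable, not the real Claude client.

```python
def run_turn(model, tools, registry, prompt):
    """One agentic turn: offer the model the FULL tool list every time and
    execute whichever tool it picks. No dynamic adding/removing of tools.
    `model` is a hypothetical stand-in for the real LLM call."""
    call = model(prompt, tools)            # returns {"tool": name, "input": kwargs}
    return registry[call["tool"]](**call["input"])

# Stub model that always picks the shell tool, standing in for Claude.
def stub_model(prompt, tools):
    assert len(tools) == 2                 # every tool is visible on every turn
    return {"tool": "shell", "input": {"cmd": "echo hi"}}

registry = {
    "shell": lambda cmd: f"ran: {cmd}",
    "web_search": lambda query: f"searched: {query}",
}
result = run_turn(stub_model, list(registry), registry, "say hi")
```

Because the tool list never changes between turns, the model can fall back to an alternative tool whenever its first choice fails, which is exactly the adaptability we observed.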
Challenges we ran into
Setting up a Linux VM inside Docker and then accessing the VM through our tools in a safe manner presented significant security and networking challenges. We also faced complex issues with WebSocket connection management across different network conditions and synchronizing complex state between mobile and desktop environments. Building the dynamic tool selection system required extensive prompt engineering to ensure Claude could reliably choose and execute the right tools for each task. Additionally, implementing real-time speech processing while maintaining low latency across the entire pipeline proved technically demanding within our 36-hour timeframe.
Accomplishments that we're proud of
We built a fully functional agentic AI system with complete desktop access controllable from a mobile device anywhere in the world. Our dynamic tool selection system chooses from dozens of available tools based on natural language intent, enabling truly autonomous workflows. We achieved real-time voice interaction with sub-second response times while maintaining secure, sandboxed execution. Most importantly, we demonstrated that agentic AI can move beyond conversational interfaces to become a genuine productivity multiplier that handles complex, multi-step tasks without human intervention.
What we learned
This project pushed us to understand the complexity of building truly agentic systems that bridge mobile and desktop environments. We gained deep experience with the WebSocket architecture, containerized security models, and advanced prompt engineering for tool selection. We learned how to integrate multiple AI models effectively—combining Apple's on-device models, Cartesia's speech processing, and Claude's reasoning capabilities into a cohesive system. Most importantly, we discovered the technical and UX challenges of creating AI that users can trust with autonomous system access.
What's next for Agent Evan
Agent Evan represents what we see as the future of human-computer interaction — where multimodal intelligent systems give humans more enjoyable ways to stay productive, assist in realizing the true potential of human creativity, and erase the traditional limits of touchscreen and keyboard-and-mice interfaces.
Today, Agent Evan's relatively basic agentic capabilities represent a small step towards truly ubiquitous computing, giving you a digital twin in your pocket. In the short term, we want to bring Agent Evan to operating systems beyond macOS and expand it beyond Claude to support more models.
Looking further into the future, we plan to leverage the exponentially increasing capabilities of reasoning models and expanding tool ecosystems so Evan can tackle increasingly complex workloads, letting you focus on great ideas while Evan brings them to life.
(By the way, Agent stands for A General Evan ageNT.)
Built With
- app-intents
- cartesia
- claude
- cloudflare
- javascript
- latex
- libreoffice
- on-device-ai
- opencv
- pandoc
- pillow
- playwright
- pydf
- pytesseract
- python
- supabase
- swift
- zsh