Inspiration
We wanted to build a system that made interacting with your computer as natural as speaking to another person. Most existing voice assistants, like Siri, Copilot, or Alexa, handle only single commands and often fail in real-world workflows. We wanted to change that by creating a voice-first operating system copilot that could actually run applications, automate tasks, and orchestrate multi-step workflows end-to-end. That became Voix, a local-first AI system that listens, reasons, and executes.
What It Does
Voix is a desktop AI copilot for Windows that executes complex, multi-step commands through natural voice input. Users can say commands like:
• “Hey Friday, open Word, write a 120-word essay on AI safety, and save it to the Desktop as test.”
• “Hey Friday, start meeting.” (triggers real-time recording, transcription, and action-item extraction)
• “Hey Friday, open Terminal.”
• “Hey Friday, what is the weather in SF today?”
Voix acts as a voice layer for the operating system, combining speech recognition, reasoning, and automation. It integrates directly with system and productivity applications, and can also connect to APIs for external intelligence (weather, time zones, email, etc.).
Core capabilities:
• System Control: Execute commands across Windows applications.
• Web Intelligence: Query web data in real time without context-switching.
• Document Automation: Generate and manage Word documents using speech.
• Meeting Intelligence: Record meetings locally, transcribe them and extract action items.
How We Built It
• Frontend: A custom floating desktop widget for microphone control and real-time transcription.
• Speech-to-Text: Faster-Whisper for on-device transcription to ensure privacy and low latency.
• LLM Orchestration: Groq and Claude for intent understanding, task planning, and text generation.
• System Automation: Python-based COM automation to control applications like Chrome, Word, Excel, and others.
All processing runs locally or through low-latency APIs to ensure responsiveness and security.
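The pipeline above hinges on one handoff: the LLM turns a transcribed utterance into a structured plan, and a deterministic layer routes that plan to an automation handler. A minimal sketch of that routing step, assuming a JSON intent schema and handler names that are illustrative rather than Voix's actual API:

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Intent:
    """Structured plan produced by the LLM from a transcribed utterance."""
    action: str                               # e.g. "open_app", "weather"
    args: Dict[str, str] = field(default_factory=dict)

# Registry mapping intent actions to deterministic automation handlers.
HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {}

def handler(action: str):
    """Decorator registering a function as the handler for one action."""
    def register(fn):
        HANDLERS[action] = fn
        return fn
    return register

@handler("open_app")
def open_app(args):
    # The real system would launch the app via COM or the shell; here we report.
    return f"opening {args['name']}"

@handler("weather")
def weather(args):
    return f"fetching weather for {args['city']}"

def dispatch(raw_llm_output: str) -> str:
    """Parse the LLM's JSON plan and route it to the matching handler."""
    data = json.loads(raw_llm_output)
    intent = Intent(action=data["action"], args=data.get("args", {}))
    fn = HANDLERS.get(intent.action)
    if fn is None:
        return f"no handler for {intent.action!r}"
    return fn(intent.args)

print(dispatch('{"action": "open_app", "args": {"name": "Terminal"}}'))
# → opening Terminal
```

Keeping the handlers deterministic means the LLM only ever chooses *what* to do; *how* it happens is plain, testable Python.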
Challenges We Ran Into
• Integrating speech recognition and real-time app automation in a single loop without lag or blocking operations.
• Handling Windows-level automation reliably across multiple applications through COM.
• Ensuring the system stayed fully local-first without depending on cloud-based pipelines.
• Keeping continuous audio capture from freezing the voice-command loop. We built a threaded audio-streaming system with proper start/stop synchronization and graceful shutdown handling.
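The threaded capture loop from the last point can be sketched with the standard library alone. The chunk source here is a stand-in for a real microphone stream (e.g. PyAudio), and the class is a simplified illustration, not Voix's actual code:

```python
import queue
import threading

class AudioStreamer:
    """Pulls audio chunks on a background thread so the voice-command
    loop never blocks on capture."""

    def __init__(self, read_chunk):
        self._read_chunk = read_chunk         # callable returning one audio chunk
        self._chunks = queue.Queue()
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._capture, daemon=True)
        self._thread.start()

    def _capture(self):
        # Runs off the main thread; checks the stop flag between reads.
        while not self._stop.is_set():
            chunk = self._read_chunk()
            if chunk is None:                 # source exhausted
                break
            self._chunks.put(chunk)

    def stop(self):
        """Signal the capture thread and wait for a graceful shutdown."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=2.0)

    def drain(self):
        """Collect everything captured so far without blocking."""
        out = []
        while not self._chunks.empty():
            out.append(self._chunks.get())
        return out

# Simulated microphone: yields three chunks, then signals end-of-stream.
fake_mic = iter([b"chunk1", b"chunk2", b"chunk3"])
streamer = AudioStreamer(lambda: next(fake_mic, None))
streamer.start()
streamer._thread.join(timeout=2.0)            # let the fake stream finish
streamer.stop()
chunks = streamer.drain()
print(chunks)
# → [b'chunk1', b'chunk2', b'chunk3']
```

The `threading.Event` gives a race-free stop signal, and `join` with a timeout keeps shutdown graceful even if the capture thread is mid-read.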
Accomplishments That We’re Proud Of
• Built a fully functional desktop voice copilot prototype capable of running real Windows applications.
• Achieved low-latency voice-to-action execution.
• Implemented real-time meeting transcription on-device.
• Created a unified architecture that integrates Whisper, LLMs, and OS control.
What We Learned
• How to design and synchronize multi-agent reasoning systems for desktop environments.
• The challenges of bridging AI intent understanding with deterministic automation layers.
• The importance of fast, local-first inference for maintaining usability in a real-time system.
• The critical role of UX in voice-first systems: latency and reliability define trust.
What’s Next for Voix.ai
• Expanding compatibility to macOS and Linux environments.
• Developing persistent, user-trainable command memory.
• Building a plugin SDK for third-party command modules.
• Extending meeting analysis to generate topic graphs and decision maps.
• Launching a public beta and integrating additional APIs for email, calendar, and collaboration tools.
Our long-term goal is to make Voix the default voice-first control layer for all desktop computing, one that turns your workflow into a conversation instead of a sequence of clicks.


