Inspiration
Modern audio production software is extremely powerful but still relies on complex menus, shortcuts, and technical knowledge that create friction for creators. We were inspired by tools like Cursor that let developers interact with software through natural language and wondered: why doesn’t this exist for creative tools like digital audio workstations (DAWs)? StudioCursor was built to make audio editing faster, more intuitive, and accessible by allowing users to simply describe what they want instead of manually performing every step.
What it does
StudioCursor is an AI-powered editing assistant embedded directly inside REAPER. Users can type or speak natural language commands such as “fade this out,” “crossfade these clips,” or “raise the volume by 3 dB,” and StudioCursor performs the edits instantly.
The system supports multi-step commands, voice input via speech-to-text, and optional spoken feedback. It understands the current DAW context (selected clips, tracks, cursor position, time selection) and converts user intent into safe, undoable editing actions.
Current capabilities include:
- Fade in/out clips by duration
- Crossfade clips
- Adjust volume (dB or percent)
- Set pan position
- Add effects (EQ, compressor, reverb)
- Mute/solo tracks
- Split at cursor
- Duplicate clips or tracks
- Multi-step commands in a single request
- Voice control with speech-to-text and text-to-speech
All edits execute inside a single undo block for safety.
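As an illustration, a multi-step request like "fade this out over 2 seconds and raise the volume by 3 dB" could expand into a small plan of tool calls that all run between one undo begin/end pair. The sketch below is hypothetical (the plan schema and `begin_undo`/`end_undo` callbacks stand in for StudioCursor's actual structures and REAPER's `Undo_BeginBlock`/`Undo_EndBlock` API):

```python
# Hypothetical sketch: execute every step of an AI-generated plan as a
# single undoable edit. begin_undo/end_undo stand in for REAPER's
# Undo_BeginBlock/Undo_EndBlock calls made on the Lua side.

def execute_plan(plan, actions, begin_undo, end_undo):
    """Run all plan steps inside one undo block; one Ctrl+Z reverts everything."""
    begin_undo()
    results = []
    try:
        for step in plan["steps"]:
            handler = actions[step["tool"]]       # KeyError -> unknown tool
            results.append(handler(**step["args"]))
    finally:
        end_undo("StudioCursor: " + plan["label"])
    return results

# Example plan for "fade this out over 2s and raise the volume by 3 dB"
plan = {
    "label": "fade out + volume",
    "steps": [
        {"tool": "fade_out", "args": {"seconds": 2.0}},
        {"tool": "adjust_volume", "args": {"db": 3.0}},
    ],
}
```

Wrapping the whole plan in `try`/`finally` ensures the undo block is always closed, even if a step fails partway through.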
How we built it
StudioCursor combines three main layers:
1. DAW Integration (Lua + ReaImGui)
We built a dockable sidebar panel inside REAPER using Lua and the ReaImGui library. This provides the chat interface, voice controls, and direct access to REAPER’s editing APIs.
2. AI Planning + Validation (Python + Gemini)
A Python bridge sends commands and live DAW context to the Gemini API, which converts natural language into structured tool calls. We implemented strict validation rules to ensure only safe, whitelisted actions can execute, preventing unpredictable behavior.
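A whitelist validator of this kind might look like the following sketch. The tool names and parameter specs are illustrative assumptions, not StudioCursor's real schema:

```python
# Sketch of whitelist validation for model-generated tool calls.
# ALLOWED_TOOLS maps each permitted tool to its expected argument types;
# anything outside this table is rejected before execution.

ALLOWED_TOOLS = {
    "fade_out": {"seconds": float},
    "adjust_volume": {"db": float},
    "set_pan": {"position": float},
}

def validate_call(call):
    """Return (ok, reason); reject unknown tools and mistyped arguments."""
    spec = ALLOWED_TOOLS.get(call.get("tool"))
    if spec is None:
        return False, f"unknown tool: {call.get('tool')!r}"
    args = call.get("args", {})
    if set(args) != set(spec):
        return False, "unexpected or missing arguments"
    for name, typ in spec.items():
        if not isinstance(args[name], typ):
            return False, f"argument {name} must be {typ.__name__}"
    return True, "ok"
```

Because validation happens after the model responds but before anything touches the project, a hallucinated tool name simply fails closed instead of executing.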
3. Speech System (ElevenLabs + ffmpeg)
Voice commands are recorded locally using ffmpeg, transcribed with ElevenLabs speech-to-text, then routed through the same AI pipeline. Text-to-speech is used for optional spoken feedback.
The architecture uses JSON contracts between Lua and Python, allowing deterministic execution and clear safety boundaries between AI reasoning and DAW operations.
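A minimal version of such a contract might look like this sketch (the field names are assumptions, not the shipped protocol): the Lua panel serializes the command plus DAW context into JSON, and only ever executes the structured tool calls it receives back.

```python
import json

# Illustrative JSON contract between the Lua panel and the Python bridge.
# Field names (command, context, status, tool_calls) are hypothetical.

def build_request(command, context):
    """Serialize the user command plus live DAW context for the bridge."""
    return json.dumps({
        "command": command,
        "context": {
            "selected_items": context["selected_items"],
            "cursor_pos": context["cursor_pos"],
            "time_selection": context["time_selection"],
        },
    })

def parse_response(raw):
    """Accept only well-formed responses; surface bridge errors explicitly."""
    msg = json.loads(raw)
    if msg.get("status") != "ok":
        raise ValueError(msg.get("error", "bridge error"))
    return msg["tool_calls"]
```

Keeping the boundary to plain JSON means the Lua side never sees model output directly, which is what makes execution deterministic.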
Challenges we ran into
One of the biggest challenges was safely connecting an AI model to a professional editing environment. We needed to ensure that incorrect model outputs could never damage a project. This required building strict schemas, validation layers, and execution guardrails.
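One simple guardrail of this kind is clamping numeric parameters to safe ranges before execution, so that even a validated call cannot, say, apply a 300 dB gain. The bounds below are illustrative defaults, not StudioCursor's actual limits:

```python
# Sketch of an execution guardrail: clamp every numeric argument to a
# safe range before it reaches the DAW. Bounds are illustrative.

SAFE_BOUNDS = {
    "db": (-60.0, 12.0),        # volume changes
    "seconds": (0.0, 30.0),     # fade lengths
    "position": (-1.0, 1.0),    # pan, full left to full right
}

def clamp_args(args):
    """Return a copy of args with each known parameter clamped to bounds."""
    clamped = {}
    for name, value in args.items():
        lo, hi = SAFE_BOUNDS.get(name, (float("-inf"), float("inf")))
        clamped[name] = min(max(value, lo), hi)
    return clamped
```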
Another challenge was the gap between terminal and GUI environments: applications launched outside a shell do not inherit the shell's PATH or environment variables, which complicated locating dependencies like ffmpeg and reading configuration from inside REAPER.
We also had to manage conversational ambiguity without creating infinite clarification loops, which led to designing a single-turn clarification system with button-based disambiguation.
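The single-turn rule can be captured in a small dispatcher like the sketch below (reply types and field names are hypothetical): an ambiguous request yields exactly one set of button options, and the chosen option re-enters the pipeline as a concrete command rather than triggering another question.

```python
# Sketch of single-turn clarification. The model either returns a plan or,
# at most once, a clarification with button options; reply field names
# here are hypothetical.

def handle_model_reply(reply, ask_user, resubmit):
    """Resolve a model reply with at most one clarification round."""
    if reply["type"] == "plan":
        return reply["steps"]
    if reply["type"] == "clarify":
        # Exactly one round: show button options, then resubmit the
        # chosen concrete command through the normal pipeline.
        choice = ask_user(reply["question"], reply["options"])
        return resubmit(choice)
    raise ValueError("unexpected reply type")
```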
Finally, integrating voice recording reliably across operating systems required careful handling of audio devices and subprocess execution.
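The OS differences show up mostly in which ffmpeg capture backend and device syntax to use. A sketch of building the recording command per platform (the device names like `:0` and `audio=default` are placeholders; a real system must enumerate devices):

```python
import sys

# Sketch of a cross-platform ffmpeg recording command. Each OS uses a
# different capture backend: avfoundation (macOS), dshow (Windows),
# alsa (Linux). Device identifiers here are placeholders.

def ffmpeg_record_cmd(out_path, seconds, platform=sys.platform):
    """Build the ffmpeg argv for recording mono 16 kHz audio for STT."""
    if platform == "darwin":                      # macOS
        inp = ["-f", "avfoundation", "-i", ":0"]
    elif platform.startswith("win"):              # Windows
        inp = ["-f", "dshow", "-i", "audio=default"]
    else:                                         # Linux and others
        inp = ["-f", "alsa", "-i", "default"]
    return ["ffmpeg", "-y", *inp, "-t", str(seconds),
            "-ar", "16000", "-ac", "1", out_path]
```

Returning an argv list (rather than a shell string) sidesteps quoting differences between platforms when the command is run via `subprocess`.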
Accomplishments that we're proud of
We are proud that StudioCursor works entirely inside a real DAW rather than as a prototype web app. The system can perform real edits on real audio projects with undo safety.
Other accomplishments include:
- A full natural language → AI → validation → execution pipeline
- Multi-step command support in a single request
- Voice control integrated into the same editing pipeline
- Strict AI safety guardrails for professional workflows
- Dockable UI panel native to REAPER
- Cross-language architecture between Lua and Python
Seeing complex edits executed from simple sentences was a major milestone for us.
What we learned
We learned how important safety layers are when integrating AI with tools that affect real user data. Prompting alone is not enough; deterministic validation is essential.
We also gained experience building cross-language systems, integrating audio tooling, and designing user interfaces inside nontraditional environments like DAWs.
Perhaps most importantly, we learned that AI interfaces become much more compelling when embedded directly into the user’s workflow rather than existing as a separate application.
What's next for StudioCursor
Our next steps include expanding the editing toolset, improving reliability across operating systems, and adding deeper project awareness such as session memory and contextual recommendations.
We also plan to explore advanced creative features like AI-assisted songwriting and vocal generation, collaborative workflows, and integrations with additional DAWs beyond REAPER.
Long term, we envision StudioCursor as a universal AI layer for creative software, making professional production tools accessible to anyone through natural language.