Curtroller

Control Cursor with a PlayStation controller. Full vibe coding.


Inspiration

I was deep in a vibe coding session — leaning back, letting Cursor's agent write code, reviewing diffs as they came in. But every few minutes the same thing happened: the AI finished, and I had to lurch forward to hit Tab, type a command, or reach for the keyboard to accept a change. The flow kept breaking.

I had a PS5 controller sitting on my desk. It has 17 buttons. Cursor's core AI loop — accept suggestion, reject diff, new chat, voice input, cancel generation — needs maybe 10. The math worked. What if you could drive the entire AI coding loop from the couch, controller in hand, never touching the keyboard?

That idea became Curtroller.


What I Learned

Building this taught me that "simple button → action" projects are never actually simple. There are layers between physical input and software effect, and each layer has its own rules.

The Gamepad API requires a visible page. My first design polled navigator.getGamepads() in a hidden background webview. Chromium silently returns empty arrays for background tabs. The fix was to merge the gamepad bridge and the HUD into one visible panel — which accidentally made the demo better, since button presses now light up in real time.

VS Code's type command doesn't reach webviews. Cursor's Composer is a webview panel. Sending \n via vscode.commands.executeCommand('type', { text: '\n' }) goes nowhere, because that command only routes to Monaco editor instances. The solution was OS-level AppleScript: keystroke return sends a Return keypress to whatever the OS considers focused, which is the Composer once composer.focusComposer has run.
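A minimal sketch of that OS-level submit, assuming macOS with osascript on the PATH and Accessibility permission granted to Cursor. The function names are hypothetical, not taken from the extension's source.

```typescript
import { execFile } from "node:child_process";

// Build the osascript arguments that press Return in whatever has OS focus.
function returnKeystrokeArgs(): string[] {
  return ["-e", 'tell application "System Events" to keystroke return'];
}

// Run the keystroke. macOS only; rejects if osascript fails or is missing.
function pressReturn(): Promise<void> {
  return new Promise((resolve, reject) => {
    execFile("osascript", returnKeystrokeArgs(), (err) =>
      err ? reject(err) : resolve(),
    );
  });
}

// Intended use inside the extension host (needs the vscode module):
//   await vscode.commands.executeCommand("composer.focusComposer");
//   await pressReturn();
```

Keeping the argument builder separate from the process call makes the script text easy to inspect without actually sending a keystroke.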

Command success is not the same as command effect. Most VS Code commands return void and never throw, even when they do nothing. I watched composer.sendToAgent log "success" for dozens of iterations while the Composer stayed unsubmitted. Distinguishing no-op from real effect required runtime observation, not code reading.

The fn key lives below the software layer. AppleScript's key down only understands named modifier constants: command, option, control, shift. The fn key is excluded. Trying key down {63} typed the characters "63" into my editor. The solution was cliclick, a CLI tool that posts real keyboard CGEvents at the HID level, the same layer as physical hardware.
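A rough sketch of holding and releasing fn through cliclick, assuming it is installed and on the PATH. The kd:fn and ku:fn arguments are the ones this write-up uses; createFnHold and the Runner type are hypothetical names, and the guard against a double key-down is my own addition.

```typescript
import { execFile } from "node:child_process";

// A runner is injected so the hold logic can be exercised without cliclick.
type Runner = (cmd: string, args: string[]) => void;

// Default runner: fire and forget, log failures instead of throwing.
const runCli: Runner = (cmd, args) => {
  execFile(cmd, args, (err) => {
    if (err) console.error(`${cmd} failed: ${err.message}`);
  });
};

// Track whether fn is currently held so repeated events stay balanced.
function createFnHold(run: Runner = runCli) {
  let held = false;
  return {
    press(): void {
      if (held) return; // already down, do not send a second key-down
      held = true;
      run("cliclick", ["kd:fn"]);
    },
    release(): void {
      if (!held) return; // never send key-up without a matching key-down
      held = false;
      run("cliclick", ["ku:fn"]);
    },
  };
}
```

Calling press() on the L3 press event and release() on the L3 release event would keep fn down for exactly as long as the stick is clicked in.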


How I Built It

The extension has three source files.

panel.ts generates the HUD as a self-contained HTML string. A setInterval loop running at ~30Hz polls navigator.getGamepads(), detects rising edges (press) and falling edges (release) on each button, and fires postMessage back to the extension host. Release events are what make hold-to-talk possible — Wispr Flow's fn key stays held as long as L3 is physically held.
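The edge detection can be sketched as a pure function over two snapshots of button state. This is an illustration under assumptions: the ButtonEvent message shape and the detectEdges name are hypothetical, and the real panel.ts may structure its loop differently.

```typescript
// Hypothetical message shape posted from the webview to the extension host.
type ButtonEvent = { type: "press" | "release"; button: number };

// Compare the previous and current pressed-state snapshots and report
// rising edges (press) and falling edges (release) per button index.
function detectEdges(prev: boolean[], curr: boolean[]): ButtonEvent[] {
  const events: ButtonEvent[] = [];
  for (let i = 0; i < curr.length; i++) {
    const was = prev[i] ?? false;
    const is = curr[i] ?? false;
    if (is && !was) events.push({ type: "press", button: i });
    if (!is && was) events.push({ type: "release", button: i });
  }
  return events;
}

// Webview-side wiring, left as comments because it needs a browser context:
//   const vscode = acquireVsCodeApi();
//   let prev: boolean[] = [];
//   setInterval(() => {
//     const pad = navigator.getGamepads()[0];
//     if (!pad) return;
//     const curr = pad.buttons.map((b) => b.pressed);
//     for (const e of detectEdges(prev, curr)) vscode.postMessage(e);
//     prev = curr;
//   }, 33); // roughly 30 Hz
```

Because a held button produces no events between its press and its release, the extension host only has to react to two messages per hold.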

extension.ts is the brain. It holds a flat button map, a stateful voiceActive boolean, and three OS-level helpers built on Node's child_process:

  • AppleScript keystroke return — submits Cursor's Composer (a webview the type command can't reach)
  • cliclick kd:fn / ku:fn — holds and releases the fn key for Wispr Flow push-to-talk
  • AppleScript key code 126/125 — up/down arrow keys that work universally in editor, terminal, and chat panels

One non-obvious detail in the voice submit path: after stopping dictation there is a hardcoded 700ms wait before submitting. Cursor's speech-to-text transcription is asynchronous with no completion callback available to extensions. Without the wait, the submit fires against an empty input box. There is no elegant solution — the wait is the solution.
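In outline, the stop-then-submit path looks something like the sketch below. The 700 ms value comes from the text above; the function name, the callback parameters, and the injectable scheduler are assumptions made so the timing can be checked without a real timer.

```typescript
// A scheduler is injected so the delay can be verified without waiting.
type Scheduler = (fn: () => void, ms: number) => void;

// Fixed wait for Cursor's transcription to land in the input box.
const TRANSCRIPTION_WAIT_MS = 700;

function stopDictationThenSubmit(
  stopDictation: () => void,
  submit: () => void,
  schedule: Scheduler = (fn, ms) => {
    setTimeout(fn, ms);
  },
): void {
  stopDictation(); // stop recording first
  schedule(submit, TRANSCRIPTION_WAIT_MS); // submit only after the wait
}
```

If Cursor ever exposes a "transcription complete" event, the schedule call is the only line that would need to change.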

package.json sets activationEvents: ["onStartupFinished"] so the extension is always ready when Cursor opens.


Challenges

Accepting agent changes has no API. Cursor's "Keep All" button in the Agent panel has no exposed command. chatEditing.acceptAllFiles and every variant I found either no-op silently or only work in specific diff contexts. The Cursor forum has an open feature request for this. R2/L2 work correctly for Cmd+K inline diffs, but agent-level accept is a hard wall until Cursor exposes it.

The fn key path. Python's Quartz bindings can also post CGEvents, but macOS marks the system Python as externally managed and blocks pip install. Swift one-liners recompile on every invocation. cliclick (brew install cliclick) was the clean answer: one install, zero overhead, correct behavior.

Timing without callbacks. The 700ms transcription wait is the ugliest line in the codebase. Every other part of the extension is event-driven. This one isn't, because Cursor exposes no hook for "transcription complete." It works reliably in practice, but it's the part I'd fix first if Cursor ever exposes that event.


Button Map

Button          Action
✕ X             Accept inline suggestion
○ O             Hide inline suggestion
□ Square        Trigger IntelliSense
△ Triangle      New AI Chat
L1              Previous inline suggestion
R1              Next inline suggestion
L2              Reject inline diff
R2              Accept inline diff
Select          Toggle Cursor voice dictation
Start           Stop dictation + submit to Composer
L3 (hold)       Hold fn → Wispr Flow push-to-talk
R3              Toggle terminal
D-pad ▲/▼       Up / Down arrow (universal)
D-pad ◄         Focus code editor
D-pad ►         Focus Composer / Chat
PS button       Undo
Touchpad        Cancel AI generation
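
For reference, the map above can be written as a flat lookup keyed by Gamepad API button index. This is a sketch, not the extension's actual table: the indices assume the browser reports the controller with the W3C "standard" mapping, index 17 for the touchpad click is an assumption about how Chromium exposes it, and the values are the action labels from the table rather than real command IDs.

```typescript
// Button index (standard gamepad mapping, assumed) to action label.
const BUTTON_ACTIONS: Record<number, string> = {
  0: "Accept inline suggestion", // Cross
  1: "Hide inline suggestion", // Circle
  2: "Trigger IntelliSense", // Square
  3: "New AI Chat", // Triangle
  4: "Previous inline suggestion", // L1
  5: "Next inline suggestion", // R1
  6: "Reject inline diff", // L2
  7: "Accept inline diff", // R2
  8: "Toggle Cursor voice dictation", // Select
  9: "Stop dictation + submit to Composer", // Start
  10: "Hold fn for Wispr Flow push-to-talk", // L3 (hold)
  11: "Toggle terminal", // R3
  12: "Up arrow", // D-pad up
  13: "Down arrow", // D-pad down
  14: "Focus code editor", // D-pad left
  15: "Focus Composer / Chat", // D-pad right
  16: "Undo", // PS button
  17: "Cancel AI generation", // Touchpad click (index assumed)
};

// Look up the action for a button; unmapped indices return undefined.
function actionFor(button: number): string | undefined {
  return BUTTON_ACTIONS[button];
}
```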

Built With

  • TypeScript
  • VS Code Extension API
  • HTML / CSS (HUD webview panel)
  • Web Gamepad API (navigator.getGamepads())
  • AppleScript (osascript) — OS-level keystroke simulation
  • cliclick — HID-level fn key simulation
  • Node.js child_process
  • PlayStation DualSense / DualShock 4 controller