The Agentic Hardware KVM
KaiVM turns a Raspberry Pi into an autonomous AI agent that can physically control any computer. By acting as a USB Keyboard/Mouse and capturing video output via HDMI, KaiVM can see, reason, and interact with target machines regardless of their operating system or state—even in BIOS or during a crash.
Powered by Google Gemini 3, KaiVM bridges the gap between multimodal AI and physical hardware.
- Universal Control: Works on Windows, Linux, macOS, and bare-metal servers. No software installation is required on the target machine.
- Visual Reasoning: Uses Gemini 3's multimodal capabilities to analyze the screen in real-time.
- Absolute Mouse Precision: Implements a custom USB HID descriptor (Digitizer) to map screen coordinates 1:1, eliminating AI mouse drift and "hallucinated" clicks.
- Autonomous Recovery: Can navigate BIOS menus, bootloaders (GRUB), and recovery screens where software agents fail.
- Event Watchdog: Define visual triggers (e.g., "If error popup appears") to execute actions automatically.
KaiVM acts as a "Man-in-the-Middle" device. It captures video output and injects USB input.
- Video Flow: The target computer sends HDMI video to the Capture Card. The Server processes this raw video into screenshots with a grid overlay and sends them to the Gemini 3 API.
- Logic Flow: Gemini 3 analyzes the screenshot and returns a JSON Action Plan.
- Control Flow: The Server translates the plan into HID commands (Keystrokes/Clicks) which are sent via the USB Gadget interface back to the target computer.
To build a KaiVM unit, you need:
- Raspberry Pi 4: The Pi 4 is required for its specific USB-C OTG capabilities (
dwc2). - HDMI-to-USB Capture Card: Any generic dongle based on the MacroSilicon MS2109 chip (available cheaply online).
- USB-C Splitter (Power + Data): Essential for the Pi 4. You need a cable that splits the USB-C port into a Power input and a Data (USB-A) output to connect to the target PC.
- Target Computer: The machine you wish to control.
- (Optional) SPI Display: Supports ST7789-based 2-inch IPS displays for local status monitoring.
Flash a fresh image of Raspberry Pi OS Lite (64-bit).
Clone the repository and run the installer script as root. This script handles dependencies (ffmpeg, python-venv), configures the Linux Kernel USB Gadget modules, and sets up systemd services.
git clone https://github.com/yourusername/kaivm.git
cd kaivm
sudo ./install.shNote: The installer will modify /boot/firmware/config.txt to enable USB OTG (dtoverlay=dwc2) and SPI. A reboot is required after installation.
The installer will prompt you for your Gemini API Key. If you skip this, you can manually edit the environment file later:
sudo nano /etc/default/kaivm
# Add: GEMINI_API_KEY=your_key_hereOnce installed and rebooted, access the dashboard at:
http://<raspberry-pi-ip>:8000
- View Stream: Low-latency MJPEG stream of the target PC.
- Chat/Instruct: Type natural language commands (e.g., "Open the browser and find the weather in Tokyo").
- Scheduler: Set up recurring tasks (e.g., "Check disk space every morning").
KaiVM uses an absolute mouse mode, but different operating systems scale coordinates differently. If clicks are inaccurate:
- Run
kaivm calibrate(or use the Calibrate button in the Web UI). - The system will move the mouse to corners, detect the cursor position using visual diffing, and calculate the scaling factor.
- Results are saved to
~/.config/kaivm/calibration.txt.
install.sh: Main setup script. Configures Kernel, udev rules, and Python env.kaivm/agent/: Core logic for the AI loop (Reasoning -> Action).kaivm/hid/: Low-level USB gadget drivers (Keyboard, Relative Mouse, Absolute Mouse).kaivm/gemini/: Client for Google Gemini 3 API, including prompt engineering and image preprocessing (Grid Overlay).kaivm/server.py: FastAPI backend and WebSocket handler.scripts/: Systemd unit files and hardware initialization scripts.
- "USB Device Not Recognized" on Target: Ensure you are using a data-capable USB-C splitter. The Pi must be powered and have data lines connected to the target.
- Video Feed is Black: Check if the target PC is sleeping. Ensure the HDMI capture card is recognized (
ls /dev/video*). - Mouse Clicks are Off: Run the calibration tool. Ensure the target OS resolution is standard (1920x1080 is recommended for best OCR results).