A full-stack conversational AI project that bridges a Pimoroni Presto physical remote with the ElevenLabs Conversational AI WebSocket API. This setup allows you to control a high-quality AI agent via a touch-sensitive hardware interface while offloading the heavy lifting (audio processing and API communication) to a host server.
The system is split into two specialized components:
The "brain" running on your host computer (Mac/PC/Linux).
- Audio Processing: Uses sox and node-record-lpcm16 for low-latency recording.
- Playback: Uses speaker for real-time PCM audio streaming.
- Gating Logic: Implements custom RMS-based gating to handle barge-in and echo cancellation (so the agent doesn't trigger itself).
- API Bridge: Manages the persistent WebSocket connection to ElevenLabs.
- Control Interface: Provides a lightweight HTTP API (/start, /stop, /status) for the remote.
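As a rough illustration of the control interface listed above, the three endpoints could be served with Node's built-in http module. This is a minimal sketch, not the project's actual server.js: the `session` object, the `handleControl` and `createControlServer` names, and the JSON response shape are all assumptions made for the example, and it ignores the HTTP method because the source does not say whether the remote uses GET or POST.

```javascript
const http = require('node:http');

// Hypothetical in-memory session state; the real server.js may track more.
const session = { state: 'IDLE' };

// Pure routing function: maps a request path to a status code and a JSON body.
function handleControl(path) {
  switch (path) {
    case '/start':
      // A real server would open the ElevenLabs WebSocket here and
      // switch to ACTIVE once the connection is established.
      session.state = 'STARTING';
      return { code: 200, body: { state: session.state } };
    case '/stop':
      session.state = 'IDLE';
      return { code: 200, body: { state: session.state } };
    case '/status':
      return { code: 200, body: { state: session.state } };
    default:
      return { code: 404, body: { error: 'not found' } };
  }
}

// Wraps the routing function in an HTTP server (not started automatically).
function createControlServer() {
  return http.createServer((req, res) => {
    const { pathname } = new URL(req.url, 'http://localhost');
    const { code, body } = handleControl(pathname);
    res.writeHead(code, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}

// Usage: createControlServer().listen(8080, '0.0.0.0');
```

Keeping the routing in a plain function makes the state transitions easy to test without opening a socket.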
The "controller" running on a Pimoroni Presto.
- UI: A visual dashboard showing session states: IDLE, STARTING, ACTIVE, and OFFLINE.
- Interaction: Single-touch toggle to start or stop conversations.
- Connectivity: Low-power WiFi communication with the host server.
- Node.js (v18 or higher)
- SoX (Sound eXchange): Required for system-level audio recording.
  - macOS: brew install sox
  - Linux: sudo apt-get install sox
  - Windows: Download binaries and add to PATH.
- ElevenLabs API Key: Available in your ElevenLabs Dashboard.
- Pimoroni Presto (or a similar MicroPython-compatible device with a screen).
git clone https://github.com/your-username/elevenlabs-agent.git
cd elevenlabs-agent
npm install
Create a .env file in the root directory (you can copy .env.example):
cp .env.example .env
Open .env and add your ElevenLabs credentials:
- ELEVENLABS_AGENT_ID: Your Agent ID.
- ELEVENLABS_API_KEY: Your ElevenLabs API Key.
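A missing credential is easier to diagnose at startup than as a failed WebSocket handshake later. The sketch below shows one way the host server might check the two variables above. The `loadConfig` helper is hypothetical and not part of the project, and it assumes the values have already been loaded into the environment (for example by a package such as dotenv).

```javascript
// Returns the required settings from an env-like object,
// or throws an error naming every variable that is missing or blank.
function loadConfig(env) {
  const required = ['ELEVENLABS_AGENT_ID', 'ELEVENLABS_API_KEY'];
  const missing = required.filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === ''
  );
  if (missing.length > 0) {
    throw new Error(`Missing required .env values: ${missing.join(', ')}`);
  }
  return {
    agentId: env.ELEVENLABS_AGENT_ID,
    apiKey: env.ELEVENLABS_API_KEY,
  };
}

// Usage: const config = loadConfig(process.env);
```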
Create a secrets.py file (you can use secrets.py.example as a template) and upload it to your Presto:
- WIFI_SSID: Your WiFi network name.
- WIFI_PASS: Your WiFi password.
- AGENT_SERVER_IP: The local IP address of your host computer.
Use Thonny or mpremote to flash main.py and secrets.py onto your Pimoroni Presto.
- Start the Host Server: Run node server.js. You should see Server listening on http://0.0.0.0:8080.
- Power on the Presto: It will connect to WiFi and display the IDLE state. Tap the screen to begin a conversation!
If the agent is too sensitive or doesn't hear you over its own voice, tweak these in server.js:
- THRESHOLD_IDLE: Sensitivity when the room is quiet.
- THRESHOLD_BARGE: Sensitivity required to interrupt the agent while it is speaking.
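To make the effect of the two thresholds concrete, here is a minimal sketch of RMS-based gating over 16-bit PCM chunks. It is an illustration under stated assumptions, not the code in server.js: the numeric threshold values are placeholders, the `rms` and `shouldForward` helpers are hypothetical, and it assumes the thresholds are compared directly against the RMS amplitude of each chunk.

```javascript
// Placeholder values for illustration only; the real defaults live in server.js.
const THRESHOLD_IDLE = 500;
const THRESHOLD_BARGE = 2000;

// Root-mean-square amplitude of a Buffer of 16-bit little-endian PCM samples.
function rms(pcm) {
  const sampleCount = Math.floor(pcm.length / 2);
  if (sampleCount === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < sampleCount; i++) {
    const sample = pcm.readInt16LE(i * 2);
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / sampleCount);
}

// Forward microphone audio only when it is at least as loud as the
// threshold for the current state. While the agent is speaking, the
// higher barge-in threshold keeps its own playback from passing the gate.
function shouldForward(pcm, agentSpeaking) {
  const threshold = agentSpeaking ? THRESHOLD_BARGE : THRESHOLD_IDLE;
  return rms(pcm) >= threshold;
}
```

Under this scheme, raising a threshold makes the gate less sensitive. If the agent interrupts itself, a higher THRESHOLD_BARGE is the first thing to try. If it misses quiet speech, lower THRESHOLD_IDLE.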
Audio defaults to 16,000 Hz mono, 16-bit PCM, which is the optimal format for ElevenLabs Conversational AI.
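The format above fixes the data rate, which helps when sizing audio chunks or buffers. The arithmetic can be sketched as follows. The constant and function names are illustrative and are not taken from server.js.

```javascript
const SAMPLE_RATE_HZ = 16000; // samples per second
const CHANNELS = 1;           // mono
const BYTES_PER_SAMPLE = 2;   // 16-bit PCM

// Number of bytes of raw PCM needed to hold `ms` milliseconds of audio.
function bytesForDuration(ms) {
  return (SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * ms) / 1000;
}

// Duration in milliseconds represented by `byteLength` bytes of raw PCM.
function durationForBytes(byteLength) {
  return (byteLength * 1000) / (SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE);
}

// bytesForDuration(1000) → 32000 (one second of audio is 32,000 bytes)
// bytesForDuration(100)  → 3200
```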
Forks and Pull Requests are welcome!
- Fork the Repo.
- Create a Feature Branch (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -m 'Add some AmazingFeature').
- Push to the Branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
Distributed under the MIT License. See LICENSE for more information. (Note: Ensure you include a LICENSE file if forking).
- ElevenLabs for the Conversational AI API.
- Pimoroni for the excellent Presto hardware.