Voice AI infrastructure that scales
Layercode handles real-time audio transport, speech-to-text, text-to-speech, and global edge deployment. You handle your agent's logic with any LLM, any framework.
You receive transcribed text from your users. You send text back. Layercode handles everything in between: audio capture, speech-to-text, text-to-speech, WebSocket connections, and global delivery.
Audio → Layercode
Text → Your Backend → Text
Layercode → Audio
A complete voice agent implementation. Server-side streaming with the Vercel AI SDK, client-side audio handling with our React hook.
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";
import { streamResponse } from "@layercode/node-server-sdk";

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY! });

export const POST = async (request: Request) => {
  const body = await request.json();
  // streamResponse wraps the streaming response Layercode expects from your webhook.
  return streamResponse(body, async ({ stream }) => {
    if (body.type === "message") {
      const { textStream } = streamText({
        model: openai("gpt-4o-mini"),
        system: "You are a helpful voice assistant.",
        messages: [{ role: "user", content: body.text }],
        onFinish: () => stream.end(), // close the stream once the LLM finishes
      });
      // Pipe the LLM's token stream straight into text-to-speech.
      await stream.ttsTextStream(textStream);
    }
  });
};

Every voice conversation flows through a five-stage pipeline optimized for sub-second round-trip latency.
Audio captured from browser, mobile app, or phone. Streamed to nearest edge location via WebSocket.
WebSocket connection with automatic reconnection and jitter buffering
Real-time transcription using Deepgram. Voice Activity Detection (VAD) handles turn-taking automatically.
Sub-50ms processing latency at 330+ global edge locations
Transcribed text sent to your webhook via HMAC-signed POST request. You control the LLM and logic.
Works with any backend: Next.js, Express, FastAPI, Rails, etc.
Use our SDK to stream text back. We handle text-to-speech conversion in real-time.
Compatible with Vercel AI SDK, LangChain, and any streaming LLM
Audio synthesized and streamed back to the user. Word-level timestamps enable precise interruption handling.
Choose from ElevenLabs, Cartesia, or Rime for TTS
First-class TypeScript and Python support. Client SDKs for React and vanilla JavaScript. Server SDKs for Node.js and Python.
useLayercodeAgent hook for managing voice sessions, speaking states, and audio visualization.
npm install @layercode/react-sdk
Framework-agnostic client SDK for any JavaScript environment.
npm install @layercode/client-sdk
Server-side SDK for handling webhooks and streaming responses.
npm install @layercode/node-server-sdk
Python SDK for FastAPI, Flask, Django, and other Python backends.
pip install layercode
Choose the voice provider that fits your use case. Switch providers with a single config change; no code modifications required.
mistv2
Best for: Quick start & managed billing
sonic-2
Best for: High-fidelity & detailed timestamps
eleven_v2_5_flash
Best for: Cloned voices & multilingual
inworld-tts
Best for: Gaming & interactive characters
flux & nova-3
Best-in-class speech recognition: Flux is our recommended model for real-time voice pipelines, purpose-built for streaming with unmatched speed and accuracy. We also support Nova-3 for use cases requiring proven reliability.
Best for: Real-time voice pipelines
Other voice AI platforms run on centralized cloud infrastructure. When your user is in Tokyo and your servers are in Virginia, latency kills the conversation.
Layercode runs on Cloudflare Workers. Audio processing happens at the edge location nearest to your user, not in a distant data center.
Switch between Rime, Cartesia, ElevenLabs, and Inworld with a single config change. No code changes required.
Replay conversations, inspect latency breakdowns, view transcripts, and debug production issues with full visibility.
Every call recorded automatically. Download audio, export transcripts, build training datasets. All stored securely.
Pay only for active conversation time. Silence is always free. No minimum commitments or hidden fees.
Web browsers, iOS, Android, and phone (via Twilio). Same backend, same pipeline, multiple channels.
Inbound and outbound calling via Twilio. Full call recording and transcript analysis included.
Build agents that execute functions. Works with Vercel AI SDK, LangChain, LlamaIndex, and CrewAI.
Transfer between agents mid-conversation. Build complex workflows with specialized agents.
SOC 2 Type II compliant*, GDPR compliant, TLS 1.3 encryption, AES-256 at rest.
Latency is the enemy of natural voice interactions. Layercode is engineered to minimize time-to-first-token at every stage.
Gemini 2.5 Flash-Lite and gpt-4o-mini are optimized for speed. Avoid reasoning-extended models; they add substantial latency for marginal quality gains in spoken conversations.
Emit response.tts events like "Let me look that up" before heavy processing begins. Users hear immediate audio while your backend works.
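This speak-first pattern is simple to implement. A sketch with a mock stream object standing in for the SDK's stream (the real Layercode stream API may differ; `tts` here is an assumed method name):

```typescript
// Speak-first pattern: emit a short filler phrase immediately, then do the
// slow work. `Speaker` is a stand-in for the SDK's stream object.
type Speaker = { tts: (text: string) => void };

export async function answerWithFiller(
  stream: Speaker,
  slowLookup: () => Promise<string>
): Promise<void> {
  // The user hears this right away, masking the lookup latency.
  stream.tts("Let me look that up.");
  // Heavy work (RAG query, tool call, database hit) happens after.
  const answer = await slowLookup();
  stream.tts(answer);
}
```

The filler phrase buys you several hundred milliseconds of perceived responsiveness while the expensive call runs.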
Running retrieval on every turn adds network hops and stalls conversations. Fetch external data only when needed.
Store conversation history in fast, nearby databases like Redis. Collocate services with Layercode deployments to minimize cross-region latency.
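A minimal shape for that history store, with an in-memory Map standing in for Redis (in production you would back this with a low-latency store colocated with your Layercode region):

```typescript
// Conversation-history store sketch. A Map stands in for Redis here;
// swap in a Redis client with the same get/append semantics for production.
type Turn = { role: "user" | "assistant"; content: string };

const histories = new Map<string, Turn[]>();

export function appendTurn(conversationId: string, turn: Turn): Turn[] {
  // Fetch existing history (or start fresh), append, and persist.
  const history = histories.get(conversationId) ?? [];
  history.push(turn);
  histories.set(conversationId, history);
  return history;
}
```

Pass the returned array as the `messages` parameter on each LLM call so the agent keeps context across turns without re-fetching from a distant region.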
Layercode is built for production workloads with enterprise security requirements. Your data is encrypted in transit and at rest. Session recordings are stored securely in SOC 2 compliant infrastructure.
Per-second billing for active conversation time. Silence is free. STT, TTS, and infrastructure costs consolidated into one simple rate. Start with $100 in free credits, no credit card required.
View pricing details
Get started with $100 in free credits. No credit card required.