Layercode
Powered by Cloudflare's global edge network

Voice AI infrastructure that scales

Layercode handles real-time audio transport, speech-to-text, text-to-speech, and global edge deployment. You handle your agent's logic with any LLM, any framework.

Text in. Text out.

You receive transcribed text from your users. You send text back. Layercode handles everything in between: audio capture, speech-to-text, text-to-speech, WebSocket connections, and global delivery.

User speaks: Audio → Layercode

You process text: Text → Your Backend → Text

User hears response: Layercode → Audio
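In code terms, that round trip is a single handler: Layercode POSTs your user's transcribed text, and you return the text to be spoken. A minimal sketch — the payload field names here are illustrative assumptions, not Layercode's documented schema:

```typescript
// Hypothetical payload shapes, for illustration only.
interface IncomingMessage {
  type: "message";
  text: string; // the user's transcribed speech
  session_id: string;
}

interface OutgoingReply {
  text: string; // Layercode converts this to audio
}

// Your only job: turn transcribed text into reply text.
// In practice this is where your LLM call goes.
function handleTurn(incoming: IncomingMessage): OutgoingReply {
  return { text: `You said: ${incoming.text}` };
}
```

Everything on either side of that function — microphones, codecs, sockets, synthesis — is Layercode's problem.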

Getting Started

Add voice to your app in under 50 lines

A complete voice agent backend in a single route handler: server-side streaming with the Vercel AI SDK. Client-side audio is handled separately by our React hook.

import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";
import { streamResponse } from "@layercode/node-server-sdk";

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY! });

// Layercode calls this route with the user's transcribed text.
export const POST = async (request: Request) => {
  const body = await request.json();

  return streamResponse(body, async ({ stream }) => {
    if (body.type === "message") {
      // Stream LLM tokens; close the response stream when generation finishes.
      const { textStream } = streamText({
        model: openai("gpt-4o-mini"),
        system: "You are a helpful voice assistant.",
        messages: [{ role: "user", content: body.text }],
        onFinish: () => stream.end(),
      });

      // Pipe the token stream straight into Layercode's text-to-speech.
      await stream.ttsTextStream(textStream);
    }
  });
};
Architecture

How the pipeline works

Every voice conversation flows through a five-stage pipeline optimized for sub-second round-trip latency.

1. User speaks

Audio captured from browser, mobile app, or phone. Streamed to nearest edge location via WebSocket.

WebSocket connection with automatic reconnection and jitter buffering

2. Speech-to-text at the edge

Real-time transcription using Deepgram. Voice Activity Detection (VAD) handles turn-taking automatically.

Sub-50ms processing latency at 330+ global edge locations

3. Webhook to your backend

Transcribed text sent to your webhook via HMAC-signed POST request. You control the LLM and logic.

Works with any backend: Next.js, Express, FastAPI, Rails, etc.
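Because the webhook is HMAC-signed, your backend should verify the signature before trusting the payload. A sketch using Node's built-in crypto — the hex-digest scheme and argument shapes here are assumptions, so check the Layercode docs for the exact header name and signing format:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Recompute the HMAC over the raw request body and compare in constant time.
// The sha256/hex scheme is an assumption, not Layercode's documented format.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // Length check first: timingSafeEqual throws on mismatched lengths.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The constant-time comparison matters: a plain `===` on signatures can leak timing information to an attacker probing your webhook endpoint.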

4. Stream your response

Use our SDK to stream text back. We handle text-to-speech conversion in real-time.

Compatible with Vercel AI SDK, LangChain, and any streaming LLM

5. User hears response

Audio synthesized and streamed back to the user. Word-level timestamps enable precise interruption handling.

Choose from ElevenLabs, Cartesia, or Rime for TTS
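Word-level timestamps make interruption handling a bookkeeping problem: when the user barges in, you know exactly which words were actually heard, so conversation history can be truncated to match. A self-contained sketch — the timestamp shape is illustrative, not Layercode's wire format:

```typescript
// Illustrative shape: each synthesized word with its playback start time in ms.
interface TimedWord {
  word: string;
  startMs: number;
}

// Keep only the words whose playback started before the interruption,
// so the agent's recorded utterance matches what the user actually heard.
function heardBeforeInterruption(words: TimedWord[], interruptedAtMs: number): string {
  return words
    .filter((w) => w.startMs < interruptedAtMs)
    .map((w) => w.word)
    .join(" ");
}
```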

Developer Experience

SDKs for every stack

First-class TypeScript and Python support. Client SDKs for React and vanilla JavaScript. Server SDKs for Node.js and Python.

React SDK

useLayercodeAgent hook for managing voice sessions, speaking states, and audio visualization.

npm install @layercode/react-sdk
  • Session lifecycle management
  • Real-time speaking states
  • Audio amplitude for visualizations
  • Mute/unmute controls

Vanilla JS SDK

Framework-agnostic client SDK for any JavaScript environment.

npm install @layercode/client-sdk
  • Browser & mobile support
  • WebSocket management
  • Event-driven API
  • TypeScript support

Node.js SDK

Server-side SDK for handling webhooks and streaming responses.

npm install @layercode/node-server-sdk
  • streamResponse() helper
  • HMAC signature verification
  • TTS text streaming
  • Tool call support

Python SDK

Python SDK for FastAPI, Flask, Django, and other Python backends.

pip install layercode
  • Async/await support
  • Framework agnostic
  • Streaming responses
  • Type hints included

Works with the LLM libraries you already use

Vercel AI SDK
OpenAI
Anthropic
LangChain
Ollama
Mastra
Voice Providers

Hot-swap text-to-speech providers

Choose the voice provider that fits your use case. Switch providers with a single config change — no code modifications required.

Text to Speech

Rime (Managed)

mistv2

  • Zero setup required
  • Managed by Layercode
  • PCM, MP3, μ-law formats
  • Streaming timestamps

Best for: Quick start & managed billing

Cartesia

sonic-2

  • 16kHz PCM streaming
  • Word-level timestamps
  • High-fidelity voices
  • Precise interruption handling

Best for: High-fidelity & detailed timestamps

ElevenLabs

eleven_v2_5_flash

  • Voice cloning support
  • Multilingual capabilities
  • Stability controls
  • Character-level alignment

Best for: Cloned voices & multilingual

Inworld (Managed)

inworld-tts

  • Character-driven voices
  • Emotional expression
  • Game & entertainment focus
  • Real-time streaming

Best for: Gaming & interactive characters

Speech to Text

Deepgram

flux & nova-3

Best-in-class speech recognition. Flux, our recommended model for real-time voice pipelines, is purpose-built for streaming with low latency and high accuracy. Nova-3 is also supported for use cases that require proven reliability.

Ultra-low latency
Streaming-first architecture
Noise-robust accuracy
Realtime word timestamps

Best for: Real-time voice pipelines

Infrastructure

Built for low-latency at global scale

Other voice AI platforms run on centralized cloud infrastructure. When your user is in Tokyo and your servers are in Virginia, latency kills the conversation.

Layercode runs on Cloudflare Workers. Audio processing happens at the edge location nearest to your user, not in a distant data center.

Powered by Cloudflare Workers
330+ edge locations worldwide
<50ms audio processing latency
Zero cold starts
100% session isolation
Features

Everything you need for production-ready voice AI agents

Hot-swap voice providers

Switch between Rime, Cartesia, ElevenLabs, and Inworld with a single config change. No code changes required.

Analytics & observability

Replay conversations, inspect latency breakdowns, view transcripts, and debug production issues with full visibility.

Session recording

Every call recorded automatically. Download audio, export transcripts, build training datasets. All stored securely.

Per-second billing

Pay only for active conversation time. Silence is always free. No minimum commitments or hidden fees.
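The arithmetic behind "silence is free": cost tracks active talk time, not wall-clock call duration. A sketch — the $0.01/second rate below is a made-up placeholder, not Layercode's actual price:

```typescript
// Active speech segments within a call, in seconds from call start.
interface Segment {
  startS: number;
  endS: number;
}

// Bill only the active seconds; silence between segments costs nothing.
// The default rate is a placeholder for illustration, not a real price.
function costUsd(segments: Segment[], ratePerSecond = 0.01): number {
  const activeSeconds = segments.reduce((sum, s) => sum + (s.endS - s.startS), 0);
  return activeSeconds * ratePerSecond;
}
```

So a 5-minute call where the user and agent speak for a combined 20 seconds bills 20 seconds, not 300.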

Multi-channel support

Web browsers, iOS, Android, and phone (via Twilio). Same backend, same pipeline, multiple channels.

Telephony integration

Inbound and outbound calling via Twilio. Full call recording and transcript analysis included.

Tool calling

Build agents that execute functions. Works with Vercel AI SDK, LangChain, LlamaIndex, and CrewAI.
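Under the hood, tool calling is a dispatch step between turns: the model emits a tool name plus arguments, your code runs the matching function, and the result goes back into the conversation. A framework-free sketch of that dispatch — the tool name and registry shape are illustrative:

```typescript
// A registry of callable tools, keyed by the name the LLM will emit.
type Tool = (args: Record<string, unknown>) => string;

const tools: Record<string, Tool> = {
  // Stub implementation; a real tool would hit a weather API.
  get_weather: (args) => `It is sunny in ${args.city}.`,
};

// Run the tool the model asked for; unknown names return an error string
// the model can recover from on its next turn.
function dispatchToolCall(name: string, args: Record<string, unknown>): string {
  const tool = tools[name];
  return tool ? tool(args) : `Unknown tool: ${name}`;
}
```

Frameworks like the Vercel AI SDK and LangChain wrap this loop for you; the sketch just shows what they are doing on your behalf.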

Multi-agent orchestration

Transfer between agents mid-conversation. Build complex workflows with specialized agents.

Enterprise security

SOC 2 Type II compliant*, GDPR compliant, TLS 1.3 encryption, AES-256 at rest.

Performance

Optimized for natural conversations

Latency is the enemy of natural voice interactions. Layercode is engineered to minimize time-to-first-token at every stage.

Use low-TTFT models

Gemini 2.5 Flash-Lite and gpt-4o-mini are optimized for speed. Avoid extended-reasoning models: they trade large amounts of latency for marginal quality gains in spoken conversations.

Speech priming

Emit response.tts events like "Let me look that up" before heavy processing begins. Users hear immediate audio while your backend works.
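The pattern is: send a short filler phrase to TTS immediately, then do the slow work, then send the real answer. A sketch against a stand-in stream object — the real `stream` comes from the Layercode SDK, and the lookup here is a stub:

```typescript
// Stand-in for the SDK's stream object: just records what was sent to TTS.
function makeRecordingStream() {
  const sent: string[] = [];
  return { tts: (text: string) => sent.push(text), sent };
}

// Stub for retrieval / LLM work that takes a while in a real agent.
function slowLookup(): string {
  return "Your order shipped yesterday.";
}

function answerWithPriming(stream: { tts: (t: string) => void }) {
  stream.tts("Let me look that up."); // plays immediately while work continues
  stream.tts(slowLookup());           // the real answer follows
}
```

The user perceives an instant response even though the substantive answer arrives hundreds of milliseconds later.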

Optimize RAG patterns

Running retrieval on every turn adds network hops and stalls conversations. Fetch external data only when needed.
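One way to avoid per-turn retrieval is deciding per turn whether it is needed at all. A naive keyword heuristic as a sketch — the trigger words are invented for illustration, and real systems often use a small classifier or let the LLM request retrieval via a tool call instead:

```typescript
// Naive heuristic: only run retrieval when the turn looks like it needs
// external facts. Trigger words are illustrative, not a recommended list.
const RETRIEVAL_TRIGGERS = ["order", "account", "invoice", "docs"];

function needsRetrieval(userText: string): boolean {
  const lower = userText.toLowerCase();
  return RETRIEVAL_TRIGGERS.some((t) => lower.includes(t));
}
```

Small talk and follow-ups skip the network hop entirely; only fact-seeking turns pay the retrieval latency.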

Colocate your infrastructure

Store conversation history in fast, nearby databases like Redis. Colocate services with Layercode deployments to minimize cross-region latency.

Enterprise-ready security

Layercode is built for production workloads with enterprise security requirements. Your data is encrypted in transit and at rest. Session recordings are stored securely in SOC 2 compliant infrastructure.

SOC 2 Type II*
GDPR Compliant
TLS 1.3
AES-256

Simple, predictable pricing

Per-second billing for active conversation time. Silence is free. STT, TTS, and infrastructure costs consolidated into one simple rate. Start with $100 in free credits, no credit card required.

View pricing details

Ready to build?

Get started with $100 in free credits. No credit card required.