wizeline/churchill-poc

Speak with Churchill

A real-time voice AI that lets you have a conversation with Winston Churchill about his writings. Ask him anything about painting, creativity, or the life of a statesman — he'll answer in his own words, in his own voice.


What it does

You open a web app, pick one of Churchill's essays, click a button, and start talking. Churchill responds out loud, in character, with his characteristic wit and eloquence. The conversation is fully voice-driven — no typing involved.

Under the hood, your voice goes through a three-stage AI pipeline:

Your voice → Deepgram (transcription) → Gemini (Churchill's brain) → ElevenLabs (Churchill's voice) → Your speakers
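The three stages above can be sketched as composable async functions. This is a minimal illustration, not the actual API of `packages/ccai` — the function names and placeholder bodies are assumptions; the real implementations stream to Deepgram, Gemini, and ElevenLabs.

```python
import asyncio

# Hypothetical stand-ins for the three pipeline stages. The real
# implementations live in packages/ccai and call the external APIs.
async def transcribe(pcm_audio: bytes) -> str:
    """Stage 1: Deepgram would turn raw PCM into a transcript."""
    return "What inspired you to take up painting?"

async def churchill_reply(transcript: str) -> str:
    """Stage 2: Gemini would answer in character, grounded in the article."""
    return f"Ah, you ask about painting. A splendid pastime! ({transcript})"

async def synthesize(text: str) -> bytes:
    """Stage 3: ElevenLabs would turn text into PCM with the voice clone."""
    return text.encode("utf-8")  # placeholder for real PCM audio

async def pipeline(pcm_audio: bytes) -> bytes:
    transcript = await transcribe(pcm_audio)    # Deepgram
    reply = await churchill_reply(transcript)   # Gemini
    return await synthesize(reply)              # ElevenLabs

audio_out = asyncio.run(pipeline(b"\x00\x01"))
```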

Requirements

  • Docker Desktop
  • API keys for Deepgram, Gemini, and ElevenLabs (see the .env variables below)
  • ngrok (optional, for sharing the URL externally)

Getting started

1. Clone and configure

git clone <repo-url>
cd churchill-ai
cp apps/backend/.env.example apps/backend/.env

Edit apps/backend/.env and fill in your API keys:

DEEPGRAM_API_KEY=...
GEMINI_API_KEY=...
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
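The backend's settings.py presumably validates these on startup. A minimal sketch using `os.environ` — the actual file may use pydantic or another config library, so treat the names here as illustrative:

```python
import os

# The four variables from apps/backend/.env.example.
REQUIRED_KEYS = [
    "DEEPGRAM_API_KEY",
    "GEMINI_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
]

def load_settings() -> dict:
    """Read the required API keys, failing fast if any is missing."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing env vars: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```

Failing fast at startup beats a cryptic 401 mid-conversation.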

2. Run

docker compose -f infrastructure/docker-compose.yml up --build

Open http://localhost:3000 — that's it.

3. Share with others (optional)

If you want to give someone else access to your local instance:

./share.sh

This opens two ngrok tunnels (frontend + backend) and prints a single shareable URL:

https://abc123.ngrok-free.app?ws=wss://xyz456.ngrok-free.app/ws/voice

Anyone with that link can use the app from their browser. The ?ws= param tells the frontend which backend to connect to.

Requirements: ngrok installed and authenticated (ngrok authtoken <your-token>)
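The URL construction that share.sh performs (it is a shell script; this Python sketch just shows the logic): take the backend tunnel URL, swap its scheme for wss://, append the /ws/voice path, and attach it as the ?ws= query param on the frontend URL.

```python
def build_share_url(frontend_url: str, backend_url: str) -> str:
    """Combine the two ngrok tunnel URLs into one shareable link.

    The backend's https:// scheme becomes wss:// so the frontend can
    open a secure WebSocket to /ws/voice on the tunneled backend.
    """
    ws_url = backend_url.replace("https://", "wss://", 1) + "/ws/voice"
    return f"{frontend_url}?ws={ws_url}"
```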


Keeping it running

If you need the app live for a demo without being at your keyboard:

caffeinate -is   # prevents sleep even with lid closed (requires power + external display)

Or just lock your screen with Cmd + Ctrl + Q — processes keep running while the screen is locked; they stop only if you log out.


Project structure

churchill-ai/
├── apps/
│   ├── frontend/          # Next.js 15 web app
│   │   └── src/
│   │       ├── app/
│   │       │   └── page.tsx          # Article selector
│   │       └── components/
│   │           └── VoiceAssistant.tsx # WebSocket client + audio I/O
│   └── backend/           # FastAPI server
│       └── src/
│           ├── main.py               # WebSocket endpoint /ws/voice
│           ├── articles/             # Article content
│           ├── config/
│           │   ├── prompts.py        # Churchill system prompt
│           │   └── settings.py       # Env vars + defaults
│           └── services/
│               └── assistant_factory.py  # Wires up the AI pipeline
├── packages/
│   └── ccai/              # Internal voice assistant framework
│       └── ccai/core/
│           ├── speech_to_text/       # Deepgram STT
│           ├── llm/                  # Gemini LLM
│           ├── text_to_speech/       # ElevenLabs TTS
│           ├── brain/                # Orchestrates STT → LLM → TTS
│           └── voice_assistant/      # Top-level assistant logic
├── infrastructure/
│   └── docker-compose.yml
├── share.sh               # ngrok tunnel script
└── ARCHITECTURE.md        # Diagrams and technical deep-dive

How the voice pipeline works

Each conversation turn follows this flow:

  1. Browser captures microphone audio at 16kHz and streams it as base64 JSON over WebSocket to the backend
  2. Deepgram receives the raw PCM audio and returns a transcript in real time, using a 200ms silence threshold to detect when you stop speaking
  3. Gemini 2.5 Flash Lite receives the transcript plus the full conversation history and a system prompt that instructs it to respond as Churchill, grounded in the selected article
  4. ElevenLabs converts Gemini's response to PCM audio using a Churchill voice clone, streaming chunks back as they're generated
  5. Browser receives the audio chunks and plays them back seamlessly via the Web Audio API

The entire round-trip is streamed end-to-end — the response starts playing before Gemini has finished generating it.
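The framing in step 1 — raw PCM wrapped in base64 inside JSON — can be illustrated as follows. The field names ("type", "data") are assumptions for the sketch, not necessarily the app's actual message schema:

```python
import base64
import json

def encode_audio_message(pcm_chunk: bytes) -> str:
    """Browser side: wrap a raw PCM chunk as a base64 JSON WebSocket message."""
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(pcm_chunk).decode("ascii"),
    })

def decode_audio_message(message: str) -> bytes:
    """Backend side: recover the raw PCM bytes to forward to Deepgram."""
    payload = json.loads(message)
    return base64.b64decode(payload["data"])
```

Base64 costs roughly 33% extra bandwidth over sending binary frames, but keeps every message a plain JSON text frame, which is simpler to multiplex with control messages.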


Adding a new article

  1. Create a new file in apps/backend/src/articles/:
# apps/backend/src/articles/my_article.py
MY_ARTICLE = {
    "id": "my-article-slug",
    "title": "My Article Title",
    "content": "Full text of the article...",
}
  2. Register it in apps/backend/src/articles/__init__.py

  3. Add it to the ARTICLES array in apps/frontend/src/app/page.tsx
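A plausible shape for the registry in articles/__init__.py — illustrative only; the actual module may organize articles differently:

```python
# Hypothetical sketch of apps/backend/src/articles/__init__.py.
MY_ARTICLE = {
    "id": "my-article-slug",
    "title": "My Article Title",
    "content": "Full text of the article...",
}

# Index articles by id so the WebSocket handler can look up whichever
# one the frontend selected.
ARTICLES = {article["id"]: article for article in [MY_ARTICLE]}

def get_article(article_id: str) -> dict:
    """Return the article dict for the given slug, raising KeyError if unknown."""
    return ARTICLES[article_id]
```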


Tech stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 15, React 18, TypeScript, Tailwind CSS |
| Backend | FastAPI, Python 3.11, uvicorn |
| STT | Deepgram (streaming, 16kHz PCM) |
| LLM | Google Gemini 2.5 Flash Lite |
| TTS | ElevenLabs (streaming PCM) |
| Infrastructure | Docker Compose |
| Tunneling | ngrok |

Architecture

For diagrams and a deeper technical breakdown, see ARCHITECTURE.md.
