Chaplin-UI 🎬


Visual Speech Recognition - Read lips, transcribe speech, all locally

License: MIT • Python 3.12+

A beautiful, open-source tool that reads your lips in real time and transcribes silently mouthed speech using local ML models.

Quick Start • Contributing • Privacy • Security • Documentation • License


✨ What is Chaplin-UI?

Chaplin-UI is a gentle, privacy-focused tool that reads lips and turns them into text. Simply record yourself speaking (or upload a video) and watch your words appear on screen, all without making a sound. This project is based on Chaplin by Amanvir Parhar, with an added web interface and UI improvements. The VSR model achieves a 19.1% word error rate (WER) on LRS3. Perfect for:

  • 🎀 Silent communication - Type without speaking, or transcribe existing videos
  • πŸ”’ Privacy-first - Everything runs locally on your machine (Privacy Policy)
  • 🌐 Web-based - Works in any modern browserβ€”no installation needed
  • 🎨 Beautiful UI - Clean, calming design that adapts to your system theme

💙 Who It's For

I built this after a week of laryngitis: when I couldn't speak, I needed a way to communicate. If you've ever wanted to say something without making a sound, Chaplin-UI might help:

  • Public places β€” Libraries, offices, late-night calls, or anywhere you want to stay quiet
  • Deaf and hard-of-hearing β€” Mouth words to communicate when sign language isn't shared
  • Medical conditions β€” ALS, aphonia, cerebral palsy, laryngectomy, Parkinson's, vocal cord paralysis, selective mutism
  • Temporary voice loss β€” Laryngitis, recovery from throat surgery, or vocal strain
  • Privacy β€” Situations where you'd rather not speak aloud but still need to get your words out

Apple just acquired a silent-speech company (Q.ai) for $2 billion; this space matters, and open-source tools like this keep the technology accessible.

🚀 Quick Start

Prerequisites

  • Python 3.12+ (check with python3 --version)
  • LLM local server – Ollama or LM Studio (the app finds one you have running):
    • Ollama – ollama serve + ollama pull <model>
    • LM Studio – load model, enable Local Server (port 1234)
  • Modern web browser with camera access

Installation

  1. Clone the repository:

    git clone https://github.com/loganngarcia/chaplin-ui.git
    cd chaplin-ui
  2. Set up Python environment:

    python3 -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  3. Download model files:

    ./setup.sh

    This downloads the VSR model from Hugging Face (~500MB).

  4. Start your LLM server (pick one: Ollama or LM Studio):

    Option A – Ollama

    ollama serve        # usually starts automatically
    ollama pull llama3.2   # or mistral, llama2, etc.

    Option B – LM Studio

    • Open LM Studio
    • Load a model (we recommend zai-org/glm-4.6v-flash)
    • Go to Developer tab β†’ Enable Local Server (port 1234)
  5. Run the web app:

    ./run_web.sh

    The UI opens in ~1 second at http://localhost:8000. The model loads in the background (~30–60 sec first time); you can use the interface right away, and buttons enable when the model is ready.

  6. Start transcribing:

    • Record live: Click "Start recording" to capture video from your camera
    • Upload a video: Click "Upload video" to transcribe an existing video file
    • Your transcription appears in both raw and corrected formats. (You can change the LLM in settings if needed.)

    That's it! The app handles everything else for you.
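
If you want to script a readiness check (for example, before kicking off a batch of uploads), you can poll the backend's health endpoint, the same one used in the Testing section below. A minimal sketch, assuming the default port 8000 (the exact response body may differ from a plain string):

    # Poll the FastAPI backend started by ./run_web.sh until it responds.
    import time
    import urllib.request

    URL = "http://localhost:8000/api/health"

    for _ in range(30):
        try:
            with urllib.request.urlopen(URL, timeout=2) as resp:
                print(resp.read().decode())
                break
        except OSError:
            time.sleep(2)  # the VSR model may still be loading in the background
    else:
        print("Backend did not become ready; is ./run_web.sh running?")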

📖 Documentation

Project Structure

chaplin-ui/
├── chaplin_ui/              # Core shared modules
│   └── core/                # Shared utilities, models, configs
│       ├── models.py        # Pydantic data models
│       ├── constants.py     # All configuration constants
│       ├── llm_client.py    # LLM API wrapper
│       ├── video_processor.py # Video processing utilities
│       └── ...
├── web/                     # Web app frontend
│   ├── index.html          # Main HTML
│   ├── style.css           # Styles (Apple HIG)
│   └── app.js              # Frontend logic
├── web_app.py              # FastAPI backend server
├── chaplin.py              # CLI implementation
├── main.py                 # CLI entry point
└── pipelines/              # VSR model pipeline

How It Works

Simply put, Chaplin-UI watches how your lips move and turns that into text. Here's what happens behind the scenes:

  1. You provide video - Either record yourself speaking or upload an existing video file
  2. Face detection - The app finds and tracks your face in the video
  3. Lip reading - A trained model watches your lip movements and creates initial text
  4. Text refinement - An AI language model cleans up the text, adds punctuation, and fixes any mistakes
  5. You see results - Both the raw transcription and the polished version appear on screen

You can copy the corrected text with one click, or review the raw output to see what the lip-reading model detected.
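
To make step 2 concrete: the CLI exposes a detector=mediapipe option (see Development below), so here is a standalone sketch of what per-frame face detection looks like with MediaPipe and OpenCV. It illustrates the idea, not Chaplin-UI's actual code path, and "sample.mp4" is a placeholder file name:

    # Illustrative per-frame face detection with MediaPipe, similar in spirit to step 2.
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture("sample.mp4")  # placeholder: any short video of a face
    detector = mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5)

    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
        results = detector.process(frame_rgb)
        if results.detections:
            # Each detection carries a relative bounding box that a VSR pipeline
            # can use to crop the mouth region before lip reading.
            box = results.detections[0].location_data.relative_bounding_box
            print(f"face at x={box.xmin:.2f}, y={box.ymin:.2f}, w={box.width:.2f}, h={box.height:.2f}")

    cap.release()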

LLM Providers: Ollama vs LM Studio

Chaplin-UI supports two local LLM backends. Both use OpenAI-compatible APIs:

Provider    Default URL                  Default Model   Setup
Ollama      http://localhost:11434/v1    llama3.2        ollama serve, then ollama pull <model>
LM Studio   http://localhost:1234/v1     local           Load a model, enable Local Server in Developer tab
  • Web app: Select provider in the "LLM Provider" dropdown and optionally override the model name.
  • CLI: Use llm_provider=ollama or llm_provider=lmstudio, or run with --config-name ollama for Ollama defaults.

Key Components

  • chaplin_ui/core/ - Shared code used by CLI and Web interfaces
  • web_app.py - FastAPI server handling video uploads and processing
  • chaplin.py - CLI version with keyboard typing
  • pipelines/ - VSR model inference pipeline

🛠️ Development

Running Locally

Web App:

source .venv/bin/activate
python web_app.py

CLI:

source .venv/bin/activate
python main.py config_filename=./configs/LRS3_V_WER19.1.ini detector=mediapipe
# With Ollama:
python main.py --config-name ollama
# Or: python main.py llm_provider=ollama llm_model=mistral

Code Style

We follow Python best practices (a short illustrative example follows this list):

  • Type hints on all functions
  • Docstrings (Google style) for all public functions
  • Logging instead of print statements
  • Constants centralized in chaplin_ui/core/constants.py

Testing

# Test imports
python -c "from chaplin_ui.core import *; print('✓ All imports work')"

# Test web app
python web_app.py &
curl http://localhost:8000/api/health

🀝 Contributing

We love contributions! Whether it's:

  • πŸ› Bug fixes
  • ✨ New features
  • πŸ“ Documentation improvements
  • 🎨 UI/UX enhancements
  • πŸ”§ Code refactoring

See our Contributing Guide for details on:

  • How to set up your development environment
  • Code style guidelines
  • How to submit pull requests
  • Where to ask questions

First time contributing? Check out our good first issues!

📝 License

This project is licensed under the MIT License - see LICENSE for details.

🙏 Acknowledgments

Original Creator

Chaplin-UI is based on Chaplin by Amanvir Parhar. We're grateful for the original work that made this project possible!

Additional Credits

  • VSR Model: Based on Auto-AVSR by mpc001 (19.1% WER on LRS3)
  • Dataset: Lip Reading Sentences 3
  • LLM: Uses Ollama or LM Studio for local text correction (both OpenAI-compatible)

💬 Community


Made with ❤️ by the open source community

⭐ Star us on GitHub • 📖 Read the docs • 🤝 Contribute
