SignSync is a real-time ASL → Text → Speech desktop app that lets an ASL user "speak" inside video meeting platforms (Zoom, Google Meet, Discord, etc.) by translating live signing into audio routed through a virtual microphone, with video passed through a virtual camera.
- Devpost: https://devpost.com/software/sign-sync
- Demo video: https://youtu.be/AOC3kynHwjo
SignSync converts ASL gestures into spoken audio in real time and injects it into any meeting platform.
End-to-end pipeline:
- Camera Input: user signs in front of a webcam
- Virtual Camera Layer: meeting apps see the feed as a normal webcam
- ASL Recognition: frames are processed to extract landmarks and classify signs (MediaPipe + TensorFlow)
- NLP Cleanup: recognized words are cleaned into readable sentences (LLM-based grammar repair)
- Text-to-Speech: sentence is synthesized to speech (pyttsx3)
- Virtual Audio Output: audio is routed into a virtual microphone so meeting apps receive it as live speech
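The recognition stage emits one label per frame, so consecutive frames of the same sign have to be collapsed into single tokens before the NLP cleanup step. A minimal pure-Python sketch of that debouncing idea (function and parameter names are hypothetical, not from the repo; in the real pipeline the per-frame labels come from MediaPipe + TensorFlow):

```python
from collections import Counter, deque

def debounce_predictions(frame_labels, window=5, min_votes=4):
    """Collapse a stream of per-frame labels into sign tokens.

    A sliding majority vote smooths classifier jitter; a token is
    emitted only when the winning label differs from the last one emitted.
    """
    tokens = []
    recent = deque(maxlen=window)
    last_emitted = None
    for label in frame_labels:
        recent.append(label)
        winner, votes = Counter(recent).most_common(1)[0]
        if votes >= min_votes and winner != last_emitted:
            tokens.append(winner)
            last_emitted = winner
    return tokens

# Jittery per-frame classifier output: "HELLO" held, then "WORLD" held
frames = ["HELLO"] * 6 + ["WORLD"] * 6
print(debounce_predictions(frames))  # ['HELLO', 'WORLD']
```

Requiring `min_votes` of the last `window` frames to agree also filters out one-frame misclassifications, which would otherwise leak spurious words into the sentence.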
- `asl-text/` - ASL recognition pipeline (webcam frames → predicted tokens/words)
- `text-speech/` - text cleanup + TTS output
- `ui/` - desktop UI / orchestration
- `speech.mp3` - generated TTS audio
- Python, PyQt6
- OpenCV (video capture + frame processing)
- MediaPipe Holistic (landmark extraction)
- TensorFlow (gesture classification)
- ZMQ (inter-process messaging)
- PyVirtualCam (virtual webcam output)
- pyttsx3 (offline TTS)
- Virtual Audio Cable (virtual mic routing on Windows)
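MediaPipe Holistic reports 21 landmarks per hand, each with x/y/z coordinates, and the TensorFlow classifier needs them as a flat fixed-length vector even when a hand is out of frame. A sketch of that featurization, using stub objects in place of MediaPipe's landmark results (the helper names are illustrative, not from the repo):

```python
from types import SimpleNamespace

HAND_LANDMARKS = 21  # MediaPipe hand model: 21 points per hand

def flatten_hand(hand_landmarks):
    """Flatten one hand to [x0, y0, z0, x1, y1, z1, ...];
    zero-pad when the hand was not detected so the length stays fixed."""
    if hand_landmarks is None:
        return [0.0] * (HAND_LANDMARKS * 3)
    return [c for lm in hand_landmarks for c in (lm.x, lm.y, lm.z)]

def make_feature_vector(left_hand, right_hand):
    """126-dim classifier input: 2 hands x 21 landmarks x 3 coords."""
    return flatten_hand(left_hand) + flatten_hand(right_hand)

# Stubs standing in for MediaPipe's NormalizedLandmark objects
fake_hand = [SimpleNamespace(x=0.1, y=0.2, z=0.0) for _ in range(HAND_LANDMARKS)]
vec = make_feature_vector(fake_hand, None)  # right hand not detected
print(len(vec))  # 126
```

Zero-padding (rather than skipping missing hands) keeps the input shape constant, which is what a fixed-input TensorFlow model requires.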
- Python 3.10+ recommended
- A webcam
- Windows (recommended for the demo setup) with a virtual audio driver (e.g., Virtual Audio Cable)
- A virtual camera sink (handled by pyvirtualcam)
Create a virtual environment, then install the core deps:
```bash
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install opencv-python mediapipe tensorflow pyqt6 pyvirtualcam pyttsx3 pyzmq
```
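After installing, a quick import check catches anything the pip command missed. Note that several module names differ from their pip package names (`cv2` vs `opencv-python`, `zmq` vs `pyzmq`). This helper is a convenience sketch, not part of the repo:

```python
import importlib.util

# Importable module names for the packages installed above
REQUIRED = ["cv2", "mediapipe", "tensorflow", "PyQt6",
            "pyvirtualcam", "pyttsx3", "zmq"]

def missing_modules(names=REQUIRED):
    """Return the modules that cannot be imported; empty list means all good."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    gaps = missing_modules()
    print("All dependencies found" if not gaps else f"Missing: {gaps}")
```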