Viseme uses the standard Web Speech API for basic speech-to-text functionality. When ambient noise exceeds the threshold at which voice recognition is no longer accurate, Viseme switches to the device camera for lip reading. The video is streamed to Viseme's machine-learning engine, which runs a neural network; the recognized text is sent back to the frontend and displayed as subtitles. Once the noise level drops, the system reverts to audio-only speech-to-text recognition.
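The switching logic above can be sketched as a small decision function. This is a minimal illustration, not Viseme's actual implementation: the threshold values, the two-threshold hysteresis (switching up at one level and back down at a lower one, to avoid flapping near the boundary), and the mode names are all assumptions for the example.

```javascript
// Illustrative thresholds (dBFS) -- assumptions, not Viseme's real settings.
const NOISE_THRESHOLD_DB = -30;   // above this, speech recognition is unreliable
const RECOVER_THRESHOLD_DB = -40; // below this, it is safe to revert to audio

// Pure decision function: given the current recognition mode and the
// measured ambient noise level, return the mode to use next.
// Using two thresholds (hysteresis) prevents rapid flip-flopping when
// the noise level hovers around a single cutoff.
function nextMode(currentMode, noiseDb) {
  if (currentMode === 'audio' && noiseDb > NOISE_THRESHOLD_DB) {
    return 'lipreading'; // too noisy: fall back to the camera
  }
  if (currentMode === 'lipreading' && noiseDb < RECOVER_THRESHOLD_DB) {
    return 'audio'; // quiet again: revert to speech-to-text
  }
  return currentMode; // otherwise keep the current mode
}
```

In the browser, `noiseDb` could be estimated from an `AnalyserNode` on the microphone stream, with `nextMode` deciding when to stop the `SpeechRecognition` session and start streaming camera frames instead.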
This repository was archived by the owner on Sep 22, 2020. It is now read-only.