This project is a Python-based application that seamlessly translates and generates audio content in real time, just like the world-renowned Babel Fish. It listens to incoming audio, detects speech, identifies the language, transcribes the speech into text, translates the text, and generates audio in the target language.
- Two Options: One Microphone or Two Microphones. To use Linux's default microphone, use LiveTranslationOneMic.py; to use two specific microphones, use LiveTranslationTwoMic.py.
- Real-time Audio Processing: The application continuously listens to audio input and processes it in real time.
- Speech Detection: Utilizes the SpeechRecognition library to detect speech and distinguish spoken content from silence.
- Language Detection: Employs OpenAI Whisper to automatically detect the language of the incoming audio, ensuring accurate language identification.
- Language Transcription: Employs OpenAI Whisper to transcribe incoming audio.
- Multilingual Translation: Translates the detected speech to the target language using Lingua, providing clear and effective communication across language barriers.
- Audio Generation: Uses Bark, a text-to-speech synthesis system, to generate audio content based on the translated text.
- Language Mapping: Comprehensive dictionaries and language codes enable language mapping for translation and audio generation.
- Extensible and Customizable: The project's modular design allows for customization and extension to support additional languages and features.
- Python 3.x
- Required Python packages and dependencies (specified in requirements.txt)
- Clone this repository to your local machine.
- Install the necessary dependencies by running the following command:

  pip install -r requirements.txt

- Ensure that your machine meets the hardware requirements for running Bark for audio generation.
-
To start the application, run the following command:
python3 LiveTranslationTwoMic.py --whisper_model [model_size] --energy_threshold [threshold] --record_timeout [timeout] --phrase_timeout [timeout]
-
--whisper_model: Specify the Whisper model size (choices: tiny, base, small, medium, large). The default is medium. -
--energy_threshold: Set the energy threshold for microphone detection. -
--record_timeout: Define the real-time recording duration in seconds. -
--phrase_timeout: Set the time gap between recordings to consider it a new line in the transcription.
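Parsing for the flags above might look like the following sketch. Only the medium default for --whisper_model comes from this README; the numeric defaults for the other flags are illustrative assumptions, not the project's values:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Live speech translation")
    parser.add_argument("--whisper_model", default="medium",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size")
    # The numeric defaults below are illustrative, not the project's values.
    parser.add_argument("--energy_threshold", type=int, default=1000,
                        help="Energy threshold for microphone detection")
    parser.add_argument("--record_timeout", type=float, default=2.0,
                        help="Real-time recording duration in seconds")
    parser.add_argument("--phrase_timeout", type=float, default=3.0,
                        help="Silence gap that starts a new transcript line")
    return parser

args = build_parser().parse_args(
    ["--whisper_model", "small", "--phrase_timeout", "5"])
```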
- Specify your desired microphones:

  # Create the microphone instances
  source1 = sr.Microphone(sample_rate=16000, device_index=microphone_index1)
  source2 = sr.Microphone(sample_rate=16000, device_index=microphone_index2)
- If you are running the script on a machine whose GPU has less than 12-15 GB of VRAM, enable Bark to run on the CPU with its small models (this is the default in the current script):

  os.environ["SUNO_OFFLOAD_CPU"] = "True"
  os.environ["SUNO_USE_SMALL_MODELS"] = "True"

  Otherwise, enable Bark to run on the GPU with its full models:

  os.environ["SUNO_OFFLOAD_CPU"] = "False"
  os.environ["SUNO_USE_SMALL_MODELS"] = "False"
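Bark reads these environment variables when it is imported, so they must be set before the import. The VRAM-based switch below is an illustrative sketch (the threshold and the helper itself are assumptions; the scripts simply hard-code the two variables):

```python
import os

def configure_bark(vram_gb: float, cpu_threshold_gb: float = 12.0) -> dict:
    """Choose Bark's CPU-offload/small-model settings from available VRAM.

    The 12 GB threshold and the dict return shape are illustrative;
    the project's scripts hard-code the two environment variables instead.
    """
    use_cpu = vram_gb < cpu_threshold_gb
    settings = {
        "SUNO_OFFLOAD_CPU": str(use_cpu),
        "SUNO_USE_SMALL_MODELS": str(use_cpu),
    }
    # Must happen before `from bark import generate_audio, preload_models`.
    os.environ.update(settings)
    return settings
```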
- The application will continuously listen for audio input from the specified microphone sources.
- When speech is detected, the system automatically identifies the language and provides real-time translation.
- The transcribed text is displayed on the console, along with the detected language and its translation.
- If the detected language supports audio generation, the application will generate and play audio based on the translated text.
- To start the application, run the following command:

  python3 LiveTranslationOneMic.py --whisper_model [model_size] --energy_threshold [threshold] --record_timeout [timeout] --phrase_timeout [timeout]

- --whisper_model: Specify the Whisper model size (choices: tiny, base, small, medium, large). The default is medium.
- --energy_threshold: Set the energy threshold for microphone detection.
- --record_timeout: Define the real-time recording duration in seconds.
- --phrase_timeout: Set the time gap between recordings, in seconds, after which the next result is treated as a new line in the transcription.
- If you are running the script on a machine with a small GPU, enable Bark to run on the CPU with its small models (this is the default in the current script):

  os.environ["SUNO_OFFLOAD_CPU"] = "True"
  os.environ["SUNO_USE_SMALL_MODELS"] = "True"
- The application will continuously listen for audio input from the specified microphone source.
- When speech is detected, the system automatically identifies the language and provides real-time translation.
- The transcribed text is displayed on the console, along with the detected language and its translation.
- If the detected language supports audio generation, the application will generate and play audio based on the translated text.
This project utilizes various open-source libraries and models, including Lingua, OpenAI Whisper, Hugging Face Transformers, Bark, SpeechRecognition, and more. I appreciate the contributions of these projects to the development of this application.
This project is open-source and released under the MIT License.
Philip-David Medows
For inquiries or feedback related to this project, please contact [email protected]


