A Telegram bot for multilingual voice and text translation, in-chat dictionary lookups with bilingual definitions, and learning tools (pronunciation practice with language-specific models, word statistics, performance metrics).
The bot is the main interface. It provides:
- Translation: Voice note or text → WhisperX transcription → Google Translate → XTTS v2 voice-cloned audio. Language picker, "Reply in X" for conversational flow, speed presets (0.5x / 1x / 2x). Performance metrics available in debug mode.
- Dictionary: Bilingual Wiktionary lookups with definitions in both English and the native language (e.g., French + English for French words), etymology, examples, and word-form buttons (complete conjugation tables for verbs with all tenses and persons, plurals for nouns, comparative/superlative for adjectives). Pronunciation audio, pronunciation practice with language-specific models, and Smart Synonyms (CEFR) where supported.
- Learning: Event storage, word stats, multi-language pronunciation scoring (Wav2Vec2 + DTW with language-specific models), and `/stats` for progress. Pronunciation scoring uses the correct phoneme model per language (French words are scored with French phonemes, not English). Pronunciation feedback includes pair-based articulation tips: for example, if you say 'B' instead of 'V', you get specific guidance on that exact substitution.
- Performance: Latency tracking and metrics available in debug mode across all major components (transcription, translation, synthesis, pronunciation scoring). Identifies bottlenecks and gives timing breakdowns.
Under the hood it uses: speech_to_speech (WhisperX, Google Translate, XTTS with latency metrics), voice_transformer (speed/age/gender presets), wiktionary_client (mwparserfromhell, bilingual lookups, Telegram-safe formatting), learning (SQLite, aggregations), ml/pronunciation_score (multi-language Wav2Vec2 models, language-specific IPA extraction, pair-based phoneme correction tips).
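The DTW step in pronunciation scoring aligns two feature sequences of different lengths. Below is a minimal pure-Python sketch of classic dynamic time warping on 1-D sequences. It stands in for the `fastdtw` dependency, and `dtw_distance` is an illustrative name rather than a function from this repository.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping on 1-D sequences."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # only a advances
                cost[i][j - 1],      # only b advances
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]
```

A stretched copy of the same contour aligns at zero cost, which is why DTW tolerates slower or faster speech: `dtw_distance([1, 2, 3], [1, 2, 2, 3])` evaluates to `0.0`.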
Language support varies by feature depending on the underlying services and models used.
| Feature | Languages |
|---|---|
| Text translation | Any language (Google Translate) |
| Voice-to-voice translation | Spanish, French, Italian, Portuguese, German, English, Dutch, Czech, Polish, Russian, Hungarian, Arabic, Turkish, Hindi, Japanese, Korean, Mandarin (Simplified & Traditional) |
| Pronunciation scoring | English, French, Spanish, German, Italian, Portuguese, Russian, Polish, Japanese, Mandarin, Arabic, Turkish, Dutch |
| Smart Synonyms (CEFR) | English, German, French, Spanish, Italian, Portuguese, Dutch, Russian, Mandarin, Japanese, Korean, Arabic |
| Voice effects | Any language |
| Dictionary & Etymology | Best coverage for European languages (Wiktionary) |
- README.md: This file
- CHANGELOG.md: Version history and changes
- Dockerfile: Docker image for CI/CD
- environment.yml: Conda environment spec (used by Docker)
- pytest.ini: Pytest configuration (asyncio mode)
- telegram_bot.py: Bot entry point and routing
- speech_to_speech.py: Voice/text translation with performance metrics (WhisperX, Translate, XTTS)
- voice_transformer.py: Speed/age/gender voice effects
- latiniser.py: Latin script conversion for non-Latin languages
- handlers.py: Commands and message handlers (translate, dictionary, pronunciation practice)
- callbacks.py: Button callbacks (language selection, word forms, pronunciation, etymology, etc.)
- keyboards.py: Inline keyboard layouts with universal Home button
- config.py: Languages and bot configuration
- utils.py: Utility functions (speed adjustment, etc.)
- wiktionary_client.py: Bilingual definitions (English + native), etymology, examples, word-forms keyboard, improved template parsing
- corpus_examples.py: Sentence examples from corpora
- cefr.py: CEFR difficulty classification / Smart Synonyms
- word_forms_extractor.py: Complete conjugation tables (all tenses, all persons), plural forms, comparative/superlative forms
- storage.py: SQLite for learning events
- events.py: Event models
- aggregations.py: Statistics and trends
- pronunciation_score.py: Multi-language Wav2Vec2-based pronunciation scoring with language-specific models, IPA extraction, pair-based phoneme correction tips, performance metrics
22-test pytest suite covering: Levenshtein distance, phoneme similarity, feedback generation, Telegram handlers (start, set language), database initialisation, and module imports.
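For readers unfamiliar with the first two items, here is a minimal sketch of Levenshtein distance and a similarity score derived from it. The function names and the normalisation by the longer string are assumptions for illustration; the project's own implementation may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def phoneme_similarity(expected: str, heard: str) -> float:
    """Similarity in [0, 1]; 1.0 means the phoneme strings are identical."""
    longest = max(len(expected), len(heard))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(expected, heard) / longest
```

Treating each IPA symbol as one character keeps the sketch short; multi-character phonemes would need tokenising first.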
- Pipeline demo (optional): `python legacy/demo/demo.py` (file or mic → processing → playback), if that script is present.
- Add `TELEGRAM_BOT_TOKEN=...` to a `.env` file at the project root (or set the env var).
- Start the bot: `python src/telegram_bot.py` or `python -m src.telegram_bot`
- Enable Debug Mode (optional) for detailed performance metrics:

      translator = SpeechToSpeechTranslator(debug=True)
      scorer = PronunciationScore(language="fr", debug=True)

Notes: WhisperX and XTTS are lazy-loaded (first use may be slower). Language-specific Wav2Vec2 models download on first use per language. You need network access for Wiktionary and translation APIs.
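A missing token is easier to diagnose if the lookup fails fast with a clear message. The sketch below assumes the variable is already in the environment (for example after python-dotenv's `load_dotenv()` has read `.env`); `get_bot_token` is a hypothetical helper, not a function from this repository.

```python
import os
from typing import Mapping, Optional


def get_bot_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return TELEGRAM_BOT_TOKEN, or raise a clear error if it is missing."""
    # Accepting a mapping makes the helper easy to test without touching os.environ.
    source = os.environ if env is None else env
    token = source.get("TELEGRAM_BOT_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "TELEGRAM_BOT_TOKEN is not set; add it to .env or export it"
        )
    return token
```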
The project uses GitHub Actions for automated testing on every push and pull request. Tests run inside a Docker container that mirrors the local conda environment, avoiding platform-specific dependency issues.
The CI pipeline:
- Restores cached Docker layers (keyed on `environment.yml`, so the image is rebuilt only when dependencies change)
- Builds the Docker image with the full conda environment on a cache miss
- Runs the 22-test pytest suite
- Reports pass/fail status per commit
First run after a dependency change takes ~10 minutes to rebuild. All other pushes complete in ~30 seconds thanks to layer caching.
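The idea behind keying the cache on `environment.yml` is that the key changes only when the file's contents change. In a workflow this is usually expressed with the `hashFiles` expression; the Python sketch below only illustrates the principle, and `docker_cache_key` and its prefix are made-up names.

```python
import hashlib


def docker_cache_key(env_bytes: bytes, prefix: str = "docker-layers") -> str:
    """Derive a cache key from the raw bytes of the environment spec."""
    digest = hashlib.sha256(env_bytes).hexdigest()
    # A shortened digest keeps the key readable while staying content-addressed.
    return f"{prefix}-{digest[:16]}"
```

Calling it with `Path("environment.yml").read_bytes()` gives the same key on every push until a dependency line is edited, at which point the key changes and the image is rebuilt.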
View test results: Go to the "Actions" tab on GitHub after pushing code.

    # Run all tests
    pytest tests/ -v
    # Run with coverage
    pytest tests/ --cov=src --cov-report=term-missing
    # Run specific test file
    pytest tests/test_bot.py -v

- Install dependencies:

      conda env create -f environment.yml
      conda activate accent-soft

- Install espeak-ng (required for IPA phoneme extraction):

      # macOS
      brew install espeak-ng
      # Linux
      sudo apt-get install espeak-ng

- Run tests:

      pytest tests/

- Run the demo (optional, if present):

      python legacy/demo/demo.py

- Push code. Tests run automatically via GitHub Actions:

      git add .
      git commit -m "Your changes"
      git push origin main

Check the "Actions" tab on GitHub to see test results.
Key packages: python-telegram-bot, python-dotenv, whisperx, TTS (XTTS), deep_translator, mwparserfromhell, gtts, soundfile, librosa, torch, transformers (Wav2Vec2), fastdtw, inflect. See environment.yml.
Note: whisperx is excluded from the CI Docker environment due to irresolvable numpy version conflicts with other ML packages. It is used and tested locally.
The pronunciation scorer uses language-specific models and pair-based correction tips:
- Language-specific Wav2Vec2 models: French words scored with French phoneme recognition, Spanish with Spanish, etc.
- 13+ languages supported: English, French, Spanish, German, Italian, Portuguese, Russian, Polish, Japanese, Chinese, Arabic, Turkish, Dutch
- Pair-based articulation tips: feedback is specific to what was heard vs. what was expected, e.g. "You said 'B' but the target is V: your upper teeth should lightly touch your lower lip"
- Language-specific IPA extraction: Uses espeak-ng with correct language voices
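Pair-based tips can be pictured as a lookup keyed by the (heard, expected) pair. The sketch below is an assumption about the general shape, not the repository's code: `PAIR_TIPS` and `tip_for` are hypothetical names, and only the B/V wording comes from this README.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tip table keyed by (heard, expected) phoneme pairs.
PAIR_TIPS: Dict[Tuple[str, str], str] = {
    ("b", "v"): "your upper teeth should lightly touch your lower lip",
    # Illustrative extra entry: /z/ is the voiced counterpart of /s/.
    ("s", "z"): "keep the same tongue position and add voicing",
}


def tip_for(heard: str, expected: str) -> Optional[str]:
    """Return feedback for one substitution, or None if nothing was wrong."""
    if heard.lower() == expected.lower():
        return None
    base = f"You said '{heard}' but the target is '{expected}'"
    specific = PAIR_TIPS.get((heard.lower(), expected.lower()))
    # Fall back to a generic message when no pair-specific tip exists.
    return f"{base}: {specific}" if specific else f"{base}."
```

Keying on the ordered pair matters: saying 'B' for 'V' and saying 'V' for 'B' call for different corrections.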
Example:

    from src.ml.pronunciation_score import score_user_pronunciation

    result = score_user_pronunciation(
        user_audio_bytes,
        "bonjour",
        language="fr",
        debug=True
    )
    print(result['overall_score'], result['feedback'])

Dictionary lookups show definitions in both English and the native language:
- English Wiktionary definitions (reliable, comprehensive)
- Native language definitions (e.g., French definitions from fr.wiktionary.org)
- Flag emojis for visual distinction (🇬🇧 English, 🇫🇷 French, etc.)
- Automatic fallback if native definitions unavailable
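The fallback behaviour listed above can be sketched as follows. The fetcher is injected so the example runs without network access; `bilingual_definitions`, `FLAGS`, and the fetcher signature are assumptions for illustration, not the actual `wiktionary_client` API.

```python
from typing import Callable, Dict, List

# A fetcher takes (word, wiki_language) and returns a list of definitions,
# or an empty list when that Wiktionary edition has no entry.
Fetcher = Callable[[str, str], List[str]]

FLAGS: Dict[str, str] = {"en": "🇬🇧", "fr": "🇫🇷"}


def bilingual_definitions(word: str, native_lang: str, fetch: Fetcher) -> List[str]:
    """Return display lines: English first, then native if available."""
    lines: List[str] = []
    languages = ["en"] if native_lang == "en" else ["en", native_lang]
    for lang in languages:
        definitions = fetch(word, lang)
        if not definitions:
            continue  # fallback: silently skip an edition with no entry
        label = FLAGS.get(lang, lang.upper())
        lines.append(f"{label} " + "; ".join(definitions))
    return lines
```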
Word form buttons display complete conjugation tables like Google Translate:
- All persons: je/tu/il/nous/vous/ils (French), yo/tú/él/nosotros/vosotros/ellos (Spanish), etc.
- All major tenses: Present, Future, Imperfect, Passé Simple, Conditional, Subjunctive
- Supports French, Spanish, Italian, Portuguese, Romanian
Both translation and pronunciation scoring include comprehensive latency tracking in debug mode:
Speech-to-Speech Translation:

    COMPLETE PIPELINE METRICS
    ⏱️ STAGE BREAKDOWN:
    ├─ Transcription: 3.621s
    ├─ Translation: 0.342s
    ├─ Synthesis: 7.781s
    └─ TOTAL PIPELINE: 11.744s
    TIME ALLOCATION:
    Transcription: 30.8%
    Translation: 2.9%
    Synthesis: 66.2% ← Bottleneck identified!

Pronunciation Scoring:

    ⏱️ PERFORMANCE METRICS
    ├─ Audio loading: 45.2ms
    ├─ MFCC extraction: 123.7ms
    ├─ DTW computation: 89.3ms
    ├─ Speech recognition: 1847.2ms ← 78.9% of total time
    ├─ Phoneme analysis: 234.1ms
    └─ TOTAL TIME: 2.341s
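A stage breakdown like the ones above can be produced with a small context-manager timer. This is a sketch of the general technique under assumed names (`StageTimer`, `stage`, `allocation`, `bottleneck`); it is not the instrumentation code used in the project.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named stage."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raised.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def allocation(self) -> Dict[str, float]:
        """Percentage of total time spent in each stage."""
        total = sum(self.durations.values())
        if total == 0:
            return {name: 0.0 for name in self.durations}
        return {name: 100.0 * d / total for name, d in self.durations.items()}

    def bottleneck(self) -> str:
        """Name of the slowest stage."""
        if not self.durations:
            raise ValueError("no stages recorded")
        return max(self.durations, key=lambda name: self.durations[name])
```

Typical use wraps each step, for example `with timer.stage("Transcription"): ...`, and prints `timer.allocation()` once the pipeline finishes.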
- Events stored in `data/learning_events.db`
- Aggregations: words learned/reviewed, pronunciation scores, streaks, trends
- `/stats`: dashboard, difficult words, weekly/monthly progress
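To make the storage and aggregation idea concrete, here is a minimal sketch using the standard-library `sqlite3` module. The table name, columns, and helper functions are hypothetical; the real schema in `storage.py` is not shown in this README.

```python
import sqlite3
from typing import List, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical events table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pronunciation_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            word TEXT NOT NULL,
            language TEXT NOT NULL,
            score REAL NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def record_event(conn: sqlite3.Connection, user_id: int, word: str,
                 language: str, score: float) -> None:
    conn.execute(
        "INSERT INTO pronunciation_events (user_id, word, language, score) "
        "VALUES (?, ?, ?, ?)",
        (user_id, word, language, score),
    )
    conn.commit()


def difficult_words(conn: sqlite3.Connection, user_id: int,
                    limit: int = 5) -> List[Tuple[str, float, int]]:
    """Words with the lowest average score first: (word, avg_score, attempts)."""
    return conn.execute(
        """
        SELECT word, AVG(score) AS avg_score, COUNT(*) AS attempts
        FROM pronunciation_events
        WHERE user_id = ?
        GROUP BY word
        ORDER BY avg_score ASC, word ASC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
```

Passing `sqlite3.connect(":memory:")` gives a throwaway database, which is convenient for tests.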
- Lazy loading: Models load on first use to minimise startup time
- Language-specific models: Pronunciation scoring automatically selects correct Wav2Vec2 model per language
- Bilingual dictionary: Queries both English and native Wiktionaries for comprehensive definitions
- Performance instrumentation: Context managers and timing decorators throughout codebase
- Caching: Scorer and model instances cached with automatic language switching
- Message history preservation: Dictionary definitions remain visible while navigating
- Universal navigation: Home button accessible from all major screens
- Docker-based CI: Exact conda environment reproduced in CI, so there are no platform-specific dependency issues
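Lazy loading and per-language caching combine naturally: nothing is loaded until a language is first requested, and repeat requests reuse the same instance. The sketch below uses `functools.lru_cache` with a stand-in class; `FakeScorer` and `get_scorer` are illustrative names, and the real scorer would load a Wav2Vec2 model in its constructor.

```python
from functools import lru_cache


class FakeScorer:
    """Stand-in for a scorer whose constructor would load a large model."""

    loads = 0  # counts how many times a "model" was loaded

    def __init__(self, language: str) -> None:
        FakeScorer.loads += 1
        self.language = language


@lru_cache(maxsize=None)
def get_scorer(language: str) -> FakeScorer:
    """Create the scorer for a language on first use, then reuse it."""
    return FakeScorer(language)
```

Switching languages just means calling `get_scorer` with a different code; earlier instances stay cached, so switching back costs nothing.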
See CHANGELOG.md for complete version history.
