Skip to content

alexdimmock95/hermes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

74 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

hermes

A Telegram bot for multilingual voice and text translation, in-chat dictionary lookups with bilingual definitions, and learning tools (pronunciation practice with language-specific models, word statistics, performance metrics).

Project Overview

The bot is the main interface. It provides:

  • Translation β€” Voice note or text β†’ WhisperX transcription β†’ Google Translate β†’ XTTS v2 voice-cloned audio. Language picker, "Reply in X" for conversational flow, speed presets (0.5x / 1x / 2x). Performance metrics available in debug mode.

  • Dictionary β€” Bilingual Wiktionary lookups with definitions in both English and native language (e.g., French + English for French words), etymology, examples, and word-form buttons (complete conjugation tables for verbs with all tenses and persons, plural for nouns, comparative/superlative for adjectives). Pronunciation audio, etymology, practice pronunciation with language-specific models, and Smart Synonyms (CEFR) where supported.

  • Learning β€” Event storage, word stats, multi-language pronunciation scoring (Wav2Vec2 + DTW with language-specific models), and /stats for progress. Pronunciation scoring uses correct phoneme models per language (French words scored with French phonemes, not English). Pronunciation feedback includes pair-based articulation tips β€” e.g. if you say 'B' instead of 'V', you get specific guidance on that exact substitution.

  • Performance β€” Comprehensive latency tracking and metrics available in debug mode across all major components (transcription, translation, synthesis, pronunciation scoring). Identifies bottlenecks and timing breakdowns.

Under the hood it uses: speech_to_speech (WhisperX, Google Translate, XTTS with latency metrics), voice_transformer (speed/age/gender presets), wiktionary_client (mwparserfromhell, bilingual lookups, Telegram-safe formatting), learning (SQLite, aggregations), ml/pronunciation_score (multi-language Wav2Vec2 models, language-specific IPA extraction, pair-based phoneme correction tips).

Demo

Watch the demo

Language Support

Language support varies by feature depending on the underlying services and models used.

Feature Languages
πŸ’¬ Text translation Any language (Google Translate)
πŸŽ™ Voice-to-voice translation Spanish, French, Italian, Portuguese, German, English, Dutch, Czech, Polish, Russian, Hungarian, Arabic, Turkish, Hindi, Japanese, Korean, Mandarin (Simplified & Traditional)
🎀 Pronunciation scoring English, French, Spanish, German, Italian, Portuguese, Russian, Polish, Japanese, Mandarin, Arabic, Turkish, Dutch
πŸ“Š Smart Synonyms (CEFR) English, German, French, Spanish, Italian, Portuguese, Dutch, Russian, Mandarin, Japanese, Korean, Arabic
πŸŽ› Voice effects Any language
πŸ“– Dictionary & Etymology Best coverage for European languages (Wiktionary)

Project Structure

Root

Source (src/)

Telegram Bot (src/telegram_bot/)

  • handlers.py β€” Commands and message handlers (translate, dictionary, pronunciation practice)
  • callbacks.py β€” Button callbacks (language selection, word forms, pronunciation, etymology, etc.)
  • keyboards.py β€” Inline keyboard layouts with universal Home button
  • config.py β€” Languages and bot configuration
  • utils.py β€” Utility functions (speed adjustment, etc.)

Dictionary (src/dictionary/)

  • wiktionary_client.py β€” Bilingual definitions (English + native), etymology, examples, word-forms keyboard, improved template parsing
  • corpus_examples.py β€” Sentence examples from corpora
  • cefr.py β€” CEFR difficulty classification / Smart Synonyms
  • word_forms_extractor.py β€” Complete conjugation tables (all tenses, all persons), plural forms, comparative/superlative forms

Learning (src/learning/)

ML (src/ml/)

  • pronunciation_score.py β€” Multi-language Wav2Vec2-based pronunciation scoring with language-specific models, IPA extraction, pair-based phoneme correction tips, comprehensive performance metrics

Tests (tests/)

22-test pytest suite covering: Levenshtein distance, phoneme similarity, feedback generation, Telegram handlers (start, set language), database initialisation, and module imports.

Demo

  • Pipeline demo β€” Optional: python legacy/demo/demo.py (file or mic β†’ processing β†’ playback), if that script is present.

Running the Bot

  1. Add TELEGRAM_BOT_TOKEN=... to a .env file at the project root (or set the env var).
  2. Start the bot:
python src/telegram_bot.py

or

python -m src.telegram_bot
  1. Enable Debug Mode (optional) for detailed performance metrics:
translator = SpeechToSpeechTranslator(debug=True)
scorer = PronunciationScore(language="fr", debug=True)

Notes: WhisperX and XTTS are lazy-loaded (first use may be slower). Language-specific Wav2Vec2 models download on first use per language. You need network access for Wiktionary and translation APIs.

Development & CI/CD

Continuous Integration

The project uses GitHub Actions for automated testing on every push and pull request. Tests run inside a Docker container that mirrors the local conda environment, avoiding platform-specific dependency issues.

Status: Tests

The CI pipeline:

  1. Restores cached Docker layers (keyed on environment.yml β€” only rebuilds when dependencies change)
  2. Builds Docker image with full conda environment if cache miss
  3. Runs 22-test pytest suite
  4. Reports pass/fail status per commit

First run after a dependency change takes ~10 minutes to rebuild. All other pushes complete in ~30 seconds thanks to layer caching.

View test results: Go to the "Actions" tab on GitHub after pushing code.

Running Tests Locally

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src --cov-report=term-missing

# Run specific test file
pytest tests/test_bot.py -v

Getting Started (development)

  1. Install dependencies
conda env create -f environment.yml
conda activate accent-soft
  1. Install espeak-ng (required for IPA phoneme extraction):
# macOS
brew install espeak-ng

# Linux
sudo apt-get install espeak-ng
  1. Run tests
pytest tests/
  1. Run the demo (optional, if present)
python legacy/demo/demo.py
  1. Push code β€” Tests run automatically via GitHub Actions
git add .
git commit -m "Your changes"
git push origin main

Check the "Actions" tab on GitHub to see test results.

Dependencies

Key packages: python-telegram-bot, python-dotenv, whisperx, TTS (XTTS), deep_translator, mwparserfromhell, gtts, soundfile, librosa, torch, transformers (Wav2Vec2), fastdtw, inflect. See environment.yml.

Note: whisperx is excluded from the CI Docker environment due to irresolvable numpy version conflicts with other ML packages. It is used and tested locally.

Features

Multi-language Pronunciation Scoring

The pronunciation scorer uses language-specific models and pair-based correction tips:

  • Language-specific Wav2Vec2 models: French words scored with French phoneme recognition, Spanish with Spanish, etc.
  • 13+ languages supported: English, French, Spanish, German, Italian, Portuguese, Russian, Polish, Japanese, Chinese, Arabic, Turkish, Dutch
  • Pair-based articulation tips: feedback is specific to what was heard vs. what was expected β€” e.g. "You said 'B' but the target is V β€” your upper teeth should lightly touch your lower lip"
  • Language-specific IPA extraction: Uses espeak-ng with correct language voices

Example:

from src.ml.pronunciation_score import score_user_pronunciation

result = score_user_pronunciation(
    user_audio_bytes,
    "bonjour",
    language="fr",
    debug=True
)
print(result['overall_score'], result['feedback'])

Bilingual Dictionary

Dictionary lookups show definitions in both English and the native language:

  • English Wiktionary definitions (reliable, comprehensive)
  • Native language definitions (e.g., French definitions from fr.wiktionary.org)
  • Flag emojis for visual distinction (πŸ‡¬πŸ‡§ English, πŸ‡«πŸ‡· French, etc.)
  • Automatic fallback if native definitions unavailable

Complete Verb Conjugations

Word form buttons display complete conjugation tables like Google Translate:

  • All persons: je/tu/il/nous/vous/ils (French), yo/tΓΊ/Γ©l/nosotros/vosotros/ellos (Spanish), etc.
  • All major tenses: Present, Future, Imperfect, PassΓ© Simple, Conditional, Subjunctive
  • Supports French, Spanish, Italian, Portuguese, Romanian

Performance Metrics

Both translation and pronunciation scoring include comprehensive latency tracking in debug mode:

Speech-to-Speech Translation:

πŸ“Š COMPLETE PIPELINE METRICS
⏱️  STAGE BREAKDOWN:
β”œβ”€ Transcription: 3.621s
β”œβ”€ Translation: 0.342s
β”œβ”€ Synthesis: 7.781s
└─ TOTAL PIPELINE: 11.744s

πŸ“Š TIME ALLOCATION:
   Transcription: 30.8%
   Translation: 2.9%
   Synthesis: 66.2%  ← Bottleneck identified!

Pronunciation Scoring:

⏱️  PERFORMANCE METRICS
β”œβ”€ Audio loading: 45.2ms
β”œβ”€ MFCC extraction: 123.7ms
β”œβ”€ DTW computation: 89.3ms
β”œβ”€ Speech recognition: 1847.2ms  ← 78.9% of total time
β”œβ”€ Phoneme analysis: 234.1ms
└─ TOTAL TIME: 2.341s

Learning Analytics

  • Events stored in data/learning_events.db
  • Aggregations: words learned/reviewed, pronunciation scores, streaks, trends
  • /stats: dashboard, difficult words, weekly/monthly progress

Architecture Highlights

  • Lazy loading: Models load on first use to minimise startup time
  • Language-specific models: Pronunciation scoring automatically selects correct Wav2Vec2 model per language
  • Bilingual dictionary: Queries both English and native Wiktionaries for comprehensive definitions
  • Performance instrumentation: Context managers and timing decorators throughout codebase
  • Caching: Scorer and model instances cached with automatic language switching
  • Message history preservation: Dictionary definitions remain visible while navigating
  • Universal navigation: Home button accessible from all major screens
  • Docker-based CI: Exact conda environment reproduced in CI β€” no platform-specific dependency issues

See CHANGELOG.md for complete version history.

About

Multi-language translation, dictionary and learning bot for Telegram.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors