Automated language course processing: Download → Transcribe → Generate Comprehensive Notes
Automate your language learning workflow for any language. Download lesson videos, extract audio, transcribe with timestamps, and generate comprehensive study notes with external resources.
- 📥 Multi-source Download: Google Drive, YouTube, direct URLs
- 🎙️ Auto-Transcription: OpenAI Whisper with timestamps
- 📝 Smart Notes: Auto-generated comprehensive study materials
- 📱 Tablet-Friendly PDFs: Beautiful, readable PDFs optimized for tablets
- 🌐 Multi-Language: Arabic, Japanese, Chinese, Spanish, French, German, Russian, and more
- 📚 Resource Database: Curated apps, YouTube channels, websites per language
- 🔄 Resumable: Progress tracking for interrupted processing
- ⚙️ Configurable: YAML configuration for any course structure
# Clone the repository
git clone https://github.com/stanasiukcom/language-learner.git
cd language-learner
# Install dependencies
pip install -r requirements.txt
# Install system dependencies (macOS)
brew install ffmpeg poppler yt-dlp pango gdk-pixbuf glib
# Or Linux (Ubuntu/Debian)
sudo apt-get install ffmpeg poppler-utils python3-cffi python3-brotli libpango-1.0-0 libgdk-pixbuf2.0-0
pip install yt-dlp

# Copy example config
cp config/config.example.yaml config/config.yaml
# Edit config.yaml with your course details
nano config/config.yaml

Privacy Note: Configuration files may contain private Google Drive IDs and personal course information. The .gitignore is configured to exclude all config/*.yaml files except the example template.
Minimal configuration:
course:
  name: "Spanish A1"
  language: "Spanish"
  level: "Beginner"
language:
  code: "es"
  native_language: "en"
sources:
  - type: "google_drive"
    lessons:
      - id: "YOUR_GOOGLE_DRIVE_FILE_ID"
        filename: "lesson1.mp4"
        date: "2024-01-15"

# Process everything automatically
python src/main.py
# Or step-by-step:
python src/main.py --download-only # Download videos
python src/main.py --transcribe-only # Transcribe
python src/main.py --notes-only # Generate notes

output/
├── lesson1.mp4 # Downloaded video
├── audio/
│ └── lesson1.mp3 # Extracted audio
├── transcripts/
│ ├── lesson1.txt # Text transcript
│ └── lesson1.json # JSON with timestamps
├── Comprehensive_Notes_Spanish_A1.md # 📖 YOUR STUDY GUIDE (Markdown)
└── Comprehensive_Notes_Spanish_A1.pdf # 📱 TABLET-FRIENDLY VERSION
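The timestamped JSON transcript can be post-processed with a few lines of Python. The sketch below assumes the file follows the shape Whisper itself returns (a top-level `segments` list whose items carry `start`, `end`, and `text`). This project's actual JSON layout is not documented here, so inspect a real file first. The function names are illustrative and not part of the project.

```python
import json
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def transcript_lines(data: dict) -> list:
    """Turn Whisper-style segments into '[start - end] text' lines."""
    lines = []
    for segment in data.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


def load_transcript(path: str) -> list:
    """Read a transcript JSON file and return its formatted lines."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return transcript_lines(data)
```

Calling `load_transcript("output/transcripts/lesson1.json")` would then give one readable line per segment, which is convenient for searching a lesson by time.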
| Language | Code | Alphabet | Resources |
|---|---|---|---|
| Arabic | ar | ✅ | ✅ |
| Chinese | zh | ✅ | ✅ |
| Japanese | ja | ✅ | ✅ |
| Russian | ru | ✅ | ✅ |
| Spanish | es | ➖ | ✅ |
| French | fr | ➖ | ✅ |
| German | de | ➖ | ✅ |
| Any language | xx | Manual | Template |
Don't see your language? Contributions welcome! See CLAUDE.md for adding new languages.
- Automate processing of online course recordings
- Generate searchable, timestamped notes
- Access curated resources for your target language
- Track progress with built-in checklists
- Create study materials from lecture recordings
- Share comprehensive notes with students
- Maintain course content library
- Process YouTube courses/playlists
- Build personal study guides
- Discover quality learning resources
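Each source type in the configuration examples that follow expects slightly different lesson fields. As a minimal sketch, the check below flags incomplete lesson entries before a long run starts. The required-field sets are inferred from this README's examples only, and `missing_fields` is a hypothetical helper, not something the project is known to provide.

```python
# Required lesson fields per source type, read off this README's examples.
# The optional "date" field is not enforced here.
REQUIRED_FIELDS = {
    "google_drive": ("id", "filename"),
    "youtube": ("id", "filename"),
    "url": ("url", "filename"),
    "local": ("filename",),
}


def missing_fields(source_type: str, lesson: dict) -> list:
    """Return the required keys that are absent or empty in a lesson entry."""
    if source_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown source type: {source_type!r}")
    return [key for key in REQUIRED_FIELDS[source_type] if not lesson.get(key)]
```

For example, `missing_fields("url", {"filename": "lesson1.mp4"})` reports that `url` is missing.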
Google Drive:
sources:
  - type: "google_drive"
    lessons:
      - id: "1abc123xyz" # From drive.google.com/file/d/1abc123xyz/view
        filename: "lesson1.mp4"
        date: "2024-01-15"

YouTube:
sources:
  - type: "youtube"
    lessons:
      - id: "dQw4w9WgXcQ" # From youtube.com/watch?v=dQw4w9WgXcQ
        filename: "lesson1.mp4"

Direct URL:
sources:
  - type: "url"
    lessons:
      - url: "https://example.com/video.mp4"
        filename: "lesson1.mp4"

Local Files:
sources:
  - type: "local"
    lessons:
      - filename: "lesson1.mp4" # Already in output/ directory

| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| tiny | ⚡️⚡️⚡️ | ⭐️ | Testing |
| base | ⚡️⚡️ | ⭐️⭐️ | Quick drafts |
| small | ⚡️ | ⭐️⭐️⭐️ | Good balance |
| medium | 🐌 | ⭐️⭐️⭐️⭐️ | Recommended |
| large | 🐌🐌 | ⭐️⭐️⭐️⭐️⭐️ | Best quality |
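To compare model sizes on a single file outside the pipeline, the `openai-whisper` package can be called directly. This is a minimal sketch, not how `src/main.py` is known to invoke Whisper. `check_model` and `transcribe` are illustrative names, and the list of sizes simply mirrors the table above.

```python
from typing import Optional

# Model sizes listed in the table above.
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def check_model(name: str) -> str:
    """Return the model name if it is one of the sizes in the table."""
    if name not in WHISPER_MODELS:
        raise ValueError(f"Unknown Whisper model {name!r}; choose from {WHISPER_MODELS}")
    return name


def transcribe(audio_path: str, model_name: str = "medium",
               language: Optional[str] = None) -> dict:
    """Transcribe one audio file with the openai-whisper package."""
    import whisper  # imported lazily so check_model works without it installed

    model = whisper.load_model(check_model(model_name))
    # language=None lets Whisper detect the language; pass e.g. "es" to force it.
    return model.transcribe(audio_path, language=language)
```

The returned dictionary contains the full `text` plus a `segments` list with start and end times, so running `transcribe("output/audio/lesson1.mp3", "small", "es")` and then again with `"medium"` gives a direct quality comparison.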
PDFs are automatically generated alongside Markdown notes. They feature:
- Tablet-optimized formatting - Perfect for iPad, Android tablets
- Beautiful typography - Readable fonts and spacing
- Syntax highlighting - For code blocks and examples
- Table support - Clean, professional tables
- RTL support - For Arabic, Hebrew, etc.
Disable PDF generation:
notes:
  generate_pdf: false

Edit src/notes_generator.py to customize note structure.
- Edit src/resources_db.py
- Add your language code to _init_resources()
- Create resource list with apps, YouTube, websites
- Submit PR!
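As a rough illustration of the steps above, a new language entry might bundle the three resource categories like this. The dictionary shape, the `build_resources` helper, and the `"it"` example are all hypothetical; the real structure lives in `src/resources_db.py`, so mirror whatever `_init_resources()` already uses.

```python
def build_resources(apps: list, youtube: list, websites: list) -> dict:
    """Bundle the three resource categories named in the steps above."""
    entry = {"apps": apps, "youtube": youtube, "websites": websites}
    # Refuse entries with an empty category so gaps are caught before the PR.
    empty = [name for name, items in entry.items() if not items]
    if empty:
        raise ValueError(f"Add at least one item for: {', '.join(empty)}")
    return entry


# Placeholder values for a hypothetical Italian ("it") entry.
RESOURCES = {
    "it": build_resources(["<app name>"], ["<channel name>"], ["<website URL>"]),
}
```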
advanced:
  parallel_processing: true # Process multiple videos simultaneously

We welcome contributions! See CLAUDE.md for:
- Adding new languages
- Improving transcription accuracy
- Enhancing note generation
- Adding integrations (Anki, Notion, etc.)
See examples/ directory for:
- arabic_config.yaml - Arabic course example
- japanese_config.yaml - Japanese course example
- output_sample.md - Sample generated notes
pip install yt-dlp
# or
brew install yt-dlp

brew install ffmpeg # macOS
sudo apt install ffmpeg # Linux

- Ensure files are publicly accessible or shared with you
- Get shareable link, extract file ID
- File ID is between /d/ and /view in URL
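Pulling the file ID out of a share link is easy to get wrong by hand, so here is a small sketch that does it with a regular expression. `extract_drive_id` is an illustrative helper, not part of this project, and it only handles links that contain a `/d/<ID>` segment as described above.

```python
import re
from typing import Optional

# Matches the ID segment in links such as drive.google.com/file/d/<ID>/view
_DRIVE_ID = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_drive_id(url: str) -> Optional[str]:
    """Return the file ID from a Google Drive share link, or None if absent."""
    match = _DRIVE_ID.search(url)
    return match.group(1) if match else None
```

For the example link used earlier, `extract_drive_id("https://drive.google.com/file/d/1abc123xyz/view")` returns `"1abc123xyz"`, which is the value to paste into the `id` field of the config.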
- Use smaller Whisper model (small instead of medium)
- Process shorter videos
- Use GPU if available (requires CUDA setup)
- Use smaller Whisper model
- Process videos one at a time
- Close other applications
MIT License - See LICENSE file
- OpenAI Whisper - Speech recognition
- yt-dlp - Video downloader
- FFmpeg - Audio extraction
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- ⭐ Star the repo: github.com/stanasiukcom/language-learner
Made with ❤️ for language learners worldwide
⭐️ Star this repo if you find it useful!