🌍 Language Learner

Automated language course processing: Download → Transcribe → Generate Comprehensive Notes

Automate your language learning workflow for any language. Download lesson videos, extract audio, transcribe with timestamps, and generate comprehensive study notes with external resources.

✨ Features

📥 Multi-source Download: Google Drive, YouTube, direct URLs, or local files
🎙️ Auto-Transcription: OpenAI Whisper with timestamps
📝 Smart Notes: Auto-generated comprehensive study materials
📱 Tablet-Friendly PDFs: Beautiful, readable PDFs optimized for tablets
🌐 Multi-Language: Arabic, Japanese, Chinese, Spanish, French, German, Russian, and more
📚 Resource Database: Curated apps, YouTube channels, websites per language
🔄 Resumable: Progress tracking for interrupted processing
⚙️ Configurable: YAML configuration for any course structure

🚀 Quick Start

1. Installation

# Clone the repository
git clone https://github.com/stanasiukcom/language-learner.git
cd language-learner

# Install dependencies
pip install -r requirements.txt

# Install system dependencies (macOS)
brew install ffmpeg poppler yt-dlp pango gdk-pixbuf glib

# Or Linux (Ubuntu/Debian)
sudo apt-get install ffmpeg poppler-utils python3-cffi python3-brotli libpango-1.0-0 libgdk-pixbuf2.0-0
pip install yt-dlp

2. Configuration

⚠️ IMPORTANT: Your config files are automatically gitignored and will NOT be committed.

# Copy example config
cp config/config.example.yaml config/config.yaml

# Edit config.yaml with your course details
nano config/config.yaml

Privacy Note: Configuration files may contain private Google Drive IDs and personal course information. The .gitignore is configured to exclude all config/*.yaml files except the example template.

Minimal configuration:

course:
  name: "Spanish A1"
  language: "Spanish"
  level: "Beginner"

language:
  code: "es"
  native_language: "en"

sources:
  - type: "google_drive"
    lessons:
      - id: "YOUR_GOOGLE_DRIVE_FILE_ID"
        filename: "lesson1.mp4"
        date: "2024-01-15"

3. Run

# Process everything automatically
python src/main.py

# Or step-by-step:
python src/main.py --download-only    # Download videos
python src/main.py --transcribe-only  # Transcribe
python src/main.py --notes-only       # Generate notes
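
The flags above could be wired to pipeline stages roughly as follows. This is a sketch using Python's standard `argparse` module, not the actual contents of src/main.py. The stage names and the assumption that the three flags are mutually exclusive are illustrative.

```python
import argparse

# Assumed stage names, in the order the pipeline runs them.
ALL_STAGES = ["download", "transcribe", "notes"]


def parse_stages(argv: list) -> list:
    """Map the documented flags to the pipeline stages that should run."""
    parser = argparse.ArgumentParser(prog="main.py")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--download-only", action="store_true")
    group.add_argument("--transcribe-only", action="store_true")
    group.add_argument("--notes-only", action="store_true")
    args = parser.parse_args(argv)

    if args.download_only:
        return ["download"]
    if args.transcribe_only:
        return ["transcribe"]
    if args.notes_only:
        return ["notes"]
    return list(ALL_STAGES)  # no flag: run the full pipeline in order


print(parse_stages([]))
print(parse_stages(["--notes-only"]))
```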

4. Results

output/
├── lesson1.mp4                           # Downloaded video
├── audio/
│   └── lesson1.mp3                       # Extracted audio
├── transcripts/
│   ├── lesson1.txt                       # Text transcript
│   └── lesson1.json                      # JSON with timestamps
├── Comprehensive_Notes_Spanish_A1.md     # 📖 YOUR STUDY GUIDE (Markdown)
└── Comprehensive_Notes_Spanish_A1.pdf    # 📱 TABLET-FRIENDLY VERSION
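
If you want to work with the timestamped JSON yourself, the sketch below turns it into readable lines. It assumes lesson1.json keeps the layout that Whisper's `transcribe` returns: a `segments` list whose items carry `start` and `end` (in seconds) and `text`. Whether this project stores that layout unchanged is an assumption, and the sample data is made up.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS, dropping fractions of a second."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_lines(transcript: dict) -> list:
    """Render each segment as '[HH:MM:SS] text'."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in transcript.get("segments", [])
    ]


# Made-up sample in the assumed layout, not real program output.
sample = json.loads(
    '{"segments": [{"start": 0.0, "end": 4.2, "text": " Hola, buenos dias."},'
    ' {"start": 65.5, "end": 70.0, "text": " Me llamo Ana."}]}'
)
for line in timestamped_lines(sample):
    print(line)
```

For a real file, replace the sample with `json.load(open("output/transcripts/lesson1.json", encoding="utf-8"))`.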

📚 Supported Languages

Language	Code	Alphabet	Resources
Arabic	ar	✅	✅
Chinese	zh	✅	✅
Japanese	ja	✅	✅
Russian	ru	✅	✅
Spanish	es	➖	✅
French	fr	➖	✅
German	de	➖	✅
Any language	xx	Manual	Template

Don't see your language? Contributions welcome! See CLAUDE.md for adding new languages.

🎯 Use Cases

For Students

Automate processing of online course recordings
Generate searchable, timestamped notes
Access curated resources for your target language
Track progress with built-in checklists

For Teachers

Create study materials from lecture recordings
Share comprehensive notes with students
Maintain course content library

For Self-Learners

Process YouTube courses/playlists
Build personal study guides
Discover quality learning resources

⚙️ Configuration Guide

Video Sources

Google Drive:

sources:
  - type: "google_drive"
    lessons:
      - id: "1abc123xyz"  # From drive.google.com/file/d/1abc123xyz/view
        filename: "lesson1.mp4"
        date: "2024-01-15"

YouTube:

sources:
  - type: "youtube"
    lessons:
      - id: "dQw4w9WgXcQ"  # From youtube.com/watch?v=dQw4w9WgXcQ
        filename: "lesson1.mp4"

Direct URL:

sources:
  - type: "url"
    lessons:
      - url: "https://example.com/video.mp4"
        filename: "lesson1.mp4"

Local Files:

sources:
  - type: "local"
    lessons:
      - filename: "lesson1.mp4"  # Already in output/ directory
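
The four source types differ mainly in how a lesson entry becomes something downloadable. The sketch below shows one plausible mapping. The function name `resolve_download_url` is hypothetical, and the Google Drive URL is the commonly used direct-download form for publicly shared files, which may not be what this project uses internally (large files can also require an extra confirmation step).

```python
from typing import Optional


def resolve_download_url(source_type: str, lesson: dict) -> Optional[str]:
    """Return the URL to fetch for a lesson, or None for local files."""
    if source_type == "google_drive":
        # Common direct-download form for publicly shared Drive files.
        return f"https://drive.google.com/uc?export=download&id={lesson['id']}"
    if source_type == "youtube":
        # A watch URL like this can be passed to yt-dlp.
        return f"https://www.youtube.com/watch?v={lesson['id']}"
    if source_type == "url":
        return lesson["url"]
    if source_type == "local":
        return None  # nothing to download; the file is already in output/
    raise ValueError(f"unknown source type: {source_type}")


print(resolve_download_url("youtube", {"id": "dQw4w9WgXcQ", "filename": "lesson1.mp4"}))
```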

Whisper Models

Model	Speed	Accuracy	Use Case
tiny	⚡️⚡️⚡️	⭐️	Testing
base	⚡️⚡️	⭐️⭐️	Quick drafts
small	⚡️	⭐️⭐️⭐️	Good balance
medium	🐌	⭐️⭐️⭐️⭐️	Recommended
large	🐌🐌	⭐️⭐️⭐️⭐️⭐️	Best quality

🛠️ Advanced Usage

PDF Generation

PDFs are automatically generated alongside Markdown notes. They feature:

Tablet-optimized formatting - Perfect for iPad, Android tablets
Beautiful typography - Readable fonts and spacing
Syntax highlighting - For code blocks and examples
Table support - Clean, professional tables
RTL support - For Arabic, Hebrew, etc.

Disable PDF generation:

notes:
  generate_pdf: false

Custom Note Templates

Edit src/notes_generator.py to customize note structure.

Add New Language Resources

Open src/resources_db.py
Add your language code in _init_resources()
Create a resource list with apps, YouTube channels, and websites
Submit a pull request

Parallel Processing

advanced:
  parallel_processing: true  # Process multiple videos simultaneously

⚠️ Warning: Parallel processing raises CPU and memory usage considerably. If you run out of memory, process videos one at a time instead.
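
To illustrate the trade-off, here is a minimal sketch of bounded parallelism using the standard library. It is not the project's implementation: threads are used only to keep the example small, `max_workers` is a hypothetical parameter (not a documented config key), and `fake_process` stands in for the real per-video work.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(filenames, process_one, parallel=False, max_workers=2):
    """Run process_one on every file, optionally several at a time."""
    if not parallel:
        return [process_one(name) for name in filenames]
    # A small worker cap bounds how many videos are handled at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_one, filenames))  # keeps input order


def fake_process(name):
    # Stand-in for download + transcribe; only derives the audio file name.
    return name.rsplit(".", 1)[0] + ".mp3"


print(process_all(["lesson1.mp4", "lesson2.mp4"], fake_process, parallel=True))
```

Capping the number of workers is the main lever here: each extra worker means one more video being processed at once, with the CPU and memory that goes with it.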

🤝 Contributing

We welcome contributions! See CLAUDE.md for:

Adding new languages
Improving transcription accuracy
Enhancing note generation
Adding integrations (Anki, Notion, etc.)

📖 Examples

See examples/ directory for:

arabic_config.yaml - Arabic course example
japanese_config.yaml - Japanese course example
output_sample.md - Sample generated notes

🐛 Troubleshooting

"yt-dlp: command not found"

pip install yt-dlp
# or
brew install yt-dlp

"ffmpeg: command not found"

brew install ffmpeg       # macOS
sudo apt install ffmpeg   # Linux
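
To check both command-line tools in one go before running the pipeline, a small script like the following can help. It is a sketch built on the standard library's `shutil.which`. The helper name `missing_tools` is made up for this example.

```python
import shutil


def missing_tools(tools=("ffmpeg", "yt-dlp")):
    """Return the names of required executables not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools found.")
```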

Google Drive download fails

Make sure the file is publicly accessible or shared with your account
Get the shareable link and extract the file ID from it
The file ID is the part of the URL between /d/ and /view
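
Extracting the ID by hand is error-prone, so here is a small sketch that pulls it out of a share link with a regular expression. The function name `extract_drive_file_id` is hypothetical, and the pattern only covers links of the /d/<ID>/ form described above.

```python
import re
from typing import Optional

# Matches the ID segment that follows /d/ in a Drive share link.
_DRIVE_ID = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_drive_file_id(url: str) -> Optional[str]:
    """Return the ID found after /d/ in a Drive share link, or None."""
    match = _DRIVE_ID.search(url)
    return match.group(1) if match else None


print(extract_drive_file_id("https://drive.google.com/file/d/1abc123xyz/view"))
```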

Transcription is slow

Use a smaller Whisper model (for example, small instead of medium)
Process shorter videos
Use a GPU if available (requires CUDA setup)

Out of memory

Use a smaller Whisper model
Process videos one at a time (leave parallel_processing disabled)
Close other applications

📄 License

MIT License - See LICENSE file

🙏 Acknowledgments

OpenAI Whisper - Speech recognition
yt-dlp - Video downloader
FFmpeg - Audio extraction

📞 Support

🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions
⭐ Star the repo: github.com/stanasiukcom/language-learner

Made with ❤️ for language learners worldwide

⭐️ Star this repo if you find it useful!
