API service to fetch YouTube video transcripts with metadata and local file caching.
- 📥 Fetch YouTube video transcripts with metadata
- 🎧 Transcript-from-audio generation using yt-dlp + ffmpeg with selectable backend: faster-whisper, assembly, openai, or gemini
- 🧵 Background transcription jobs with progress polling
- 🪵 Timestamped development logs with visible active log level and transcription backend at startup
- 💾 Local file caching with unlimited retention by default
- 🌍 Always returns first available transcript (native/original language)
- 🐳 Docker support
- 🔌 MCP (Model Context Protocol) server integration
- 📚 Interactive Swagger documentation
- ⚡ Rate limiting
- 🔒 Optional API key authentication
- 🖼️ Basic metadata includes: title, author, duration, views, publish date, thumbnail, description
- 📊 Full metadata endpoint with all yt-dlp fields (50+ fields)
# Install dependencies
sudo apt install python3-venv ffmpeg
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
# Copy environment variables and configure
cp .env.example .env
# Edit .env if needed
# Run server in dev mode (with hot-reload, no __pycache__)
./run-api-dev.sh

The development startup script:
- loads .env from the project root if present
- shows the active log level
- shows the active transcription backend
- enables timestamped Uvicorn logs through app/uvicorn_log_config.json
Or manually:
PYTHONDONTWRITEBYTECODE=1 uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload --log-config app/uvicorn_log_config.json

Docker uses a glibc-based Python image for compatibility with faster-whisper and ctranslate2, which are not reliably installable on python:3.14-alpine.
# Copy environment variables
cp .env.example .env
# Edit .env if needed
# Build and run with Docker Compose
sudo docker-compose up --build
# Or build and run manually
sudo docker build -t youtube-transcript-api .
sudo docker run -p 8000:8000 -v $(pwd)/cache:/app/cache -v $(pwd)/data:/app/data youtube-transcript-api

GET /api/v1/health

Returns application health and runtime information.
Response:
{
"status": "healthy",
"version": "1.1.0",
"uptime_seconds": 12.34,
"transcription_backend": "faster-whisper",
"transcript_from_audio_enabled": true,
"cache_path": "/app/cache",
"cache_accessible": true,
"whisper_model_loaded": false
}

GET /api/v1/youtube/transcript/{video_id}?use_cache=true&force_refresh=false&language=en

When _APP_TRANSCRIPT_FROM_AUDIO=true, this endpoint can automatically queue transcript_from_audio processing if the direct YouTube transcript fetch fails for reasons such as disabled transcripts or YouTube-side access issues. "Video unavailable" remains a hard error.
Parameters:
- video_id (path): YouTube video ID (11 chars) or full URL. Examples: mQ-y2ZOTpr4 or https://www.youtube.com/watch?v=mQ-y2ZOTpr4
- use_cache (query): Enable or disable cache lookup for direct YouTube transcripts
- force_refresh (query): Skip direct transcript cache lookup and overwrite the direct cache section
- language (query): Preferred transcript language code (optional)
Returns:
- metadata - Basic video metadata
- transcript_youtube - Direct transcript fetched from YouTube
- transcript_audio - Transcript generated from the audio track if available
- source_preference - Response source ordering metadata
Response:
{
"video_id": "mQ-y2ZOTpr4",
"metadata": {
"title": "Video Title",
"author": "Channel Name",
"duration": 218,
"publish_date": "20251203",
"view_count": 9084,
"thumbnail": "https://i.ytimg.com/vi/...",
"description": "Full description..."
},
"transcript_youtube": {
"transcript": "Full transcript text here...",
"language": "en",
"source": "youtube",
"cache_used": false,
"cached_at": null
},
"transcript_audio": null,
"source_preference": ["youtube", "audio"]
}

Fallback response when direct transcript fetching queues audio transcription:
{
"video_id": "3LbZP0sYmPw",
"status": "queued",
"message": "Direct YouTube transcript fetch failed: Transcripts are disabled for video 3LbZP0sYmPw. Transcript_from_audio status for video_id 3LbZP0sYmPw is 'queued'. Background transcription message: Transcript queued for background processing by video_id using backend 'assembly'. The transcript should be available in a few minutes. Check status by the same video_id.",
"progress_percent": 0,
"transcript_from_audio_reason": "Transcripts are disabled for video 3LbZP0sYmPw",
"result": null
}

GET /api/v1/youtube/transcript/raw/{video_id}?use_cache=true&force_refresh=false&language=en

Returns complete yt-dlp metadata together with separated transcript payloads.
Response:
{
"video_id": "mQ-y2ZOTpr4",
"metadata": {
"title": "Video Title",
"channel_id": "UC123..."
},
"transcript_youtube": {
"transcript": "Full transcript text here...",
"language": "en",
"source": "youtube",
"cache_used": true,
"cached_at": "2026-03-13T12:34:56"
},
"transcript_audio": null,
"source_preference": ["youtube", "audio"]
}

POST /api/v1/youtube/audio-transcript/{video_id}

Queues background transcription for the provided video_id: the worker downloads the audio track, normalizes it with ffmpeg, transcribes it with the configured backend, and stores the result under transcript_from_audio in cache.
Supported backends:
- faster-whisper
- assembly
- openai
- gemini
Response:
{
"status": "queued",
"video_id": "mQ-y2ZOTpr4",
"message": "Transcript queued for background processing by video_id using backend 'assembly'",
"progress_percent": 0,
"result": null
}

GET /api/v1/youtube/audio-transcript/{video_id}

Returns the current background transcription state with step-level progress.
Possible states:
- queued
- downloading_audio
- extracting_audio
- loading_model
- uploading_audio
- awaiting_provider
- transcribing
- completed
- failed
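A client can poll this status endpoint until the job reaches a terminal state. A minimal sketch, assuming the server runs on localhost and treating completed and failed as the only terminal states (the polling interval is an arbitrary choice):

```python
import json
import time
import urllib.request

# Assumption for this sketch: only these two states end a job.
TERMINAL_STATES = {"completed", "failed"}

def is_terminal(status: str) -> bool:
    """True once the background job can no longer make progress."""
    return status in TERMINAL_STATES

def poll_audio_transcript(base_url: str, video_id: str, interval: float = 5.0) -> dict:
    """Poll GET /api/v1/youtube/audio-transcript/{video_id} until done."""
    url = f"{base_url}/api/v1/youtube/audio-transcript/{video_id}"
    while True:
        with urllib.request.urlopen(url) as resp:
            state = json.load(resp)
        if is_terminal(state["status"]):
            return state
        time.sleep(interval)
```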
Response:
{
"video_id": "mQ-y2ZOTpr4",
"status": "transcribing",
"current_step": "transcribing",
"message": "Transcribing audio with backend 'assembly' from data/work/mQ-y2ZOTpr4/audio.wav",
"progress_percent": 70,
"created_at": "2026-03-12T05:40:00",
"updated_at": "2026-03-12T05:41:10",
"error": null,
"result": null
}

GET /api/v1/cache

Response:
{
"status": "healthy",
"cache_size": 12,
"cache_path": "./cache",
"cache_size_bytes": 482102,
"cache_size_mb": 0.46,
"max_cache_size_mb": 0
}

GET /api/v1/cache/entries

Requires API key authentication when enabled.
Response:
{
"status": "healthy",
"entries": [
{
"video_id": "mQ-y2ZOTpr4",
"file_name": "mQ-y2ZOTpr4.json",
"size_bytes": 20480,
"updated_at": "2026-03-13T20:00:00"
}
],
"cache_size": 1,
"cache_size_bytes": 20480,
"cache_size_mb": 0.02,
"max_cache_size_mb": 0
}

DELETE /api/v1/cache

Requires API key authentication when enabled.

DELETE /api/v1/cache/{video_id}

Requires API key authentication when enabled.

GET /

Returns API information and available endpoints.
The API returns the best available transcript based on YouTube availability and your optional preferred language.
- Prefers manual transcripts over auto-generated ones
- Optional language query parameter can be used as a preferred transcript language hint
- Response separates direct and audio transcript payloads instead of concatenating them
- Cache is stored as video_id.json with separate sections for direct and audio transcripts
- The API first checks cache by video_id.
- If not found, it fetches a transcript from YouTube.
- Direct transcripts are stored under direct_from_youtube.
- Audio transcription results are stored under transcript_from_audio.
- Cached transcript files live under cache/{video_id}.json.
- By default, _APP_MAX_CACHE_SIZE_MB=0 keeps cache size unlimited.
- By default, _APP_CACHE_TTL_DAYS=0 disables cache expiration.
- Automatic eviction only happens if _APP_MAX_CACHE_SIZE_MB is set to a value greater than 0.
- Automatic expiration only happens if _APP_CACHE_TTL_DAYS is set to a value greater than 0.
If _APP_TRANSCRIPT_FROM_AUDIO=true and direct transcript fetching fails for an eligible reason, the standard transcript endpoint and MCP get_youtube_transcript tool automatically queue or reuse background audio transcription for the same video_id.
- The request endpoint normalizes the input to video_id.
- It checks cache/<video_id>.json for transcript_from_audio.
- If not cached, it creates or reuses a file-backed background status entry in data/jobs/.
- The worker uses data/work/<video_id>/ as temporary workspace.
- The worker downloads audio with yt-dlp.
- ffmpeg converts audio to mono 16k WAV.
- The configured backend generates the final transcript.
- The result is stored in cache and exposed through HTTP and MCP polling.
- Temporary work files are removed after processing when _APP_JOB_CLEANUP_TEMP_FILES=true.
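The normalization step above can be sketched as an ffmpeg invocation. The exact flags the worker uses are not documented here, so this is an illustrative command for "mono 16k WAV" only:

```python
import subprocess

def build_ffmpeg_cmd(source_audio: str, target_wav: str) -> list[str]:
    """ffmpeg command converting any source audio to mono 16 kHz WAV."""
    return ["ffmpeg", "-y", "-i", source_audio,
            "-ac", "1",      # downmix to a single (mono) channel
            "-ar", "16000",  # resample to 16 kHz
            target_wav]

def normalize_audio(source_audio: str, target_wav: str) -> None:
    """Run the conversion; raises CalledProcessError on ffmpeg failure."""
    subprocess.run(build_ffmpeg_cmd(source_audio, target_wav), check=True)
```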
Interactive API documentation is available at:
MCP (Model Context Protocol) server is integrated with FastAPI and supports StreamableHttpTransport.
Set _APP_MCP_HIDE_CLEAR_CACHE=true to hide the clear_cache tool from the MCP tools list.
- MCP Endpoint: http://localhost:8000/api/v1/mcp
- Transport: streamable_http
- Tools:
  - get_youtube_transcript
  - request_youtube_audio_transcript
  - get_youtube_audio_transcript
  - clear_cache
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    # streamablehttp_client yields (read_stream, write_stream, get_session_id)
    async with streamablehttp_client("http://localhost:8000/api/v1/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            await session.call_tool(
                "get_youtube_transcript",
                arguments={"video_id": "9Wg6tiaar9M"},
            )

asyncio.run(main())

{
"mcpServers": {
"youtube-transcript": {
"url": "http://localhost:8000/api/v1/mcp",
"transport": "streamable_http"
}
}
}

All environment variables use the _APP_ prefix. Copy .env.example to .env and adjust values for your environment.
Key groups:
- API paths and CORS
- Cache, jobs, and work directories
- Transcript and audio fallback behavior
- Provider/backend configuration
- Optional API key authentication
- Logging and port binding
_APP_API_KEY is used only for external provider authentication. _APP_X_API_KEY independently enables API access control for incoming HTTP and MCP requests. Leaving _APP_X_API_KEY empty keeps incoming API and MCP authentication disabled.
_APP_TRANSCRIPTION_BACKEND=faster-whisper
_APP_WHISPER_MODEL=large-v3
_APP_WHISPER_DEVICE=cpu
_APP_WHISPER_COMPUTE_TYPE=int8

_APP_TRANSCRIPTION_BACKEND=assembly
_APP_API_KEY=your-assembly-api-key
_APP_BASE_URL=https://api.assemblyai.com
_APP_MODEL=universal-3-pro,universal-2
_APP_LANGUAGE_DETECTION=true

_APP_TRANSCRIPTION_BACKEND=openai
_APP_API_KEY=your-openai-api-key
_APP_BASE_URL=https://api.openai.com/v1
_APP_MODEL=gpt-4o-mini-transcribe

_APP_TRANSCRIPTION_BACKEND=gemini
_APP_API_KEY=your-gemini-api-key
_APP_BASE_URL=https://generativelanguage.googleapis.com
_APP_MODEL=gemini-2.5-flash

app/
├── main.py
├── config.py
├── models.py
├── middleware/
│ ├── auth.py
│ └── process_time.py
├── routers/
│ ├── transcript.py
│ └── transcript_from_audio.py
├── services/
│ ├── background_transcription_service.py
│ ├── cache_service.py
│ ├── job_service.py
│ ├── service_container.py
│ ├── transcription_backend_service.py
│ ├── transcript_from_audio_cache_service.py
│ └── youtube_service.py
├── utils/
│ └── transcript_utils.py
└── mcp/
└── server.py
Current version: 1.1.0
Implemented features:
- Cache service with atomic writes, optional TTL, and optional max-size eviction
- Direct transcript and audio transcript fallback flows
- Cache retention defaults to unlimited size and no expiration
- Background job status files stored under data/jobs
- Background worker temporary files stored under data/work
- Shared service container across REST and MCP
- REST cache management endpoints
- Optional CORS middleware
- Optional API key authentication
- Docker and Compose support
- Swagger and ReDoc documentation
Runtime data is split between persistent cache entries and background-processing state.
cache/
├── {video_id}.json
data/
├── jobs/
│ └── {video_id}.json
└── work/
└── {video_id}/
├── source_audio.*
└── audio.wav
cache/{video_id}.json contains:
- video_id
- direct_from_youtube
- transcript_from_audio
data/jobs/{video_id}.json contains the current background job state, including:
- status
- current_step
- progress_percent
- message
- error
- result
data/work/{video_id}/ contains temporary downloaded and normalized audio files while a background job is running.
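Because job state is file-backed, it can also be inspected directly, e.g. while debugging. A sketch using the field names from the job-state layout above (helper names are illustrative):

```python
import json
from pathlib import Path

def summarize_job(state: dict) -> str:
    """One-line summary of a data/jobs/{video_id}.json payload."""
    return f"{state.get('status', 'unknown')} ({state.get('progress_percent', 0)}%)"

def read_job_status(jobs_dir: str, video_id: str) -> dict:
    """Load the raw job-state JSON for a video_id from the jobs directory."""
    return json.loads(Path(jobs_dir, f"{video_id}.json").read_text())
```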
MIT