API service to fetch YouTube video transcripts with metadata and local file caching.
- 📥 Fetch YouTube video transcripts with metadata
- 🎧 Transcript-from-audio generation using yt-dlp + ffmpeg with selectable backend: faster-whisper, assembly, openai, or gemini
- 🧵 Background transcription jobs with progress polling
- 🪵 Timestamped development logs with visible active log level and transcription backend at startup
- 💾 Local file caching with unlimited retention by default
- 🌍 Always returns first available transcript (native/original language)
- 🐳 Docker support
- 🔌 MCP (Model Context Protocol) server integration
- 📚 Interactive Swagger documentation
- ⚡ Rate limiting
- 🔒 Optional API key authentication
- 🖼️ Basic metadata includes: title, author, duration, views, publish date, thumbnail, description
- 📊 Full metadata endpoint with all yt-dlp fields (50+ fields)
# Install dependencies
sudo apt install python3-venv ffmpeg
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
# Copy environment variables and configure
cp .env.example .env
# Edit .env if needed
# Run server in dev mode (with hot-reload, no __pycache__)
./run-api-dev.sh

The development startup script:
- loads .env from the project root if present
- shows the active log level
- shows the active transcription backend
- enables timestamped Uvicorn logs through app/uvicorn_log_config.json
Or manually:
PYTHONDONTWRITEBYTECODE=1 uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload --log-config app/uvicorn_log_config.json

Docker uses a glibc-based Python image for compatibility with faster-whisper and ctranslate2, which are not reliably installable on python:3.14-alpine.
# Copy environment variables
cp .env.example .env
# Edit .env if needed
# Build and run with Docker Compose
sudo docker-compose up --build
# Or build and run manually
sudo docker build -t youtube-transcript-api .
sudo docker run -p 8000:8000 -v $(pwd)/cache:/app/cache -v $(pwd)/data:/app/data youtube-transcript-api

GET /api/v1/health

Returns application health and runtime information.
Response:
{
"status": "healthy",
"version": "1.1.0",
"uptime_seconds": 12.34,
"transcription_backend": "faster-whisper",
"transcript_from_audio_enabled": true,
"cache_path": "/app/cache",
"cache_accessible": true,
"whisper_model_loaded": false
}

GET /api/v1/youtube/transcript/{video_id}?use_cache=true&force_refresh=false&language=en

When _APP_TRANSCRIPT_FROM_AUDIO=true, this endpoint can automatically queue transcript_from_audio processing if the direct YouTube transcript fetch fails for reasons such as disabled transcripts or YouTube-side access issues. "Video unavailable" remains a hard error.
Parameters:
- video_id (path): YouTube video ID (11 chars) or full URL. Examples: mQ-y2ZOTpr4 or https://www.youtube.com/watch?v=mQ-y2ZOTpr4
- use_cache (query): Enable or disable cache lookup for direct YouTube transcripts
- force_refresh (query): Skip direct transcript cache lookup and overwrite the direct cache section
- language (query): Preferred transcript language code (optional)
Returns:
- metadata - Basic video metadata
- transcript_youtube - Direct transcript fetched from YouTube
- transcript_audio - Transcript generated from the audio track if available
- source_preference - Response source ordering metadata
Response:
{
"video_id": "mQ-y2ZOTpr4",
"metadata": {
"title": "Video Title",
"author": "Channel Name",
"duration": 218,
"publish_date": "20251203",
"view_count": 9084,
"thumbnail": "https://i.ytimg.com/vi/...",
"description": "Full description..."
},
"transcript_youtube": {
"transcript": "Full transcript text here...",
"language": "en",
"source": "youtube",
"cache_used": false,
"cached_at": null
},
"transcript_audio": null,
"source_preference": ["youtube", "audio"]
}

Fallback response when direct transcript fetching queues audio transcription:
{
"video_id": "3LbZP0sYmPw",
"status": "queued",
"message": "Direct YouTube transcript fetch failed: Transcripts are disabled for video 3LbZP0sYmPw. Transcript_from_audio status for video_id 3LbZP0sYmPw is 'queued'. Background transcription message: Transcript queued for background processing by video_id using backend 'assembly'. The transcript should be available in a few minutes. Check status by the same video_id.",
"progress_percent": 0,
"transcript_from_audio_reason": "Transcripts are disabled for video 3LbZP0sYmPw",
"result": null
}

GET /api/v1/youtube/transcript/raw/{video_id}?use_cache=true&force_refresh=false&language=en

Returns complete yt-dlp metadata together with separated transcript payloads.
Response:
{
"video_id": "mQ-y2ZOTpr4",
"metadata": {
"title": "Video Title",
"channel_id": "UC123..."
},
"transcript_youtube": {
"transcript": "Full transcript text here...",
"language": "en",
"source": "youtube",
"cache_used": true,
"cached_at": "2026-03-13T12:34:56"
},
"transcript_audio": null,
"source_preference": ["youtube", "audio"]
}

POST /api/v1/youtube/audio-transcript/{video_id}

Queues background transcription for the provided video_id: the worker downloads the audio track, normalizes it with ffmpeg, transcribes it with the configured backend, and stores the result under transcript_from_audio in cache.
Supported backends:
- faster-whisper
- assembly
- openai
- gemini
Response:
{
"status": "queued",
"video_id": "mQ-y2ZOTpr4",
"message": "Transcript queued for background processing by video_id using backend 'assembly'",
"progress_percent": 0,
"result": null
}

GET /api/v1/youtube/audio-transcript/{video_id}

Returns the current background transcription state with step-level progress.
Possible states:
- queued
- downloading_audio
- extracting_audio
- loading_model
- uploading_audio
- awaiting_provider
- transcribing
- completed
- failed
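A client can poll this status endpoint until the job reaches a terminal state. A minimal sketch, assuming the server runs on localhost and treating completed and failed as the only terminal states (the polling interval is an arbitrary choice):

```python
import json
import time
import urllib.request

# Assumption for this sketch: only these two states end a job.
TERMINAL_STATES = {"completed", "failed"}

def is_terminal(status: str) -> bool:
    """True once the background job can no longer make progress."""
    return status in TERMINAL_STATES

def poll_audio_transcript(base_url: str, video_id: str, interval: float = 5.0) -> dict:
    """Poll GET /api/v1/youtube/audio-transcript/{video_id} until done."""
    url = f"{base_url}/api/v1/youtube/audio-transcript/{video_id}"
    while True:
        with urllib.request.urlopen(url) as resp:
            state = json.load(resp)
        if is_terminal(state["status"]):
            return state
        time.sleep(interval)
```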
Response:
{
"video_id": "mQ-y2ZOTpr4",
"status": "transcribing",
"current_step": "transcribing",
"message": "Transcribing audio with backend 'assembly' from data/work/mQ-y2ZOTpr4/audio.wav",
"progress_percent": 70,
"created_at": "2026-03-12T05:40:00",
"updated_at": "2026-03-12T05:41:10",
"error": null,
"result": null
}

GET /api/v1/cache

Response:
{
"status": "healthy",
"cache_size": 12,
"cache_path": "./cache",
"cache_size_bytes": 482102,
"cache_size_mb": 0.46,
"max_cache_size_mb": 0
}

GET /api/v1/cache/entries

Requires API key authentication when enabled.
Response:
{
"status": "healthy",
"entries": [
{
"video_id": "mQ-y2ZOTpr4",
"file_name": "mQ-y2ZOTpr4.json",
"size_bytes": 20480,
"updated_at": "2026-03-13T20:00:00"
}
],
"cache_size": 1,
"cache_size_bytes": 20480,
"cache_size_mb": 0.02,
"max_cache_size_mb": 0
}

DELETE /api/v1/cache

Requires API key authentication when enabled.

DELETE /api/v1/cache/{video_id}

Requires API key authentication when enabled.

GET /

Returns API information and available endpoints.
The API returns the best available transcript based on YouTube availability and your optional preferred language.
- Prefers manual transcripts over auto-generated ones
- Optional language query parameter can be used as a preferred transcript language hint
- Response separates direct and audio transcript payloads instead of concatenating them
- Cache is stored as video_id.json with separate sections for direct and audio transcripts
- The API first checks cache by video_id.
- If not found, it fetches a transcript from YouTube.
- Direct transcripts are stored under direct_from_youtube.
- Audio transcription results are stored under transcript_from_audio.
- Cached transcript files live under cache/{video_id}.json.
- By default, _APP_MAX_CACHE_SIZE_MB=0 keeps cache size unlimited.
- By default, _APP_CACHE_TTL_DAYS=0 disables cache expiration.
- Automatic eviction only happens if _APP_MAX_CACHE_SIZE_MB is set to a value greater than 0.
- Automatic expiration only happens if _APP_CACHE_TTL_DAYS is set to a value greater than 0.
If _APP_TRANSCRIPT_FROM_AUDIO=true and direct transcript fetching fails for an eligible reason, the standard transcript endpoint and MCP get_youtube_transcript tool automatically queue or reuse background audio transcription for the same video_id.
- The request endpoint normalizes the input to video_id.
- It checks cache/<video_id>.json for transcript_from_audio.
- If not cached, it creates or reuses a file-backed background status entry in data/jobs/.
- The worker uses data/work/<video_id>/ as temporary workspace.
- The worker downloads audio with yt-dlp.
- ffmpeg converts audio to mono 16k WAV.
- The configured backend generates the final transcript.
- The result is stored in cache and exposed through HTTP and MCP polling.
- Temporary work files are removed after processing when _APP_JOB_CLEANUP_TEMP_FILES=true.
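The normalization step above can be sketched as an ffmpeg invocation. The exact flags the worker uses are not documented here, so this is an illustrative command for "mono 16k WAV" only:

```python
import subprocess

def build_ffmpeg_cmd(source_audio: str, target_wav: str) -> list[str]:
    """ffmpeg command converting any source audio to mono 16 kHz WAV."""
    return ["ffmpeg", "-y", "-i", source_audio,
            "-ac", "1",      # downmix to a single (mono) channel
            "-ar", "16000",  # resample to 16 kHz
            target_wav]

def normalize_audio(source_audio: str, target_wav: str) -> None:
    """Run the conversion; raises CalledProcessError on ffmpeg failure."""
    subprocess.run(build_ffmpeg_cmd(source_audio, target_wav), check=True)
```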
Interactive API documentation is available at:
MCP (Model Context Protocol) server is integrated with FastAPI and supports StreamableHttpTransport.
Set _APP_MCP_HIDE_CLEAR_CACHE=true to hide the clear_cache tool from the MCP tools list.
- MCP Endpoint: http://localhost:8000/api/v1/mcp
- Transport: streamable_http
- Tools:
  - get_youtube_transcript
  - request_youtube_audio_transcript
  - get_youtube_audio_transcript
  - clear_cache
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    # streamablehttp_client yields (read_stream, write_stream, get_session_id)
    async with streamablehttp_client("http://localhost:8000/api/v1/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            await session.call_tool(
                "get_youtube_transcript",
                arguments={"video_id": "9Wg6tiaar9M"},
            )

asyncio.run(main())

{
"mcpServers": {
"youtube-transcript": {
"url": "http://localhost:8000/api/v1/mcp",
"transport": "streamable_http"
}
}
}

All environment variables use the _APP_ prefix. Copy .env.example to .env and adjust values for your environment.
Key groups:
- API paths and CORS
- Cache, jobs, and work directories
- Transcript and audio fallback behavior
- Provider/backend configuration
- Optional API key authentication
- Logging and port binding
_APP_API_KEY is used only for external provider authentication. _APP_X_API_KEY independently enables API access control for incoming HTTP and MCP requests. Leaving _APP_X_API_KEY empty keeps incoming API and MCP authentication disabled.
_APP_TRANSCRIPTION_BACKEND=faster-whisper
_APP_WHISPER_MODEL=large-v3
_APP_WHISPER_DEVICE=cpu
_APP_WHISPER_COMPUTE_TYPE=int8

_APP_TRANSCRIPTION_BACKEND=assembly
_APP_API_KEY=your-assembly-api-key
_APP_BASE_URL=https://api.assemblyai.com
_APP_MODEL=universal-3-pro,universal-2
_APP_LANGUAGE_DETECTION=true

_APP_TRANSCRIPTION_BACKEND=openai
_APP_API_KEY=your-openai-api-key
_APP_BASE_URL=https://api.openai.com/v1
_APP_MODEL=gpt-4o-mini-transcribe

_APP_TRANSCRIPTION_BACKEND=gemini
_APP_API_KEY=your-gemini-api-key
_APP_BASE_URL=https://generativelanguage.googleapis.com
_APP_MODEL=gemini-2.5-flash

app/
├── main.py
├── config.py
├── models.py
├── middleware/
│ ├── auth.py
│ └── process_time.py
├── routers/
│ ├── transcript.py
│ └── transcript_from_audio.py
├── services/
│ ├── background_transcription_service.py
│ ├── cache_service.py
│ ├── job_service.py
│ ├── service_container.py
│ ├── transcription_backend_service.py
│ ├── transcript_from_audio_cache_service.py
│ └── youtube_service.py
├── utils/
│ └── transcript_utils.py
└── mcp/
└── server.py
Current version: 1.1.0
Implemented features:
- Cache service with atomic writes, optional TTL, and optional max-size eviction
- Direct transcript and audio transcript fallback flows
- Cache retention defaults to unlimited size and no expiration
- Background job status files stored under data/jobs
- Background worker temporary files stored under data/work
- Shared service container across REST and MCP
- REST cache management endpoints
- Optional CORS middleware
- Optional API key authentication
- Docker and Compose support
- Swagger and ReDoc documentation
Runtime data is split between persistent cache entries and background-processing state.
cache/
├── {video_id}.json
data/
├── jobs/
│ └── {video_id}.json
└── work/
└── {video_id}/
├── source_audio.*
└── audio.wav
cache/{video_id}.json contains:
- video_id
- direct_from_youtube
- transcript_from_audio
data/jobs/{video_id}.json contains the current background job state, including:
- status
- current_step
- progress_percent
- message
- error
- result
data/work/{video_id}/ contains temporary downloaded and normalized audio files while a background job is running.
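Because job state is file-backed, it can also be inspected directly, e.g. while debugging. A sketch using the field names from the job-state layout above (helper names are illustrative):

```python
import json
from pathlib import Path

def summarize_job(state: dict) -> str:
    """One-line summary of a data/jobs/{video_id}.json payload."""
    return f"{state.get('status', 'unknown')} ({state.get('progress_percent', 0)}%)"

def read_job_status(jobs_dir: str, video_id: str) -> dict:
    """Load the raw job-state JSON for a video_id from the jobs directory."""
    return json.loads(Path(jobs_dir, f"{video_id}.json").read_text())
```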
MIT