An automated pipeline that ingests an entire YouTube channel's video library, transcribes content via Gemini, generates embeddings, stores them in a vector database, and exposes a conversational AI agent for natural-language Q&A over the knowledge base.
This system turns any YouTube channel into a searchable, conversational knowledge base. Instead of scrubbing through hours of video to find specific information, you ask a question and get an AI-generated answer grounded in the actual video content — with retrieval-augmented generation (RAG) ensuring accuracy.
Two-phase system:
Phase 1 — Ingestion Pipeline:
- Fetches all uploaded videos from a target YouTube channel via the YouTube Data API
- Attempts to download existing captions (SBV format) for each video
- For videos without captions, transcribes the full audio using Gemini 2.5 Flash's multi-modal capabilities (feeds the YouTube URL directly)
- Cleans transcription text (removes timecodes, normalizes line breaks)
- Generates OpenAI embeddings for the transcribed content
- Stores embeddings in a Supabase vector store (pgvector) for semantic search
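The caption-cleaning step above can be sketched in a few lines. This is a minimal standalone example, not the actual n8n node logic; it assumes SBV's timecode format (`H:MM:SS.mmm,H:MM:SS.mmm` on its own line, followed by caption text):

```python
import re

def clean_sbv(raw: str) -> str:
    """Strip SBV timecode lines and collapse caption line breaks into prose."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        # SBV timecodes look like "0:00:01.000,0:00:04.000" -- drop them
        if re.fullmatch(r"\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}", line):
            continue
        if line:  # skip the blank separator lines too
            kept.append(line)
    # Normalize line breaks: join caption fragments into one text block
    return " ".join(kept)
```

The joined text then goes straight to the embedding step.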
Phase 2 — Conversational Agent:
- Accepts natural-language questions via a chat interface
- Queries the Supabase vector store to retrieve semantically relevant transcript segments
- Augments the AI agent's context with retrieved content (RAG)
- Generates accurate, source-grounded answers using o4-mini
- Falls back to Perplexity web search for questions outside the knowledge base
- Maintains conversation history via Postgres chat memory
```mermaid
graph TB
    subgraph "Phase 1: Ingestion"
        A[YouTube Channel ID] --> B[Fetch All Uploaded Videos]
        B --> C[For Each Video]
        C --> D{Captions Available?}
        D -->|Yes| E[Download Captions SBV]
        D -->|No| F[Gemini 2.5 Flash Transcribe]
        E --> G[Clean Text: Remove Timecodes]
        G --> H[Normalize Line Breaks]
        F --> I[Extract Transcript Text]
        I --> J[OpenAI Embeddings]
        H --> J
        J --> K[Store in Supabase pgvector]
    end
    subgraph "Phase 2: Conversational Agent"
        L[User Question] --> M[AI Agent - o4-mini]
        M --> N[Supabase Vector Store Retrieval]
        M --> O[Perplexity Web Search]
        M --> P[Postgres Chat Memory]
        N -->|Relevant Segments| M
        O -->|Web Results| M
        P -->|Conversation History| M
        M --> Q[Grounded Answer]
    end
    K -.->|Semantic Search| N
```
| Component | Technology | Purpose |
|---|---|---|
| Video Discovery | YouTube Data API v3 | Enumerate all channel uploads |
| Caption Download | YouTube Captions API | Retrieve existing SBV captions |
| Audio Transcription | Gemini 2.5 Flash (multi-modal) | Transcribe videos without captions |
| Text Processing | Regex-based cleaning pipeline | Remove timecodes, normalize formatting |
| Embeddings | OpenAI Embeddings API | Convert text chunks to vector representations |
| Vector Store | Supabase with pgvector extension | Semantic similarity search over transcripts |
| Chat Agent | o4-mini via n8n AI Agent | Conversational Q&A with tool calling |
| Web Augmentation | Perplexity (sonar-pro) | Answer questions outside the knowledge base |
| Chat Memory | Postgres (Supabase) | Persistent conversation history across sessions |
| Rate Limiting | n8n Wait nodes + batch processing | Respect API rate limits during bulk ingestion |
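The rate-limiting pattern from the table (n8n SplitInBatches + Wait) translates roughly to the sketch below. The batch size and delay are illustrative values, not the workflow's actual settings:

```python
import time

def process_in_batches(items, handler, batch_size=10, delay_seconds=1.0):
    """Process items in fixed-size batches, pausing between batches to stay
    under API rate limits (mirrors n8n's SplitInBatches + Wait nodes)."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    for i, batch in enumerate(batches):
        for item in batch:
            handler(item)
        if i < len(batches) - 1:  # no need to wait after the last batch
            time.sleep(delay_seconds)
    return len(batches)
```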
Why Gemini 2.5 Flash for transcription? Gemini's multi-modal capability can process YouTube video URLs directly — no need to download audio, convert formats, or manage file storage. It handles the full pipeline from video URL to transcript text in a single API call. Using the Flash tier keeps costs low for bulk channel ingestion.
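For reference, the single-call transcription request looks roughly like this payload for Gemini's `generateContent` endpoint, with the YouTube URL passed as `file_data`. Treat the exact field names as an assumption to verify against the current API docs, and the prompt text as illustrative:

```python
def build_transcription_request(video_url: str) -> dict:
    """Build a generateContent payload asking Gemini to transcribe a YouTube
    video passed by URL (field shape assumed from the REST API; verify)."""
    return {
        "contents": [{
            "parts": [
                {"file_data": {"file_uri": video_url}},
                {"text": "Transcribe the spoken audio of this video verbatim. "
                         "Output plain text only, no timestamps."},
            ]
        }]
    }
```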
Why a dual caption strategy (download first, then transcribe)? YouTube's own captions, when available, are often higher quality than AI transcription — especially for channels that upload manually corrected captions. The pipeline tries the fast, free path first (downloading existing captions) and falls back to Gemini transcription only when necessary.
Why Supabase/pgvector instead of a dedicated vector DB? Supabase gives me a unified backend — the same Postgres instance handles the vector store, chat memory, and any future metadata. No additional infrastructure to manage. pgvector's performance is more than sufficient for this scale.
Why o4-mini for the chat agent? The retrieval step provides the factual grounding, so the chat model's job is primarily synthesis and presentation, which o4-mini handles well at lower cost than larger models. Its built-in reasoning also helps with multi-step questions that span several retrieved segments.
Why include Perplexity as a fallback tool? Not every question will be answerable from the video content alone. Perplexity gives the agent access to current web information, so it can supplement the knowledge base with real-time data when needed — while clearly distinguishing between "from the videos" and "from the web."
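In the real workflow the agent's LLM decides when to call the Perplexity tool, but the fallback logic amounts to a simple routing decision: answer from the vector store when retrieval finds a sufficiently similar chunk, otherwise go to the web. A minimal sketch, with an illustrative similarity threshold:

```python
def route_question(best_similarity: float, threshold: float = 0.75) -> str:
    """Pick the agent's tool: use the knowledge base when retrieval found a
    close transcript match, otherwise fall back to web search.
    (Threshold is illustrative; the real agent lets the LLM choose tools.)"""
    return "vector_store" if best_similarity >= threshold else "web_search"
```

Tagging the answer with the chosen route is what lets the agent distinguish "from the videos" from "from the web."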
- Multi-modal ingestion: Combines caption download, AI transcription, and text processing in a unified pipeline
- Hybrid retrieval: Vector similarity search (Supabase/pgvector) + live web search (Perplexity) as agent tools
- Persistent memory: Postgres-backed chat history enables multi-turn conversations with context
- Batch processing with rate limiting: Wait nodes prevent API throttling during bulk channel ingestion
- Fully self-hosted: Entire stack runs on personal infrastructure — n8n, Supabase, all under my control
- Hosting: Self-hosted on personal infrastructure (Coolify PaaS)
- Database: Supabase (self-hosted Postgres + pgvector)
- Orchestration: n8n workflow engine (27 nodes)
- Models: Gemini 2.5 Flash (transcription), OpenAI (embeddings), o4-mini (chat agent), Perplexity sonar-pro (web search)