
AI Video Search

Semantic video search with natural language queries, powered by CLIP embeddings and Gemini 2.0 Flash.


The Problem: Finding Moments in Video is Still Painfully Manual

What Teams Face Today

Media organizations, production studios, and content teams manage thousands of hours of video footage. When someone needs to find a specific moment—a product demonstration, a particular speaker, an outdoor scene—they face a critical bottleneck:

Manual review is the only option.

Editors spend roughly one in every ten working hours searching for content, a massive time and cost burden. Much footage remains locked away in archives that are difficult to access and even harder to use.

Who Experiences This

  • Media Production Teams: Searching for B-roll, archival footage, or specific shots across hundreds of hours
  • Marketing Departments: Repurposing webinar clips, event footage, or product demos for campaigns
  • Legal & Compliance Teams: Finding evidence, documentation, or specific statements in depositions and recordings
  • Training & Education: Locating relevant segments from recorded lectures, workshops, or demonstrations
  • Broadcasting Studios: Accessing decades of footage for news packages, documentaries, or retrospectives

Why Existing Solutions Fall Short

Keyword-Based Search: The Metadata Problem

Traditional video platforms rely on manual tagging:

  • Labor-intensive: Someone must watch footage and apply keywords
  • Incomplete coverage: Most moments never get tagged
  • Vocabulary mismatch: Users search for "sunset" when the tag says "golden hour"
  • No visual understanding: Cannot find "person smiling" or "outdoor scene" without explicit tags

Industry research confirms: "Efficient exploitation of broadcasters' archives will increasingly depend on accurate metadata" — but manual metadata creation doesn't scale.

Automatic Transcription: The Visual Blind Spot

Speech-to-text solves one problem but ignores the visual dimension:

  • Cannot find scenes based on what's shown, only what's said
  • Misses non-verbal content: actions, objects, environments, emotions
  • Useless for silent footage or music-heavy content
  • Ignores that users need to find content by visual context, not just dialogue

Manual Review: The Time Constraint

Watching footage is accurate but economically infeasible:

  • 1 hour of footage = 1 hour of review time (minimum)
  • At industry rates, searching 100 hours of content costs thousands of dollars
  • Editors report spending 10% of their time just looking for content
  • As libraries grow, the time required increases linearly — the problem never improves

The Scale Problem

All three approaches break when:

  • Video libraries exceed 1,000 hours
  • Teams need results in minutes, not days
  • Content is created faster than it can be tagged
  • Budget constraints prevent comprehensive manual review

Broadcasting and media production research shows: "The sheer volume and diversity of media assets pose challenges, with organizations accumulating vast amounts of files in different formats, resolutions, and metadata structures."


Business Impact: What Changes When Search Works

Time Saved

Organizations using semantic video search report dramatic reductions in retrieval time.

Real-world impact: A production team managing 500 hours of footage reduces search time from hours to seconds. Instead of reviewing 20 clips manually, they query "person speaking to camera" and get instant results.

Decisions Accelerated

  • Marketing teams build campaigns in hours instead of days
  • Legal teams locate evidence in minutes instead of weeks
  • Product managers find user feedback clips for stakeholder presentations instantly
  • Training coordinators assemble learning modules without reviewing full recordings

New Capabilities Unlocked

Semantic search enables workflows that were previously impossible:

  • Cross-project discovery: "Find all outdoor scenes across our entire library"
  • Competitive analysis: Locate specific product features in competitor videos
  • Trend identification: Discover how visual themes evolve over time
  • Asset monetization: Make archival footage commercially viable by making it discoverable

As industry analysis confirms: "Video semantic search enables content discovery, efficient archiving and retrieval, and streamlined repurposing of video content through intelligent analysis of topics, entities, and context within the footage, at scale, which can drive cost efficiency, productivity gains, and scalability."


Who Benefits

Primary Users

1. Media & Entertainment Studios

  • Production teams searching for B-roll and archival footage
  • Post-production editors assembling cuts from large libraries
  • Archivists making decades of footage accessible
  • Broadcasters preparing news packages and documentaries

2. Enterprise Marketing & Communications

  • Content marketers repurposing webinar recordings
  • Social media teams creating clips from events
  • Corporate communications finding CEO statements
  • Product marketing locating demo footage

3. Legal & Compliance Teams

  • Attorneys finding evidence in depositions
  • Compliance officers reviewing training recordings
  • Risk management teams auditing recorded communications
  • eDiscovery professionals processing video evidence

4. Education & Training Organizations

  • Instructional designers curating course content
  • Corporate trainers finding relevant examples
  • Academic researchers analyzing recorded lectures
  • Online learning platforms enhancing content discovery

Decision Makers Who Gain Value

Operations Leaders: Reduce costs by eliminating duplicate content creation and manual review labor

Creative Directors: Unlock creative potential by making entire libraries instantly searchable

Legal Counsel: Mitigate risk by ensuring critical footage can be located when needed

CFOs: Demonstrate ROI through measurable time savings and asset utilization improvements

Product Managers: Accelerate development cycles by quickly surfacing user research and feedback


Architecture

Frontend: Next.js 14 (React 18) + TailwindCSS + SWR
Backend: Python 3.11 + Flask + CLIP ViT-B-32 + Gemini 2.0 Flash
Infrastructure: Google Cloud Platform (GCP)
Deployment: Netlify (frontend) + Cloud Run (backend)

GCP Products & SDK Used

Core Services

1. Cloud Run - Serverless container platform hosting the Flask API

  • Auto-scaling (0-10 instances)
  • 2Gi memory, 2 CPU, 300s timeout
  • Region: us-central1
  • SDK: gcloud run deploy

2. Cloud Storage (GCS) - Object storage for videos, indexes, and clips

  • Bucket: gen-lang-client-0067393875-media-1770102442
  • Lifecycle policy: Auto-delete clips after 1 day
  • Public read access for thumbnails/clips
  • SDK: google-cloud-storage Python library

3. Cloud Build - CI/CD for Docker image builds

  • Builds from GitHub repository
  • Pushes to Container Registry
  • SDK: gcloud builds submit

4. Container Registry (GCR) - Docker image storage

  • Image: gcr.io/gen-lang-client-0067393875/cloud-clip-api

5. Vertex AI / Gemini API - AI-powered query enhancement

  • Model: Gemini 2.0 Flash
  • Expands user queries with synonyms and variations
  • SDK: google-generativeai Python library
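
A minimal sketch of the expansion step with this SDK (the prompt wording and environment-variable handling here are assumptions, not the exact code in cloud_clip_api.py):

# Sketch of query expansion via the google-generativeai SDK. The prompt text
# is illustrative; the production prompt may differ.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
gemini = genai.GenerativeModel("gemini-2.0-flash")

def expand_query(query: str) -> str:
    prompt = ("Expand this video-search query with visual synonyms and "
              f"alternate phrasings, as one comma-separated line: {query}")
    return gemini.generate_content(prompt).text.strip()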

Supporting Services

  • IAM & Service Accounts - Authentication and authorization
  • Cloud Logging - Application logs and request traces
  • Workload Identity - Secure Cloud Run to GCS authentication

GCS Bucket Structure

gs://gen-lang-client-0067393875-media-1770102442/
├── videos/              # Source videos (permanent)
├── index/               # CLIP embeddings + metadata (permanent)
│   └── {video_id}/
│       ├── shots.json
│       ├── embeddings.json
│       └── thumbs/
└── extracts/            # Generated clips (1-day TTL)
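
The extracts/ prefix carries the 1-day deletion rule noted above. A minimal sketch of configuring it with the google-cloud-storage SDK (assumes library version 2.4+ for the matches_prefix condition):

# Sketch: attach a 1-day delete rule scoped to the extracts/ prefix.
from google.cloud import storage

bucket = storage.Client().get_bucket("gen-lang-client-0067393875-media-1770102442")
bucket.add_lifecycle_delete_rule(age=1, matches_prefix=["extracts/"])
bucket.patch()  # persist the updated lifecycle configuration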

How It Works

1. Video Indexing (Offline)

Videos are batch-processed using worker/video_indexer.py:

  • FFmpeg detects scene changes (threshold=0.3)
  • Extracts a thumbnail for each shot (the middle frame)
  • Generates 512-dim CLIP embeddings for each thumbnail
  • Stores index in GCS: index/{video_id}/
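
A minimal sketch of this pipeline (detect_shots and embed_thumbnail are hypothetical helper names; worker/video_indexer.py is the actual implementation):

# Sketch of the offline indexing path: FFmpeg scene detection plus CLIP
# embeddings, unit-normalized so cosine similarity reduces to a dot product.
import subprocess

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # 512-dim CLIP image/text space

def detect_shots(video_path: str, threshold: float = 0.3) -> list[float]:
    """Timestamps of scene changes, via FFmpeg's scene-detection filter."""
    cmd = ["ffmpeg", "-i", video_path,
           "-vf", f"select='gt(scene,{threshold})',showinfo",
           "-f", "null", "-"]
    stderr = subprocess.run(cmd, capture_output=True, text=True).stderr
    # showinfo logs "pts_time:<seconds>" for each selected scene-change frame
    return [float(tok.split(":", 1)[1])
            for line in stderr.splitlines() if "pts_time" in line
            for tok in line.split() if tok.startswith("pts_time:")]

def embed_thumbnail(thumb_path: str) -> list[float]:
    """512-dim CLIP embedding of a shot thumbnail, unit-normalized."""
    vec = model.encode(Image.open(thumb_path))
    return (vec / np.linalg.norm(vec)).tolist()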

2. Search (Runtime)

User enters natural language query:

  • Gemini 2.0 Flash expands query with synonyms
  • Query encoded with CLIP ViT-B-32
  • Cosine similarity search against video embeddings
  • Returns top K matching shots with thumbnails
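
A minimal sketch of the scoring step, assuming embeddings.json holds one unit-normalized 512-dim vector per shot (per the index layout above; names are illustrative):

# Sketch of the runtime ranking: encode the (Gemini-expanded) query with
# CLIP, then score every shot by cosine similarity.
import json

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")

def search_shots(query: str, embeddings_path: str, top_k: int = 3):
    with open(embeddings_path) as f:
        shot_vecs = np.asarray(json.load(f))    # shape: (num_shots, 512)
    q = model.encode(query)
    q = q / np.linalg.norm(q)                   # unit-normalize the query
    scores = shot_vecs @ q                      # cosine similarity per shot
    top = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in top]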

3. Clip Extraction (On-Demand)

User clicks "Play" on a result:

  • API downloads source video from GCS
  • FFmpeg extracts the precise clip (-ss START -t DURATION -c copy)
  • Uploads clip to extracts/ with 1-day lifecycle
  • Returns public GCS URL
  • Auto-cleanup prevents storage accumulation
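
A minimal sketch of this flow once the source video is local (helper and variable names are illustrative):

# Sketch of on-demand extraction. Stream copy (-c copy) avoids re-encoding,
# which is what keeps extraction within the 10-20s target.
import subprocess

from google.cloud import storage

def extract_and_upload(local_video: str, start: float, end: float,
                       bucket_name: str, clip_name: str) -> str:
    out_path = f"/tmp/{clip_name}"
    subprocess.run(["ffmpeg", "-ss", str(start),   # seek to shot start
                    "-i", local_video,
                    "-t", str(end - start),        # clip duration
                    "-c", "copy", out_path],       # stream copy, no re-encode
                   check=True)
    blob = storage.Client().bucket(bucket_name).blob(f"extracts/{clip_name}")
    blob.upload_from_filename(out_path)            # 1-day lifecycle rule applies
    return blob.public_url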

Technology Stack

Backend (search-api/)

Flask 3.0               # REST API framework
Flask-CORS              # Cross-origin resource sharing
google-cloud-storage    # GCS SDK for Python
google-generativeai     # Gemini 2.0 Flash SDK
sentence-transformers   # CLIP ViT-B-32 embeddings
torch                   # PyTorch ML framework
numpy                   # Vector operations
gunicorn                # Production WSGI server
FFmpeg                  # Video processing (system package)

Frontend (frontend/)

Next.js 14              # React framework
React 18                # UI library
TailwindCSS 3.4         # Utility-first CSS
SWR 2.2.5               # Data fetching & caching
Inter font              # Typography (Google Fonts)

Design System

  • Color: Control-console aesthetic (charcoal #0D0E11, indigo accents #6366F1)
  • Typography: Inter font, tight spacing, 13-15px sizes
  • Theme: Dark mode, precise tab underlines, no decorative elements
  • Philosophy: Operational states, not playful navigation

API Endpoints

GET /videos

List all indexed videos with metadata.

Response:

[
  {
    "video_id": "videoplayback_1_d2078ef2",
    "title": "Videoplayback 1",
    "num_shots": 39,
    "duration": 235.9,
    "poster_thumbnail_url": "https://storage.googleapis.com/...",
    "indexed_at": "2026-02-03T22:59:38Z"
  }
]

POST /query

Search for scenes using natural language (enhanced by Gemini).

Request:

{
  "video_id": "videoplayback_1_d2078ef2",
  "query": "person speaking",
  "top_k": 3
}

Response:

[
  {
    "shot_index": 32,
    "start": 196.32,
    "end": 200.16,
    "thumbnail_url": "https://storage.googleapis.com/...",
    "score": 0.87
  }
]

POST /extract_clip

Extract a clip on demand from the source video in GCS.

Request:

{
  "video_id": "videoplayback_1_d2078ef2",
  "shot_index": 32
}

Response:

{
  "clip_url": "https://storage.googleapis.com/...",
  "start": 196.32,
  "end": 200.16,
  "duration": 3.84,
  "expires_at": "2026-02-03T23:59:00Z"
}
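
For illustration, a hypothetical end-to-end client that chains the two endpoints (API_BASE is a placeholder for your Cloud Run service URL):

# Hypothetical client: search for a moment, then extract the top clip.
import requests

API_BASE = "https://<your-cloud-run-service>.run.app"  # placeholder
video_id = "videoplayback_1_d2078ef2"

hits = requests.post(f"{API_BASE}/query",
                     json={"video_id": video_id,
                           "query": "person speaking", "top_k": 3},
                     timeout=30).json()

best = hits[0]  # results are ranked, so the first hit scores highest
clip = requests.post(f"{API_BASE}/extract_clip",
                     json={"video_id": video_id,
                           "shot_index": best["shot_index"]},
                     timeout=60).json()
print(clip["clip_url"], f"({clip['duration']}s, expires {clip['expires_at']})")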

Local Development

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • FFmpeg installed
  • GCP credentials configured

Backend Setup

cd search-api
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt

export PROJECT_ID=gen-lang-client-0067393875
export BUCKET_NAME=gen-lang-client-0067393875-media-1770102442
export GEMINI_API_KEY=your-api-key  # Optional

python cloud_clip_api.py
# API runs on http://localhost:8080

Frontend Setup

cd frontend
npm install
npm run dev
# UI runs on http://localhost:3000

Index New Videos

cd worker
source venv/bin/activate
python video_indexer.py

Deployment

Backend (Cloud Run)

cd /path/to/GCP_Media_proto
./deploy_to_cloud_run.sh

Or manually:

cd search-api
gcloud builds submit --tag gcr.io/gen-lang-client-0067393875/cloud-clip-api
gcloud run deploy cloud-clip-api \
  --image gcr.io/gen-lang-client-0067393875/cloud-clip-api \
  --region us-central1 \
  --platform managed \
  --memory 2Gi \
  --cpu 2 \
  --timeout 300 \
  --allow-unauthenticated

Frontend (Netlify)

Automatic deployment on git push to main branch.

Configuration in netlify.toml:

  • Build command: npm run build
  • Publish directory: out
  • Base directory: frontend

Performance

Operation         Target   Actual
Query latency     <5s      <500ms
Clip extraction   <20s     10-20s
Video indexing    <5min    2-3min

Cost Estimates

Storage (100 videos)

  • Videos (15MB avg): ~1.5GB = $0.023/month
  • Index files: ~50MB = $0.001/month
  • Thumbnails: ~500MB = $0.008/month

Total: ~$0.03/month

Compute (1000 queries/month)

  • Cloud Run: $0.00 (free tier)
  • Netlify: $0.00 (free tier)
  • Clip extractions: ~$0.05

Total: ~$0.05/month

Production Scale (10K queries/month)

  • Storage: ~$0.50/month
  • Compute: ~$2-3/month
  • Networking: ~$0.20/month

Total: ~$3-5/month

Project Structure

GCP_Media_proto/
├── frontend/                    # Next.js application
│   ├── app/                     # Next.js App Router
│   ├── components/              # React components
│   ├── lib/                     # API client
│   └── public/                  # Static assets
│
├── search-api/                  # Flask backend
│   ├── cloud_clip_api.py        # Main API (CLIP + Gemini)
│   ├── requirements.txt         # Python dependencies
│   ├── Dockerfile               # Container definition
│   └── cloudbuild.yaml          # Cloud Build config
│
├── worker/                      # Video indexing
│   ├── video_indexer.py         # FFmpeg + CLIP indexer
│   └── venv/                    # Python virtual environment
│
├── netlify.toml                 # Netlify configuration
├── deploy_to_cloud_run.sh       # Backend deployment script
└── README.md                    # This file

Security

Implemented

  • Service account authentication (Cloud Run ↔ GCS)
  • CORS enabled for Netlify domain
  • Public GCS URLs (bucket is publicly readable)
  • Input validation on all endpoints
  • Auto-expiring clips (1-day lifecycle)

Production Recommendations

  • Add user authentication (OAuth, Auth0)
  • Rate limiting (10 requests/min per IP)
  • Cloud Armor for DDoS protection
  • Private GCS bucket with signed URLs
  • Budget alerts at $100/month
  • Content Security Policy headers
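
For the rate-limiting item, a minimal sketch with Flask-Limiter (a suggested add-on, not currently in requirements.txt):

# Sketch: per-IP rate limiting with Flask-Limiter (suggested, not installed).
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
limiter = Limiter(get_remote_address, app=app,
                  default_limits=["10 per minute"])  # per-IP cap from above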

Adding New Videos

  1. Upload to GCS:

     gsutil cp new_video.mp4 gs://BUCKET_NAME/videos/

  2. Index the video:

     cd worker
     source venv/bin/activate
     python video_indexer.py

  3. Refresh the UI to see the new video.

Troubleshooting

Videos not loading

Check backend CORS configuration and Cloud Run logs:

gcloud run services logs read cloud-clip-api --region us-central1 --limit 50

Clip extraction fails

Confirm the deployed revision uses the image built from the Dockerfile (which installs FFmpeg):

gcloud run services describe cloud-clip-api --region us-central1

Query returns no results

Check if video is indexed:

gsutil ls gs://BUCKET_NAME/index/

License

MIT

