ReelRAG is a full-stack system for ingesting, analyzing, and retrieving insights from a corpus of Instagram Reels. It combines a Python-based data engineering pipeline with a Flask backend and a React frontend to provide a searchable interface for video content.
The system is divided into three main layers:
- CLI (Data Engineering): A Typer-based command-line interface for orchestrating the data pipeline.
  - ingest: Fetches reel metadata from reels.txt and downloads the video files.
  - preprocess: Extracts audio, transcribes it using OpenAI Whisper, and cleans the text.
  - index: Generates embeddings, stores them in ChromaDB, and creates topic clusters for filtering.
- Flask Backend (API): A REST API that serves retrieval and analysis requests.
  - /search: Performs hybrid semantic/keyword search with Rocchio refinement.
  - /topics: Provides a list of generated topics for the UI.
  - /report: Generates LLM-powered narrative summaries based on retrieved reels.
- React Frontend (User Interface): A single-page application for user interaction.
  - Allows users to search the reel corpus with filters for topics and dates.
  - Displays results in a clean, card-based layout.
  - Presents detailed, citable research reports generated by the backend.
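To make the "Rocchio refinement" mentioned for /search more concrete, here is a minimal sketch of the textbook Rocchio update on plain Python lists. It is not taken from the ReelRAG backend: the function name `rocchio_refine`, the default weights, and the way relevant and non-relevant reels are chosen are all assumptions for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def _centroid(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean of the vectors, or a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_refine(
    query: Vector,
    relevant: Sequence[Vector],
    non_relevant: Sequence[Vector],
    alpha: float = 1.0,   # weight of the original query (assumed default)
    beta: float = 0.75,   # pull toward relevant results (assumed default)
    gamma: float = 0.15,  # push away from non-relevant results (assumed default)
) -> Vector:
    """Return alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    dim = len(query)
    rel = _centroid(relevant, dim)
    non = _centroid(non_relevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


# Hypothetical usage: one reel embedding marked relevant, none marked non-relevant.
refined = rocchio_refine([1.0, 0.0], relevant=[[0.0, 1.0]], non_relevant=[])
```

The refined vector would then replace the original query embedding for a second retrieval pass; how the backend actually gathers feedback and which weights it uses is not stated here.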
- backend/: The Python Flask API that handles data processing, analysis, and retrieval.
- frontend/: The React-based user interface for searching and viewing reel insights.
- cli/: A Typer-based CLI for orchestrating the data ingestion and indexing pipeline.
- data/: Contains the input reel URLs and is the output location for downloaded videos and processed data.
- Python 3.9+
- Node.js 16+
- An Instagram account.
- A Google AI API Key.
This project uses a .env file for environment variables. An example file is provided at .env.example. Copy this file to .env and fill in your credentials:
cp .env.example .env
Then, edit .env with your Instagram username and password and your Google AI API Key.
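Before running the pipeline, it can help to confirm that the credentials are actually set. The sketch below checks a mapping of environment variables for empty or missing entries. The variable names are hypothetical placeholders; use the names that appear in .env.example.

```python
from typing import List, Mapping, Sequence

# Hypothetical names for illustration only; the real keys are defined in .env.example.
REQUIRED_VARS = ("INSTAGRAM_USERNAME", "INSTAGRAM_PASSWORD", "GOOGLE_API_KEY")


def missing_env_vars(
    env: Mapping[str, str],
    required: Sequence[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the required variable names that are absent or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_env_vars(os.environ)` after the .env file has been loaded into the process environment returns an empty list when everything is configured.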
First, set up and activate a Python virtual environment.
# Navigate to the backend directory
cd backend
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
# Install dependencies
pip install -r requirements.txt
In a separate terminal, navigate to the frontend directory and install the Node.js dependencies.
cd frontend
npm install
This project uses example files to show the structure of data and log files. You will need to create your own versions of these files.
- Reels List: Copy data/reels.txt.example to data/reels.txt and add the Instagram Reel URLs you want to process.
- Output Data: The data/output/data.json file will be generated by the processing pipeline. An example is provided.
- Logs: The backend/logs/app.log file will be generated when the backend runs. An example is provided.
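As a rough illustration of what a reels list might look like to a consumer, the sketch below reads URLs from an iterable of lines. It assumes one URL per line and treats blank lines and lines starting with `#` as ignorable; both are assumptions, so check data/reels.txt.example for the actual format. The function name `parse_reel_urls` is hypothetical and not part of the ReelRAG CLI.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def parse_reel_urls(lines: Iterable[str]) -> List[str]:
    """Return unique http(s) Instagram URLs in their original order."""
    urls: List[str] = []
    seen = set()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and (assumed) comment lines.
        if not line or line.startswith("#"):
            continue
        parsed = urlparse(line)
        host = parsed.netloc.lower()
        is_instagram = host == "instagram.com" or host.endswith(".instagram.com")
        if parsed.scheme in ("http", "https") and is_instagram and line not in seen:
            seen.add(line)
            urls.append(line)
    return urls
```

With a real file this could be called as `parse_reel_urls(open("data/reels.txt", encoding="utf-8"))`, which makes it easy to spot malformed or duplicated entries before running ingest.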
Data Pipeline (CLI):
First, populate data/reels.txt with the Instagram Reel URLs you want to process.
Then, run the data pipeline commands from the project root, in the order shown. Ensure your virtual environment is active.
# From the project root, run the CLI commands
python cli/reelrag_cli.py ingest
python cli/reelrag_cli.py preprocess
python cli/reelrag_cli.py index
Backend Server:
Once the data is indexed, start the Flask server.
# From the project root
cd backend
flask --app api.app run
Frontend Development Server:
Finally, start the React development server.
# From the project root
cd frontend
npm start
Navigate to http://localhost:3000 in your browser to use the application.