MubsPokesart/ReelRAG

ReelRAG - Flask + React Retrieval-Augmented Reels Analyzer

ReelRAG is a full-stack system for ingesting, analyzing, and retrieving insights from a corpus of Instagram Reels. It combines a Python-based data engineering pipeline with a Flask backend and a React frontend to provide a searchable interface for video content.

Architecture Overview

The system is divided into three main layers:

  1. CLI (Data Engineering): A Typer-based command-line interface for orchestrating the data pipeline.

    • ingest: Reads the reel URLs listed in data/reels.txt, fetches their metadata, and downloads the video files.
    • preprocess: Extracts audio, transcribes it using OpenAI Whisper, and cleans the text.
    • index: Generates embeddings, stores them in ChromaDB, and creates topic clusters for filtering.
  2. Flask Backend (API): A REST API that serves retrieval and analysis requests.

    • /search: Performs hybrid semantic/keyword search with Rocchio refinement.
    • /topics: Provides a list of generated topics for the UI.
    • /report: Generates LLM-powered narrative summaries based on retrieved reels.
  3. React Frontend (User Interface): A single-page application for user interaction.

    • Allows users to search the reel corpus with filters for topics and dates.
    • Displays results in a clean, card-based layout.
    • Presents detailed, citable research reports generated by the backend.
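
The Rocchio refinement mentioned for /search can be illustrated with a short, dependency-free sketch. This is not the project's implementation: the function names, the plain-list vector representation, and the weights alpha, beta, and gamma (set here to common textbook starting values) are all assumptions made for illustration. The idea is to move the query embedding toward results judged relevant and away from those judged non-relevant.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; a zero vector when the set is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_refine(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (assumed value)
    beta: float = 0.75,   # pull toward relevant results (assumed value)
    gamma: float = 0.15,  # push away from non-relevant results (assumed value)
) -> Vector:
    """Return alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    dim = len(query)
    rel = _mean(relevant, dim)
    non = _mean(non_relevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
```

In a search flow, the refined vector would replace the original query embedding for a second retrieval pass, so documents similar to the ones marked relevant rank higher.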

Project Structure

  • backend/: The Python Flask API that handles data processing, analysis, and retrieval.
  • frontend/: The React-based user interface for searching and viewing reel insights.
  • cli/: A Typer-based CLI for orchestrating the data ingestion and indexing pipeline.
  • data/: Holds the input list of reel URLs and serves as the output location for downloaded videos and processed data.

Getting Started

Prerequisites

  • Python 3.9+
  • Node.js 16+
  • An Instagram account.
  • A Google AI API Key.

1. Configuration

This project uses a .env file for environment variables. An example file is provided at .env.example. Copy this file to .env and fill in your credentials:

cp .env.example .env

Then, edit .env with your Instagram username and password, and your Google AI API Key.
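
Before running the pipeline, it can help to confirm that .env contains every credential. The sketch below is a minimal standard-library check, not part of the project. The key names used in the usage note (INSTAGRAM_USERNAME, INSTAGRAM_PASSWORD, GOOGLE_API_KEY) are hypothetical; use whatever names appear in .env.example.

```python
from typing import Dict, Iterable, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, reading the file with `Path(".env").read_text()` and calling `missing_keys(parse_env_text(text), ["INSTAGRAM_USERNAME", "INSTAGRAM_PASSWORD", "GOOGLE_API_KEY"])` returns an empty list when all three (hypothetically named) entries are filled in.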

2. Backend & CLI Setup

First, set up and activate a Python virtual environment.

# Navigate to the backend directory
cd backend

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install dependencies
pip install -r requirements.txt

3. Frontend Setup

In a separate terminal, navigate to the frontend directory and install the Node.js dependencies.

cd frontend
npm install

4. Data Setup

This project uses example files to show the structure of data and log files. You will need to create your own versions of these files.

  • Reels List: Copy data/reels.txt.example to data/reels.txt and add the Instagram Reel URLs you want to process.
  • Output Data: The data/output/data.json file will be generated by the processing pipeline. An example is provided.
  • Logs: The backend/logs/app.log file will be generated when the backend runs. An example is provided.
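
A quick sanity check of data/reels.txt before ingesting can catch malformed entries early. The sketch below assumes one URL per line, treats lines starting with # as comments, and accepts only Instagram URLs whose path begins with /reel/ or /reels/. These rules are assumptions for illustration; the actual file format is defined by data/reels.txt.example and the ingest command.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def is_reel_url(url: str) -> bool:
    """True for http(s) Instagram URLs whose path starts with /reel/ or /reels/."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https"):
        return False
    if host != "instagram.com" and not host.endswith(".instagram.com"):
        return False
    parts = [p for p in parsed.path.split("/") if p]
    return len(parts) >= 2 and parts[0] in ("reel", "reels")


def split_reel_lines(text: str) -> Tuple[List[str], List[str]]:
    """Return (valid, rejected) lines, dropping blanks, comments, and duplicates."""
    valid: List[str] = []
    rejected: List[str] = []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if is_reel_url(line):
            if line not in seen:
                seen.add(line)
                valid.append(line)
        else:
            rejected.append(line)
    return valid, rejected
```

Passing the contents of data/reels.txt to `split_reel_lines` yields the URLs that look usable and the lines that need fixing.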

5. Running the Application

Data Pipeline (CLI):

First, populate data/reels.txt with the Instagram Reel URLs you want to process.

Then, run the data pipeline commands in order from the project root. Ensure your virtual environment is active.

# From the project root, run the CLI commands
python cli/reelrag_cli.py ingest
python cli/reelrag_cli.py preprocess
python cli/reelrag_cli.py index

Backend Server:

Once the data is indexed, start the Flask server.

# From the project root
cd backend
flask --app api.app run
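
Once the server is up, the API can be exercised without the frontend. The sketch below is a hedged illustration only: it assumes the Flask development server's default address (http://127.0.0.1:5000), that /search accepts GET requests, and that it takes query parameters named q and topic and returns JSON. The parameter names, HTTP method, and response format are assumptions; check the route definitions in backend/ for the real contract.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Default address of the Flask development server.
BASE_URL = "http://127.0.0.1:5000"


def build_search_url(query: str, topic: Optional[str] = None,
                     base_url: str = BASE_URL) -> str:
    """Build a /search URL; the 'q' and 'topic' parameter names are hypothetical."""
    params = {"q": query}
    if topic is not None:
        params["topic"] = topic
    return f"{base_url}/search?{urlencode(params)}"


def search(query: str, topic: Optional[str] = None,
           base_url: str = BASE_URL, timeout: float = 10.0) -> Any:
    """Send the request and decode the body, assuming a JSON response."""
    with urlopen(build_search_url(query, topic, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `search("meal prep")` against a running backend would return the decoded response, provided the assumptions above match the actual API.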

Frontend Development Server:

Finally, start the React development server.

# From the project root
cd frontend
npm start

Navigate to http://localhost:3000 in your browser to use the application.

About

Full-stack retrieval-augmented generation system for analyzing Instagram Reels and generating LLM-written reports.
