Welcome to the Multimodal Recommender System Enhancement project for Netflix! This repository houses cutting-edge work aimed at improving how Netflix recommends content to its users by leveraging multiple data types, including text, images, and videos.
This project focuses on integrating state-of-the-art machine learning models across various domains such as Natural Language Processing (NLP), Computer Vision (CV), and Reinforcement Learning (RL) to build a highly efficient and personalized recommendation system. Our goal is to deliver an even more engaging user experience by providing relevant content recommendations tailored to individual preferences.
- Raw ingestion: synthetic Netflix-style viewing events stored at `data/sample/synthetic_viewing_history.csv`.
- Warehouse layer: DuckDB star schema with `dim_users`, `dim_titles`, and `fact_views`, plus feature tables for engagement and popularity.
- Model training: baseline popularity and user-based collaborative filtering running against the warehouse.
- Outputs & analytics: recommendations and metrics saved to `outputs/`, with SQL examples under `sql/` and an analysis script in `analysis/`.
- Data storytelling UI: React + Chart.js dashboard in `frontend/` for showcasing the recommendations visually.
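To make the warehouse layer concrete, here is a minimal sketch of the star schema and a typical engagement query. The table names (`dim_users`, `dim_titles`, `fact_views`) come from the description above; the column names are illustrative assumptions, and `sqlite3` stands in for DuckDB here so the sketch runs with the standard library alone (DuckDB's Python API follows the same DB-API conventions).

```python
import sqlite3

# Star schema sketch: two dimension tables plus one fact table.
# Column names below are assumptions, not the project's actual schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_users  (user_id TEXT PRIMARY KEY, signup_date TEXT);
CREATE TABLE dim_titles (title_id TEXT PRIMARY KEY, title TEXT, genre TEXT);
CREATE TABLE fact_views (user_id TEXT, title_id TEXT, watch_minutes REAL);
""")
con.execute("INSERT INTO dim_users VALUES ('u1', '2024-01-01')")
con.execute("INSERT INTO dim_titles VALUES ('s1', 'Sample Show', 'Drama')")
con.execute("INSERT INTO fact_views VALUES ('u1', 's1', 42.0)")

# A typical engagement query: total minutes watched per title.
rows = con.execute("""
    SELECT t.title, SUM(f.watch_minutes) AS total_minutes
    FROM fact_views f JOIN dim_titles t USING (title_id)
    GROUP BY t.title
""").fetchall()
print(rows)  # [('Sample Show', 42.0)]
```

The star layout keeps the fact table narrow (foreign keys plus measures) so engagement and popularity features can be derived with simple joins and aggregations.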
- Clear ETL boundaries (`extract`, `load`, `transform`, `feature_engineering`, `train_models`, `evaluate_models`) with logging and type hints in `src/netflix_recommender/data_pipeline.py`.
- Warehouse-style modeling in DuckDB and reusable SQL stored in `sql/engagement_queries.sql` for engagement analysis.
- Baseline recommenders (popularity, user-based CF) that highlight experimentation velocity on top of the warehouse.
- Automated quality via `pytest` (`tests/`) and a GitHub Actions workflow (`.github/workflows/ci.yml`) to run the suite.
- Front-end storytelling with a React dashboard (`frontend/`) to showcase recommendations to stakeholders.
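As a rough illustration of the popularity baseline mentioned above, the sketch below ranks titles by view count and recommends the global top-k to every user. The event format and function name are assumptions for the example, not the project's actual API.

```python
from collections import Counter

# Illustrative viewing events as (user_id, title_id) pairs.
events = [
    ("u1", "s1"), ("u2", "s1"), ("u2", "s3"),
    ("u3", "s1"), ("u3", "s3"), ("u3", "s4"),
]

def popularity_top_k(events, k=3):
    """Return the k most-viewed titles across all users."""
    counts = Counter(title for _, title in events)
    return [title for title, _ in counts.most_common(k)]

print(popularity_top_k(events))  # ['s1', 's3', 's4']
```

Popularity makes a useful baseline precisely because it ignores the user: any personalized model (such as the user-based CF recommender) should beat it to justify its complexity.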
- Clone & install:

  ```bash
  python -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt
  pip install -e .
  ```

- Run tests:

  ```bash
  pytest -q
  ```

- Run the end-to-end pipeline:

  ```bash
  python -m netflix_recommender.run_pipeline
  ```

- Inspect outputs:
  - `outputs/recommendations.csv` with top titles per user and model.
  - `outputs/metrics.json` with precision@k and coverage.
  - `analysis/recommendation_analysis.py` to print a human-readable summary.
Pipeline metrics:

```json
{
  "precision_at_k": 0.5,
  "test_users": 5
}
```

Sample recommendations:

```text
   user_id  title_id  rank  model
0  u1       s1        1     popularity
1  u1       s3        2     popularity
2  u1       s4        3     popularity
```
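For reference, precision@k is the fraction of the top-k recommended titles that appear in a user's held-out relevant set. The sketch below shows the computation; whether the real pipeline averages per user or defines relevance differently is an assumption here.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations found in the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for title in top_k if title in relevant)
    return hits / k

# Example: 3 recommendations, 2 of which the user actually watched later.
score = precision_at_k(["s1", "s3", "s4"], {"s3", "s4"}, k=3)
print(round(score, 3))  # 0.667
```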
- Navigate to `frontend/`, run `npm install` then `npm run dev` to launch the dashboard at http://localhost:5173.
- The dashboard reads `frontend/public/sample_recommendations.json` by default; replace it with `outputs/recommendations.csv` converted to JSON for live data.
- Components live in `frontend/src/` and use Chart.js for quick visuals (coverage, precision badge, per-user tiles).
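The CSV-to-JSON conversion mentioned above can be done with the standard library alone. The exact schema the dashboard expects in `sample_recommendations.json` is an assumption here; adjust the keys to match the real file.

```python
import csv
import io
import json

# Inline sample standing in for outputs/recommendations.csv.
csv_text = """user_id,title_id,rank,model
u1,s1,1,popularity
u1,s3,2,popularity
"""

rows = csv.DictReader(io.StringIO(csv_text))
records = [
    {"user_id": r["user_id"], "title_id": r["title_id"],
     "rank": int(r["rank"]), "model": r["model"]}
    for r in rows
]
print(json.dumps(records, indent=2))
# Write the result to frontend/public/sample_recommendations.json for live data.
```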
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
PIP_NO_BUILD_ISOLATION=1 pip install -e . --no-build-isolation
PYTHONPATH=src python -m netflix_recommender.run_pipeline
```

```bash
bash scripts/demo.sh
```

The demo runs the pipeline in a temporary directory, enables tracing, observability, and quality checks, and prints a summary with the generated outputs.

```bash
bash scripts/verify.sh
```

`scripts/verify.sh` installs dependencies (offline-safe), runs optional format/lint/type checks if available, executes the test suite, and runs the demo as a smoke test.
- End-to-end ETL + recommendation pipeline with output artifacts in `outputs/`.
- Optional observability pack: structured logging, metrics registry, and OpenTelemetry scaffolding (off by default).
- Trace recorder that outputs JSONL and Markdown reports for pipeline stages (the demo enables this).
- Plugin architecture with built-in engagement and cold-start tagging plugins (opt-in).
- Safety policy and data quality checks that can be enabled via environment flags.
- Pipeline reporting (`summary.json` and `pipeline_report.md`) for recruiter-friendly summaries.
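To illustrate the JSONL trace format mentioned above, here is a hypothetical recorder sketch: one JSON object per pipeline stage, appended line by line. The field names are assumptions for the example, not the project's actual trace schema.

```python
import json
import time

def record_stage(buffer, stage, status, **extra):
    """Append one trace entry (as a JSON line) for a pipeline stage."""
    entry = {"stage": stage, "status": status, "ts": time.time(), **extra}
    buffer.append(json.dumps(entry))
    return entry

trace = []
record_stage(trace, "extract", "ok", rows=100)
record_stage(trace, "train_models", "ok", models=["popularity", "user_cf"])

# Each element of `trace` is a standalone JSON line, ready to write to a .jsonl file.
print("\n".join(trace))
```

JSONL works well for traces because each stage can append its record independently, and the file stays parseable even if the pipeline dies mid-run.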