Welcome to the Multimodal Recommender System Enhancement project for Netflix! This repository houses cutting-edge work aimed at improving how Netflix recommends content to its users by leveraging multiple data types, including text, images, and videos.
This project focuses on integrating state-of-the-art machine learning models across various domains such as Natural Language Processing (NLP), Computer Vision (CV), and Reinforcement Learning (RL) to build a highly efficient and personalized recommendation system. Our goal is to deliver an even more engaging user experience by providing relevant content recommendations tailored to individual preferences.
- Raw ingestion: synthetic Netflix-style viewing events stored at `data/sample/synthetic_viewing_history.csv`.
- Warehouse layer: DuckDB star schema with `dim_users`, `dim_titles`, and `fact_views`, plus feature tables for engagement and popularity.
- Model training: baseline popularity and user-based collaborative filtering running against the warehouse.
- Outputs & analytics: recommendations and metrics saved to `outputs/`, with SQL examples under `sql/` and an analysis script in `analysis/`.
- Data storytelling UI: React + Chart.js dashboard in `frontend/` for showcasing the recommendations visually.
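To make the warehouse layer concrete, here is a minimal sketch of the star schema and a typical engagement query. The table names (`dim_users`, `dim_titles`, `fact_views`) come from the description above; the column names are illustrative assumptions, and `sqlite3` stands in for DuckDB here so the sketch runs with the standard library alone (DuckDB's Python API follows the same DB-API conventions).

```python
import sqlite3

# Star schema sketch: two dimension tables plus one fact table.
# Column names below are assumptions, not the project's actual schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_users  (user_id TEXT PRIMARY KEY, signup_date TEXT);
CREATE TABLE dim_titles (title_id TEXT PRIMARY KEY, title TEXT, genre TEXT);
CREATE TABLE fact_views (user_id TEXT, title_id TEXT, watch_minutes REAL);
""")
con.execute("INSERT INTO dim_users VALUES ('u1', '2024-01-01')")
con.execute("INSERT INTO dim_titles VALUES ('s1', 'Sample Show', 'Drama')")
con.execute("INSERT INTO fact_views VALUES ('u1', 's1', 42.0)")

# A typical engagement query: total minutes watched per title.
rows = con.execute("""
    SELECT t.title, SUM(f.watch_minutes) AS total_minutes
    FROM fact_views f JOIN dim_titles t USING (title_id)
    GROUP BY t.title
""").fetchall()
print(rows)  # [('Sample Show', 42.0)]
```

The star layout keeps the fact table narrow (foreign keys plus measures) so engagement and popularity features can be derived with simple joins and aggregations.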
- Clear ETL boundaries (`extract`, `load`, `transform`, `feature_engineering`, `train_models`, `evaluate_models`) with logging and type hints in `src/netflix_recommender/data_pipeline.py`.
- Warehouse-style modeling in DuckDB and reusable SQL stored in `sql/engagement_queries.sql` for engagement analysis.
- Baseline recommenders (popularity, user-based CF) that highlight experimentation velocity on top of the warehouse.
- Automated quality via `pytest` (`tests/`) and a GitHub Actions workflow (`.github/workflows/ci.yml`) to run the suite.
- Front-end storytelling with a React dashboard (`frontend/`) to showcase recommendations to stakeholders.
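As a rough illustration of the popularity baseline mentioned above, the sketch below ranks titles by view count and recommends the global top-k to every user. The event format and function name are assumptions for the example, not the project's actual API.

```python
from collections import Counter

# Illustrative viewing events as (user_id, title_id) pairs.
events = [
    ("u1", "s1"), ("u2", "s1"), ("u2", "s3"),
    ("u3", "s1"), ("u3", "s3"), ("u3", "s4"),
]

def popularity_top_k(events, k=3):
    """Return the k most-viewed titles across all users."""
    counts = Counter(title for _, title in events)
    return [title for title, _ in counts.most_common(k)]

print(popularity_top_k(events))  # ['s1', 's3', 's4']
```

Popularity makes a useful baseline precisely because it ignores the user: any personalized model (such as the user-based CF recommender) should beat it to justify its complexity.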
- Clone & install:

  ```bash
  python -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt
  pip install -e .
  ```

- Run tests:

  ```bash
  pytest -q
  ```

- Run the end-to-end pipeline:

  ```bash
  python -m netflix_recommender.run_pipeline
  ```

- Inspect outputs:
  - `outputs/recommendations.csv` with top titles per user and model.
  - `outputs/metrics.json` with precision@k and coverage.
  - `analysis/recommendation_analysis.py` to print a human-readable summary.
Pipeline metrics:

```json
{
  "precision_at_k": 0.5,
  "test_users": 5
}
```

Sample recommendations:

```text
   user_id  title_id  rank  model
0  u1       s1        1     popularity
1  u1       s3        2     popularity
2  u1       s4        3     popularity
```
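For reference, precision@k is the fraction of the top-k recommended titles that appear in a user's held-out relevant set. The sketch below shows the computation; whether the real pipeline averages per user or defines relevance differently is an assumption here.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations found in the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for title in top_k if title in relevant)
    return hits / k

# Example: 3 recommendations, 2 of which the user actually watched later.
score = precision_at_k(["s1", "s3", "s4"], {"s3", "s4"}, k=3)
print(round(score, 3))  # 0.667
```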
- Navigate to `frontend/`, run `npm install` then `npm run dev` to launch the dashboard at http://localhost:5173.
- The dashboard reads `frontend/public/sample_recommendations.json` by default; replace it with `outputs/recommendations.csv` converted to JSON for live data.
- Components live in `frontend/src/` and use Chart.js for quick visuals (coverage, precision badge, per-user tiles).
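The CSV-to-JSON conversion mentioned above can be done with the standard library alone. The exact schema the dashboard expects in `sample_recommendations.json` is an assumption here; adjust the keys to match the real file.

```python
import csv
import io
import json

# Inline sample standing in for outputs/recommendations.csv.
csv_text = """user_id,title_id,rank,model
u1,s1,1,popularity
u1,s3,2,popularity
"""

rows = csv.DictReader(io.StringIO(csv_text))
records = [
    {"user_id": r["user_id"], "title_id": r["title_id"],
     "rank": int(r["rank"]), "model": r["model"]}
    for r in rows
]
print(json.dumps(records, indent=2))
# Write the result to frontend/public/sample_recommendations.json for live data.
```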
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
PIP_NO_BUILD_ISOLATION=1 pip install -e . --no-build-isolation
PYTHONPATH=src python -m netflix_recommender.run_pipeline
```

```bash
bash scripts/demo.sh
```

The demo runs the pipeline in a temporary directory, enables tracing, observability, and quality checks, and prints a summary with the generated outputs.

```bash
bash scripts/verify.sh
```

`scripts/verify.sh` installs dependencies (offline-safe), runs optional format/lint/type checks if available, executes the test suite, and runs the demo as a smoke test.
- End-to-end ETL + recommendation pipeline with output artifacts in `outputs/`.
- Optional observability pack: structured logging, metrics registry, and OpenTelemetry scaffolding (off by default).
- Trace recorder that outputs JSONL and Markdown reports for pipeline stages (the demo enables this).
- Plugin architecture with built-in engagement and cold-start tagging plugins (opt-in).
- Safety policy and data quality checks that can be enabled via environment flags.
- Pipeline reporting (`summary.json` and `pipeline_report.md`) for recruiter-friendly summaries.
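To illustrate the JSONL trace format mentioned above, here is a hypothetical recorder sketch: one JSON object per pipeline stage, appended line by line. The field names are assumptions for the example, not the project's actual trace schema.

```python
import json
import time

def record_stage(buffer, stage, status, **extra):
    """Append one trace entry (as a JSON line) for a pipeline stage."""
    entry = {"stage": stage, "status": status, "ts": time.time(), **extra}
    buffer.append(json.dumps(entry))
    return entry

trace = []
record_stage(trace, "extract", "ok", rows=100)
record_stage(trace, "train_models", "ok", models=["popularity", "user_cf"])

# Each element of `trace` is a standalone JSON line, ready to write to a .jsonl file.
print("\n".join(trace))
```

JSONL works well for traces because each stage can append its record independently, and the file stays parseable even if the pipeline dies mid-run.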