Two-Stage Recommender System

Two-Tower candidate generation with LambdaMART re-ranking. Trained on MovieLens 1M. Achieves NDCG@10 of 0.143. A 3.5× improvement over the popularity baseline with sub-20ms end-to-end latency.

Architecture

graph TD
    A[User Request] --> B[FastAPI Service]
    B --> C{Redis Cache?}
    C -->|Hit| D[Return Cached]
    C -->|Miss| E[Feature Store]
    E --> F[Two-Tower Model\nUser Embedding]
    F --> G[FAISS IVF Index\nANN Search — 500 candidates]
    G --> H[Feature Enrichment\nUser + Item + Interaction]
    H --> I[LightGBM LambdaMART\nLearning-to-Rank]
    I --> J[Top-K Results]
    J --> K[Prometheus Metrics]

Results

Evaluated on MovieLens 1M test set (90/10 split by interaction time):

Method	NDCG@10	Recall@20	MRR
Popularity baseline	0.041	0.089	0.052
Two-Tower (retrieval only)	0.089	0.201	0.112
Two-Tower + LambdaMART	0.143	0.312	0.178

Latency	p50	p99
Retrieval (FAISS ANN)	6 ms	14 ms
Ranking (LightGBM)	12 ms	28 ms
Full pipeline (end-to-end)	18 ms	43 ms

Key Design Decisions

Two-Tower + FAISS: Item embeddings are precomputed and indexed once. At query time only the user tower runs; IVF quantization with n_probe=10 gives a tunable accuracy/latency tradeoff at ~6ms.

LambdaMART over pointwise loss: Directly optimizes NDCG, accounting for rank position. Pointwise regression treats ranking as MSE; it isn't.

BPR loss: Maximizes P(interacted item ranks above random non-interacted item). Correct inductive bias for implicit feedback — cross-entropy treats all unrated items as negatives.

Redis feature store: Training and serving use the same Redis keys and serialization. No divergent codepaths, no skew.

ML Engineering Features

Feature	Implementation
Two-stage retrieval-ranking	Two-Tower ANN + LightGBM LambdaMART
Approximate Nearest Neighbor	FAISS IVFFlat, inner product, 100 cells
Online feature store	Redis HSET with msgpack serialization
Learning-to-Rank	LightGBM `lambdarank`, `eval_at=[5,10,20]`, early stopping
BPR training	In-batch negatives, cosine similarity via L2-normalized inner product
Cold start	Popularity-based fallback for new users
Recommendation caching	Redis TTL cache per user
Training-serving skew detection	KL divergence per feature with threshold alerting
Observability	Prometheus histograms: latency p50/p99, cache hit rate, error rate
Graceful degradation	Popularity fallback on pipeline errors

Quickstart

make install && make train   # download data, train all models
make serve                   # API on :8000

curl -X POST http://localhost:8000/recommend \
  -H "Content-Type: application/json" \
  -d '{"user_id": 1, "k": 10}'

make docker-up    # Redis, API, Prometheus :9090, Grafana :3000
make evaluate     # NDCG, Recall, MRR on test set

Training Stages

make download-data        # MovieLens 1M
make features             # user/item feature DataFrames
make train-embeddings     # Two-Tower (~10 epochs)
make build-index          # FAISS IVF index
make train-ranker         # LightGBM LambdaMART
make load-features        # populate Redis

API Reference

`POST /recommend`

{ "user_id": 42, "k": 20, "use_cache": true }

{
  "user_id": 42,
  "recommendations": [
    {
      "item_id": 2571,
      "title": "Matrix, The (1999)",
      "score": 4.821,
      "rank": 1,
      "retrieval_score": 0.934,
      "genres": ["Action", "Sci-Fi", "Thriller"]
    }
  ],
  "latency_ms": 18.4,
  "cache_hit": false,
  "n_candidates": 500
}

`GET /model/info`

Model versions, FAISS index stats, top-10 LambdaMART feature importances by gain, pipeline latency percentiles.

`GET /metrics`

Prometheus scrape endpoint. Key metrics: recommendit_recommendation_latency_ms, recommendit_cache_hit_total, recommendit_retrieval_latency_ms.

Training-Serving Skew Detection

result = detect_training_serving_skew(train_features_df, serving_features_df, threshold=0.1)
# result['flagged_features'] — features with KL divergence above threshold
# result['skew_detected']   — True if any feature is flagged

Configuration

Variable	Default	Description
`EMBEDDING_DIM`	`64`	Two-tower embedding dimension
`TOP_K_CANDIDATES`	`500`	ANN candidates per request
`FAISS_N_LISTS`	`100`	IVF Voronoi cells
`FAISS_N_PROBE`	`10`	IVF cells searched per query
`LGBM_N_ESTIMATORS`	`500`	LightGBM boosting rounds
`CACHE_TTL_SECONDS`	`300`	Recommendation cache TTL
`REDIS_URL`	`redis://localhost:6379`	Redis connection

Project Structure

recommendit/
├── src/
│   ├── features/
│   │   ├── feature_engineering.py    # User/item/interaction features
│   │   └── feature_store.py          # Redis online feature store
│   ├── models/
│   │   ├── two_tower.py              # PyTorch Two-Tower + BPR loss
│   │   ├── faiss_index.py            # FAISS IVFFlat wrapper
│   │   └── ranker.py                 # LightGBM LambdaMART
│   ├── training/
│   │   ├── train_embeddings.py
│   │   ├── build_index.py
│   │   └── train_ranker.py
│   ├── evaluation/metrics.py         # NDCG, Recall, MRR, skew detection
│   └── serving/
│       ├── app.py                    # FastAPI
│       ├── middleware.py             # Prometheus
│       └── recommender.py           # Inference pipeline
├── data/download.py
├── tests/
├── monitoring/
├── docker-compose.yml
└── Makefile

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
data		data
monitoring		monitoring
notebooks		notebooks
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Two-Stage Recommender System

Architecture

Results

Key Design Decisions

ML Engineering Features

Quickstart

Training Stages

API Reference

`POST /recommend`

`GET /model/info`

`GET /metrics`

Training-Serving Skew Detection

Configuration

Project Structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Two-Stage Recommender System

Architecture

Results

Key Design Decisions

ML Engineering Features

Quickstart

Training Stages

API Reference

POST /recommend

GET /model/info

GET /metrics

Training-Serving Skew Detection

Configuration

Project Structure

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /recommend`

`GET /model/info`

`GET /metrics`

Packages