A personal book recommender system built from Goodreads exports. It ingests your read history and to-read shelf, enriches books with metadata from Open Library and Google Books, generates semantic embeddings via OpenAI, and trains a hybrid ranking model that learns your taste from ratings and interactive swipe feedback.
Try the full pipeline from scratch — no Goodreads export or API keys needed:
```
git clone <this-repo> && cd rekommend
make demo
```
This single command creates a Python virtual environment, installs dependencies, runs all 4 recommenders and evaluations against the included sample data (40 classic novels), and opens an HTML comparison dashboard in your browser.
- Python 3.12+
- `make`
- An OpenAI API key (only needed if you want to generate embeddings for your own books — the sample data ships with pre-generated embeddings)
The system is a pipeline of stages. Each stage produces output that feeds into the next.
Everything starts with `data/books.enriched.json` — a JSON array of books, each with:
- Identity: `title`, `author`
- User data: `shelves` (`["read"]` or `["to-read"]`), `user_rating` (1–5 or null), `date_read`, `date_added`
- Metadata: `description`, `genres`, `publish_year`, `page_count`, `publisher`, ISBNs
- Feedback: `interested` (true/false/null from swipe sessions), `swiped_at`
The repo ships with a sample dataset of 40 classic novels (25 read with ratings, 15 to-read candidates). For real use, you'd build this from a Goodreads export — see Using Your Own Data below.
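A record might look like this (values invented for illustration; field names follow the schema above, and the exact shape of the ISBN fields is an assumption):

```json
{
  "title": "Persuasion",
  "author": "Jane Austen",
  "shelves": ["read"],
  "user_rating": 5,
  "date_read": "2023-06-01",
  "date_added": "2023-01-15",
  "description": "Anne Elliot is persuaded to break off her engagement...",
  "genres": ["classics", "romance"],
  "publish_year": 1817,
  "page_count": 249,
  "publisher": "John Murray",
  "isbn": "9780141439686",
  "interested": null,
  "swiped_at": null
}
```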
Each book's description is sent to OpenAI's `text-embedding-3-small` model to produce a 1536-dimensional vector stored in `data/embeddings.json`. Books with similar themes and content end up close together in embedding space.
The sample data ships with pre-generated embeddings. To regenerate or generate for new books:
```
# Requires OPENAI_API_KEY in .env file
make embeddings
```
The cache is additive — re-running only calls the API for books not already cached.
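The additive-cache behaviour can be sketched as follows. This is a minimal illustration, not the script itself: `embed_missing` is a hypothetical helper, the `book_id` keying follows the description of `data/embeddings.json`, and the OpenAI call is stubbed out.

```python
def embed_missing(books, cache, embed_fn):
    """Additive cache: only books without a cached vector hit the embedding API."""
    for book in books:
        if book["book_id"] not in cache and book.get("description"):
            cache[book["book_id"]] = embed_fn(book["description"])
    return cache

# Stand-in for the real text-embedding-3-small call (1536-dim vectors)
fake_embed = lambda text: [0.0] * 1536

cache = {"b1": [0.1] * 1536}          # pretend b1 was embedded on a previous run
books = [{"book_id": "b1", "description": "already cached"},
         {"book_id": "b2", "description": "new book"}]
cache = embed_missing(books, cache, fake_embed)
print(sorted(cache))  # only b2 triggered an "API" call
```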
Four recommenders score the to-read candidates, each using a different signal:
Popularity — Ranks by global Goodreads average_rating. Knows nothing about the user. This is the naive baseline.
Author/genre similarity — Builds a user profile from books rated 4+ stars. Scores candidates by author overlap (50%), genre overlap (40%), and popularity (10%). Favors familiar authors and genres.
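The 50/40/10 blend above can be sketched like this. The weights come from the description; the Jaccard genre overlap, the rating normalization, and the function name are assumptions for illustration.

```python
def similarity_score(candidate, liked_authors, liked_genres):
    # Weights from the recommender description: author 50%, genre 40%, popularity 10%
    author = 1.0 if candidate["author"] in liked_authors else 0.0
    genres = set(candidate.get("genres", []))
    union = genres | liked_genres
    genre = len(genres & liked_genres) / len(union) if union else 0.0
    popularity = candidate.get("average_rating", 0.0) / 5.0  # normalize to [0, 1]
    return 0.5 * author + 0.4 * genre + 0.1 * popularity

liked_authors = {"Jane Austen"}
liked_genres = {"classics", "romance"}
book = {"author": "Jane Austen", "genres": ["classics"], "average_rating": 4.0}
print(round(similarity_score(book, liked_authors, liked_genres), 2))  # 0.78
```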
Embedding — Averages the embeddings of positively-rated books (weighted by rating strength) into a user taste vector. Scores candidates by cosine similarity. Can surface books by unfamiliar authors if the content is thematically similar.
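A sketch of the taste-vector idea with tiny 2-dimensional vectors (real vectors are 1536-dimensional). The specific rating-to-weight mapping is an assumption; the source only says the average is weighted by rating strength.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def taste_vector(rated_books, embeddings):
    """Rating-weighted average of embeddings for positively rated books."""
    dim = len(next(iter(embeddings.values())))
    acc, total = [0.0] * dim, 0.0
    for book in rated_books:
        if book["user_rating"] and book["user_rating"] >= 4:
            w = book["user_rating"] - 3  # assumed weighting: 4 stars -> 1, 5 stars -> 2
            vec = embeddings[book["book_id"]]
            acc = [a + w * v for a, v in zip(acc, vec)]
            total += w
    return [a / total for a in acc] if total else acc

embeddings = {"b1": [1.0, 0.0], "b2": [0.0, 1.0], "c1": [0.8, 0.6]}
rated = [{"book_id": "b1", "user_rating": 5}, {"book_id": "b2", "user_rating": 4}]
taste = taste_vector(rated, embeddings)           # [2/3, 1/3]
print(round(cosine(taste, embeddings["c1"]), 3))  # candidate score
```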
Hybrid — Trains a logistic regression on 7 features extracted from all three base recommenders: embedding cosine similarity, author score, genre score, popularity, page count, publish year, and genre count. Uses `StandardScaler` + `class_weight='balanced'` to handle feature scale differences and class imbalance. Learns optimal signal weights from the user's actual reading behaviour.
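A toy version of that pipeline, with synthetic features and labels standing in for the 7 real features produced by the base recommenders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7-feature matrix (cosine, author, genre,
# popularity, page count, publish year, genre count)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 7))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic labels

model = make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced"))
model.fit(X, y)
scores = model.predict_proba(X)[:, 1]  # ranking scores for candidates
print(scores.shape)
```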
Each recommender is evaluated using a strict temporal holdout: the oldest 80% of read history becomes training data, and the newest 20% becomes the test set. The model can only use past behaviour to predict future reads — no data leakage.
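The temporal split amounts to sorting reads by date and cutting at 80%. A minimal sketch (function name and exact rounding are assumptions):

```python
def temporal_split(read_books, train_frac=0.8):
    """Oldest 80% of reads -> train, newest 20% -> test (no future leakage)."""
    ordered = sorted(read_books, key=lambda b: b["date_read"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

books = [{"title": f"b{i}", "date_read": f"2023-0{i}-01"} for i in range(1, 6)]
train, test = temporal_split(books)
print([b["title"] for b in train], [b["title"] for b in test])
```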
Metrics: Precision@K, Recall@K, NDCG@K (the primary ranking quality metric), coverage.
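NDCG@K with binary relevance rewards placing relevant books near the top, which is why it still differentiates models when K exceeds the test-set size. A reference implementation (the repo's actual metric code is not shown here):

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@K with binary relevance (1 if the book is in the held-out set)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, bid in enumerate(ranked_ids[:k]) if bid in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Same hit, different rank: position matters
print(round(ndcg_at_k(["a", "b", "c"], {"a"}, 3), 3))  # 1.0
print(round(ndcg_at_k(["b", "c", "a"], {"a"}, 3), 3))  # 0.5
```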
The CLI swipe tool (`make swipe`) presents unswiped to-read books sorted by the hybrid model's predicted score. You swipe right (interested) or left (not interested) with arrow keys. Each swipe is saved immediately to `books.enriched.json`.
Running `make retrain` afterwards retrains the hybrid model with swipe data as additional training examples and logs a snapshot to `data/recommenders/training_log.json`. Over multiple swipe-retrain cycles, the model adapts to your preferences beyond what star ratings alone capture.
Design decision: "Interested" swipes are treated as equivalent to a 4+ rating for training. This is a simplification — appeal (wanting to read) and satisfaction (enjoying it) are different signals — but it keeps the implementation simple. The `training_label()` function in `recommenders/common.py` encodes the priority: user_rating > interested swipe > unlabeled.
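The stated priority could be encoded roughly like this (a sketch; the real `training_label()` in `recommenders/common.py` may differ in signature and thresholds):

```python
def training_label(book):
    """Label priority per the design note: explicit rating > swipe > unlabeled."""
    if book.get("user_rating") is not None:
        return 1 if book["user_rating"] >= 4 else 0
    if book.get("interested") is not None:
        return 1 if book["interested"] else 0  # swipe treated like a 4+ rating
    return None  # unlabeled: excluded from training

print(training_label({"user_rating": 5, "interested": False}))  # rating wins -> 1
print(training_label({"user_rating": None, "interested": True}))  # swipe -> 1
print(training_label({}))  # None (unlabeled)
```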
Set up the environment (if not already done via `make demo`):
```
make setup
```
Validate the book dataset against the Pydantic schema:
```
make validate
```
Generate embeddings (requires `OPENAI_API_KEY` in `.env`):
```
make embeddings
```
Run all evaluations and open the comparison dashboard:
```
make evaluate
```
This runs all 8 scripts (4 evaluators + 4 recommenders), generates an HTML dashboard at `data/recommenders/dashboard.html`, and opens it in the browser.
Run individual recommenders or evaluators:
```
.venv/bin/python recommenders/popularity/evaluate.py
.venv/bin/python recommenders/popularity/recommend.py
.venv/bin/python recommenders/similarity/evaluate.py
.venv/bin/python recommenders/similarity/recommend.py
.venv/bin/python recommenders/embedding/evaluate.py
.venv/bin/python recommenders/embedding/recommend.py
.venv/bin/python recommenders/hybrid/evaluate.py
.venv/bin/python recommenders/hybrid/recommend.py
```
Interactive swipe session and retrain:
```
make swipe    # Arrow keys to rate books
make retrain  # Retrain hybrid model incorporating swipe feedback
```
To use your own Goodreads library instead of the sample data:
1. **Export your Goodreads data.** Get your read history and to-read shelf into `data/books.json`. The enrichment script expects a JSON array of books with at least `title`, `author`, `average_rating`, `user_rating`, `shelves`, `date_read`, and `date_added` fields.

2. **Enrich with metadata.** This fetches descriptions, genres, page counts, ISBNs, and other metadata from Open Library and Google Books:

   ```
   .venv/bin/python scripts/enrich_books.py
   ```

   Results are cached in `data/metadata_cache.json`, so re-runs are cheap.

3. **Backfill missing descriptions (optional).** If some books lack descriptions after enrichment, you can backfill from the UCSD Book Graph dataset:

   ```
   .venv/bin/python scripts/backfill_from_goodreads_dump.py
   ```

4. **Validate the enriched data.** Check that the output conforms to the schema:

   ```
   make validate
   ```

5. **Generate embeddings.** Create a `.env` file with your OpenAI API key, then:

   ```
   echo "OPENAI_API_KEY=sk-..." > .env
   make embeddings
   ```

6. **Run recommendations and evaluation.**

   ```
   make evaluate
   ```

7. **Iterate with swipe feedback.**

   ```
   make swipe
   make retrain
   make evaluate  # See how metrics changed
   ```
The repo ships with sample data (40 classic novels with real OpenAI embeddings) so the full pipeline runs without a personal Goodreads export.
- `data/books.enriched.json` — Book dataset (read + to-read) with metadata. The checked-in version contains 40 sample books; a real dataset would have hundreds.
- `data/embeddings.json` — Cached OpenAI embedding vectors keyed by book_id (1536 dimensions each).
- `data/metadata_cache.json` — Metadata API cache to avoid repeated fetches (gitignored).
- `data/to_read_raw.txt` — Raw pasted Goodreads "Want to Read" page text (gitignored).
- `data/recommenders/` — Output from recommenders and evaluators: recs, metrics, dashboard, training log (gitignored).
- `scripts/enrich_books.py` — Enriches books with Open Library / Google Books metadata.
- `scripts/validate_enriched_books.py` — Validates enriched dataset against schema and prints quality stats.
- `scripts/generate_embeddings.py` — Generates OpenAI embeddings for all books with descriptions.
- `scripts/backfill_from_goodreads_dump.py` — Backfills missing descriptions from the UCSD Book Graph dataset.
- `scripts/evaluate_all.py` — Runs all recommender evaluations and generates a self-contained HTML comparison dashboard.
- `scripts/swipe.py` — Interactive CLI swipe tool for rating to-read books.
Each recommender lives in its own subfolder with `recommend.py` (generates ranked recommendations) and `evaluate.py` (temporal holdout evaluation). Shared utilities live in `recommenders/common.py`.
- `recommenders/popularity/` — Ranks by global average_rating.
- `recommenders/similarity/` — Author/genre similarity scoring.
- `recommenders/embedding/` — Cosine similarity to user taste vector.
- `recommenders/hybrid/` — Logistic regression over all signals.
- `schemas/enriched_books_schema.py` — Pydantic schema for enriched book records.
- `schemas/books_enriched.schema.json` — Generated JSON schema artifact.
With the included 40-book sample dataset (25 read, 15 to-read):
| Metric | Popularity | Similarity | Embedding | Hybrid |
|---|---|---|---|---|
| NDCG@25 | 0.9196 | 0.9196 | 0.9196 | 0.9474 |
Precision, recall, and hits are identical across models because K=25 is much larger than the 5-book test set — every model trivially ranks all candidates. NDCG is the only metric that differentiates, since it rewards placing the relevant books higher in the ranking. The hybrid model wins.
The recommendation lists over the 15 to-read candidates differ meaningfully:
| Model | Top recommendation | Signal |
|---|---|---|
| Popularity | The Brothers Karamazov (4.33) | Highest global rating |
| Similarity | Sense and Sensibility | Author match (Austen) |
| Embedding | Crime and Punishment | Thematic similarity |
| Hybrid | Crime and Punishment | Blended: popularity + embedding + author |
With a larger personal dataset (500+ books), all metrics differentiate and the hybrid model shows clear improvements over each baseline.
- The to-read shelf is an intent/bookmark signal, not final satisfaction.
- Metrics are computed on held-out historical reads, not on future outcomes from current recommendations.
- Coverage is currently reported per run as top-K size over evaluation candidate count.
- The sample dataset is too small for meaningful precision/recall differentiation — use your own library for real evaluation.
- Explore non-linear models (gradient boosting) if more training data becomes available
- Add content-based features (description TF-IDF, series detection)
- A/B test recommendations against actual reading outcomes
- False positive/negative analysis to understand why models fail on specific books
- Per-segment performance breakdown (by genre, author, publication era)
- Ablation study to identify whether any of the 7 hybrid features are dead weight
- Per-prediction feature attribution ("recommended because: high author overlap + strong embedding match")