A personal book recommender system built from Goodreads exports. It ingests your read history and to-read shelf, enriches books with metadata from Open Library and Google Books, generates semantic embeddings via OpenAI, and trains a hybrid ranking model that learns your taste from ratings and interactive swipe feedback.
Try the full pipeline from scratch — no Goodreads export or API keys needed:
```
git clone <this-repo> && cd rekommend
make demo
```
This single command creates a Python virtual environment, installs dependencies, runs all 4 recommenders and evaluations against the included sample data (40 classic novels), and opens an HTML comparison dashboard in your browser.
- Python 3.12+
- `make`
- An OpenAI API key (only needed if you want to generate embeddings for your own books — the sample data ships with pre-generated embeddings)
The system is a pipeline of stages. Each stage produces output that feeds into the next.
Everything starts with `data/books.enriched.json` — a JSON array of books, each with:
- Identity: `title`, `author`
- User data: `shelves` (`["read"]` or `["to-read"]`), `user_rating` (1–5 or null), `date_read`, `date_added`
- Metadata: `description`, `genres`, `publish_year`, `page_count`, `publisher`, ISBNs
- Feedback: `interested` (true/false/null from swipe sessions), `swiped_at`
The repo ships with a sample dataset of 40 classic novels (25 read with ratings, 15 to-read candidates). For real use, you'd build this from a Goodreads export — see Using Your Own Data below.
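A record might look like this (values invented for illustration; field names follow the schema above, and the exact shape of the ISBN fields is an assumption):

```json
{
  "title": "Persuasion",
  "author": "Jane Austen",
  "shelves": ["read"],
  "user_rating": 5,
  "date_read": "2023-06-01",
  "date_added": "2023-01-15",
  "description": "Anne Elliot is persuaded to break off her engagement...",
  "genres": ["classics", "romance"],
  "publish_year": 1817,
  "page_count": 249,
  "publisher": "John Murray",
  "isbn": "9780141439686",
  "interested": null,
  "swiped_at": null
}
```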
Each book's description is sent to OpenAI's `text-embedding-3-small` model to produce a 1536-dimensional vector stored in `data/embeddings.json`. Books with similar themes and content end up close together in embedding space.
The sample data ships with pre-generated embeddings. To regenerate or generate for new books:
```
# Requires OPENAI_API_KEY in .env file
make embeddings
```
The cache is additive — re-running only calls the API for books not already cached.
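The additive-cache behaviour can be sketched as follows. This is a minimal illustration, not the script itself: `embed_missing` is a hypothetical helper, the `book_id` keying follows the description of `data/embeddings.json`, and the OpenAI call is stubbed out.

```python
def embed_missing(books, cache, embed_fn):
    """Additive cache: only books without a cached vector hit the embedding API."""
    for book in books:
        if book["book_id"] not in cache and book.get("description"):
            cache[book["book_id"]] = embed_fn(book["description"])
    return cache

# Stand-in for the real text-embedding-3-small call (1536-dim vectors)
fake_embed = lambda text: [0.0] * 1536

cache = {"b1": [0.1] * 1536}          # pretend b1 was embedded on a previous run
books = [{"book_id": "b1", "description": "already cached"},
         {"book_id": "b2", "description": "new book"}]
cache = embed_missing(books, cache, fake_embed)
print(sorted(cache))  # only b2 triggered an "API" call
```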
Four recommenders score the to-read candidates, each using a different signal:
Popularity — Ranks by global Goodreads average_rating. Knows nothing about the user. This is the naive baseline.
Author/genre similarity — Builds a user profile from books rated 4+ stars. Scores candidates by author overlap (50%), genre overlap (40%), and popularity (10%). Favors familiar authors and genres.
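The 50/40/10 blend above can be sketched like this. The weights come from the description; the Jaccard genre overlap, the rating normalization, and the function name are assumptions for illustration.

```python
def similarity_score(candidate, liked_authors, liked_genres):
    # Weights from the recommender description: author 50%, genre 40%, popularity 10%
    author = 1.0 if candidate["author"] in liked_authors else 0.0
    genres = set(candidate.get("genres", []))
    union = genres | liked_genres
    genre = len(genres & liked_genres) / len(union) if union else 0.0
    popularity = candidate.get("average_rating", 0.0) / 5.0  # normalize to [0, 1]
    return 0.5 * author + 0.4 * genre + 0.1 * popularity

liked_authors = {"Jane Austen"}
liked_genres = {"classics", "romance"}
book = {"author": "Jane Austen", "genres": ["classics"], "average_rating": 4.0}
print(round(similarity_score(book, liked_authors, liked_genres), 2))  # 0.78
```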
Embedding — Averages the embeddings of positively-rated books (weighted by rating strength) into a user taste vector. Scores candidates by cosine similarity. Can surface books by unfamiliar authors if the content is thematically similar.
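A sketch of the taste-vector idea with tiny 2-dimensional vectors (real vectors are 1536-dimensional). The specific rating-to-weight mapping is an assumption; the source only says the average is weighted by rating strength.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def taste_vector(rated_books, embeddings):
    """Rating-weighted average of embeddings for positively rated books."""
    dim = len(next(iter(embeddings.values())))
    acc, total = [0.0] * dim, 0.0
    for book in rated_books:
        if book["user_rating"] and book["user_rating"] >= 4:
            w = book["user_rating"] - 3  # assumed weighting: 4 stars -> 1, 5 stars -> 2
            vec = embeddings[book["book_id"]]
            acc = [a + w * v for a, v in zip(acc, vec)]
            total += w
    return [a / total for a in acc] if total else acc

embeddings = {"b1": [1.0, 0.0], "b2": [0.0, 1.0], "c1": [0.8, 0.6]}
rated = [{"book_id": "b1", "user_rating": 5}, {"book_id": "b2", "user_rating": 4}]
taste = taste_vector(rated, embeddings)           # [2/3, 1/3]
print(round(cosine(taste, embeddings["c1"]), 3))  # candidate score
```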
Hybrid — Trains a logistic regression on 7 features extracted from all three base recommenders: embedding cosine similarity, author score, genre score, popularity, page count, publish year, and genre count. Uses `StandardScaler` + `class_weight='balanced'` to handle feature scale differences and class imbalance. Learns optimal signal weights from the user's actual reading behaviour.
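A toy version of that pipeline, with synthetic features and labels standing in for the 7 real features produced by the base recommenders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7-feature matrix (cosine, author, genre,
# popularity, page count, publish year, genre count)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 7))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic labels

model = make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced"))
model.fit(X, y)
scores = model.predict_proba(X)[:, 1]  # ranking scores for candidates
print(scores.shape)
```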
Each recommender is evaluated using a strict temporal holdout: the oldest 80% of read history becomes training data, and the newest 20% becomes the test set. The model can only use past behaviour to predict future reads — no data leakage.
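The temporal split amounts to sorting reads by date and cutting at 80%. A minimal sketch (function name and exact rounding are assumptions):

```python
def temporal_split(read_books, train_frac=0.8):
    """Oldest 80% of reads -> train, newest 20% -> test (no future leakage)."""
    ordered = sorted(read_books, key=lambda b: b["date_read"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

books = [{"title": f"b{i}", "date_read": f"2023-0{i}-01"} for i in range(1, 6)]
train, test = temporal_split(books)
print([b["title"] for b in train], [b["title"] for b in test])
```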
Metrics: Precision@K, Recall@K, NDCG@K (the primary ranking quality metric), coverage.
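NDCG@K with binary relevance rewards placing relevant books near the top, which is why it still differentiates models when K exceeds the test-set size. A reference implementation (the repo's actual metric code is not shown here):

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@K with binary relevance (1 if the book is in the held-out set)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, bid in enumerate(ranked_ids[:k]) if bid in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Same hit, different rank: position matters
print(round(ndcg_at_k(["a", "b", "c"], {"a"}, 3), 3))  # 1.0
print(round(ndcg_at_k(["b", "c", "a"], {"a"}, 3), 3))  # 0.5
```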
The CLI swipe tool (`make swipe`) presents unswiped to-read books sorted by the hybrid model's predicted score. You swipe right (interested) or left (not interested) with arrow keys. Each swipe is saved immediately to `books.enriched.json`.
Running `make retrain` afterwards retrains the hybrid model with swipe data as additional training examples and logs a snapshot to `data/recommenders/training_log.json`. Over multiple swipe-retrain cycles, the model adapts to your preferences beyond what star ratings alone capture.
Design decision: "Interested" swipes are treated as equivalent to a 4+ rating for training. This is a simplification — appeal (wanting to read) and satisfaction (enjoying it) are different signals — but it keeps the implementation simple. The `training_label()` function in `recommenders/common.py` encodes the priority: user_rating > interested swipe > unlabeled.
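The stated priority could be encoded roughly like this (a sketch; the real `training_label()` in `recommenders/common.py` may differ in signature and thresholds):

```python
def training_label(book):
    """Label priority per the design note: explicit rating > swipe > unlabeled."""
    if book.get("user_rating") is not None:
        return 1 if book["user_rating"] >= 4 else 0
    if book.get("interested") is not None:
        return 1 if book["interested"] else 0  # swipe treated like a 4+ rating
    return None  # unlabeled: excluded from training

print(training_label({"user_rating": 5, "interested": False}))  # rating wins -> 1
print(training_label({"user_rating": None, "interested": True}))  # swipe -> 1
print(training_label({}))  # None (unlabeled)
```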
Set up the environment (if not already done via `make demo`):
```
make setup
```
Validate the book dataset against the Pydantic schema:
```
make validate
```
Generate embeddings (requires `OPENAI_API_KEY` in `.env`):
```
make embeddings
```
Run all evaluations and open the comparison dashboard:
```
make evaluate
```
This runs all 8 scripts (4 evaluators + 4 recommenders), generates an HTML dashboard at `data/recommenders/dashboard.html`, and opens it in the browser.
Run individual recommenders or evaluators:
```
.venv/bin/python recommenders/popularity/evaluate.py
.venv/bin/python recommenders/popularity/recommend.py
.venv/bin/python recommenders/similarity/evaluate.py
.venv/bin/python recommenders/similarity/recommend.py
.venv/bin/python recommenders/embedding/evaluate.py
.venv/bin/python recommenders/embedding/recommend.py
.venv/bin/python recommenders/hybrid/evaluate.py
.venv/bin/python recommenders/hybrid/recommend.py
```
Interactive swipe session and retrain:
```
make swipe    # Arrow keys to rate books
make retrain  # Retrain hybrid model incorporating swipe feedback
```
To use your own Goodreads library instead of the sample data:
1. **Export your Goodreads data.** Get your read history and to-read shelf into `data/books.json`. The enrichment script expects a JSON array of books with at least `title`, `author`, `average_rating`, `user_rating`, `shelves`, `date_read`, and `date_added` fields.

2. **Enrich with metadata.** This fetches descriptions, genres, page counts, ISBNs, and other metadata from Open Library and Google Books:

   ```
   .venv/bin/python scripts/enrich_books.py
   ```

   Results are cached in `data/metadata_cache.json`, so re-runs are cheap.

3. **Backfill missing descriptions (optional).** If some books lack descriptions after enrichment, you can backfill from the UCSD Book Graph dataset:

   ```
   .venv/bin/python scripts/backfill_from_goodreads_dump.py
   ```

4. **Validate the enriched data.** Check that the output conforms to the schema:

   ```
   make validate
   ```

5. **Generate embeddings.** Create a `.env` file with your OpenAI API key, then:

   ```
   echo "OPENAI_API_KEY=sk-..." > .env
   make embeddings
   ```

6. **Run recommendations and evaluation.**

   ```
   make evaluate
   ```

7. **Iterate with swipe feedback.**

   ```
   make swipe
   make retrain
   make evaluate  # See how metrics changed
   ```
The repo ships with sample data (40 classic novels with real OpenAI embeddings) so the full pipeline runs without a personal Goodreads export.
- `data/books.enriched.json` — Book dataset (read + to-read) with metadata. The checked-in version contains 40 sample books; a real dataset would have hundreds.
- `data/embeddings.json` — Cached OpenAI embedding vectors keyed by book_id (1536 dimensions each).
- `data/metadata_cache.json` — Metadata API cache to avoid repeated fetches (gitignored).
- `data/to_read_raw.txt` — Raw pasted Goodreads "Want to Read" page text (gitignored).
- `data/recommenders/` — Output from recommenders and evaluators: recs, metrics, dashboard, training log (gitignored).
- `scripts/enrich_books.py` — Enriches books with Open Library / Google Books metadata.
- `scripts/validate_enriched_books.py` — Validates enriched dataset against schema and prints quality stats.
- `scripts/generate_embeddings.py` — Generates OpenAI embeddings for all books with descriptions.
- `scripts/backfill_from_goodreads_dump.py` — Backfills missing descriptions from the UCSD Book Graph dataset.
- `scripts/evaluate_all.py` — Runs all recommender evaluations and generates a self-contained HTML comparison dashboard.
- `scripts/swipe.py` — Interactive CLI swipe tool for rating to-read books.
Each recommender lives in its own subfolder with `recommend.py` (generates ranked recommendations) and `evaluate.py` (temporal holdout evaluation). Shared utilities live in `recommenders/common.py`.
- `recommenders/popularity/` — Ranks by global average_rating.
- `recommenders/similarity/` — Author/genre similarity scoring.
- `recommenders/embedding/` — Cosine similarity to user taste vector.
- `recommenders/hybrid/` — Logistic regression over all signals.
- `schemas/enriched_books_schema.py` — Pydantic schema for enriched book records.
- `schemas/books_enriched.schema.json` — Generated JSON schema artifact.
With the included 40-book sample dataset (25 read, 15 to-read):
| Metric | Popularity | Similarity | Embedding | Hybrid |
|---|---|---|---|---|
| NDCG@25 | 0.9196 | 0.9196 | 0.9196 | 0.9474 |
Precision, recall, and hits are identical across models because K=25 is much larger than the 5-book test set — every model trivially ranks all candidates. NDCG is the only metric that differentiates, since it rewards placing the relevant books higher in the ranking. The hybrid model wins.
The recommendation lists over the 15 to-read candidates differ meaningfully:
| Model | Top recommendation | Signal |
|---|---|---|
| Popularity | The Brothers Karamazov (4.33) | Highest global rating |
| Similarity | Sense and Sensibility | Author match (Austen) |
| Embedding | Crime and Punishment | Thematic similarity |
| Hybrid | Crime and Punishment | Blended: popularity + embedding + author |
With a larger personal dataset (500+ books), all metrics differentiate and the hybrid model shows clear improvements over each baseline.
- The to-read shelf is an intent/bookmark signal, not final satisfaction.
- Metrics are computed on held-out historical reads, not on future outcomes from current recommendations.
- Coverage is currently reported per run as top-K size over evaluation candidate count.
- The sample dataset is too small for meaningful precision/recall differentiation — use your own library for real evaluation.
- Explore non-linear models (gradient boosting) if more training data becomes available
- Add content-based features (description TF-IDF, series detection)
- A/B test recommendations against actual reading outcomes
- False positive/negative analysis to understand why models fail on specific books
- Per-segment performance breakdown (by genre, author, publication era)
- Ablation study to identify whether any of the 7 hybrid features are dead weight
- Per-prediction feature attribution ("recommended because: high author overlap + strong embedding match")