This project implements and compares two book recommendation approaches—Content-Based Filtering and Collaborative Filtering (SVD)—using the Book-Crossing dataset.
- Goal: Design and evaluate a recommender system pipeline using real-world book data.
- Dataset: Book-Crossing Dataset (Kaggle)
- Team Members:
  - Shadi Farzankia 107209
  - Shruti Pashine 106369
  - Dharampal Singh 106316
- Data Loading & Preprocessing:
  - Load books, ratings, and users data.
  - Clean and merge datasets, handle missing values and outliers.
- Exploratory Data Analysis (EDA):
  - Visualize distributions, check for anomalies, and understand feature relationships.
- Recommendation Approaches:
  - Content-Based Filtering: Uses book metadata (title, author, publisher) with TF-IDF and cosine similarity.
  - Collaborative Filtering (SVD): Uses user-book ratings and matrix factorization (Surprise SVD).
- Evaluation:
  - Precision@5 (Hit Rate) for both methods.
  - RMSE for SVD.
- Comparison & Discussion:
  - Compare strengths, weaknesses, and visualize results.
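The content-based step above (TF-IDF over book metadata, then cosine similarity) can be illustrated with a small dependency-free sketch. This is not the notebook's code: the toy catalogue, the whitespace tokenisation, the smoothed IDF formula, and the helper names `tfidf_vectors`, `cosine`, and `recommend` are all assumptions made for illustration, and a library vectorizer would normally replace the hand-written one.

```python
import math
from collections import Counter

# Toy catalogue (invented for illustration, not rows from Book-Crossing).
# Each entry concatenates title, author, and publisher into one string.
books = {
    "b1": "the hobbit tolkien allen unwin",
    "b2": "the silmarillion tolkien allen unwin",
    "b3": "emma austen penguin",
    "b4": "persuasion austen penguin",
}


def tfidf_vectors(docs):
    """Return {doc_id: {term: weight}} with L2-normalised TF-IDF weights."""
    tokenised = {doc_id: text.split() for doc_id, text in docs.items()}
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenised.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in tokenised.items():
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[doc_id] = {term: w / norm for term, w in vec.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def recommend(book_id, vectors, k=5):
    """Return up to k other book ids, most similar first."""
    scores = {
        other: cosine(vectors[book_id], vec)
        for other, vec in vectors.items()
        if other != book_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


vectors = tfidf_vectors(books)
print(recommend("b1", vectors, k=1))  # the other Tolkien title ranks first
```

Because the vectors depend only on metadata, a book with no ratings at all can still be recommended, which is the property the content-based approach relies on for new or unpopular titles.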
Place the following files in a data/ directory:
- books.csv
- ratings.csv
- users.csv
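Before running the notebook, it may help to confirm that the three files are where they are expected. The following is a minimal sketch under the assumption that `data/` sits in the current working directory; `missing_data_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# File names as listed above; the directory location is an assumption.
EXPECTED_FILES = ("books.csv", "ratings.csv", "users.csv")


def missing_data_files(data_dir="data"):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All dataset files found.")
```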
- Clone this repository.
- Install dependencies:
  pip install -r requirements.txt
- Open the notebook (Code/RecommenderSytems.ipynb) in Jupyter or VS Code.
- Run all cells in order.
- Content-Based Filtering:
  - Interpretable, works for new/unpopular books, higher hit rate.
- SVD Collaborative Filtering:
  - More accurate in rating prediction (lower RMSE), more personalized, but needs enough user-book interactions.
- Collaborative filtering is affected by data sparsity and cold-start issues.
- Evaluation for SVD is limited to a sample of users for computational reasons.
- Future work: hybrid models, more features, deep learning approaches.
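For reference, the two metrics named in this README can be sketched as below. The README labels Precision@5 as a hit rate, so the sketch assumes one common reading: the share of users with at least one relevant book among their top 5 recommendations. That reading, the function names, and the toy data are assumptions and may differ from what the notebook computes.

```python
import math


def hit_rate_at_k(recommended, relevant, k=5):
    """Fraction of users with at least one relevant book in their top-k list.

    recommended: {user_id: ranked list of book ids}
    relevant:    {user_id: set of held-out book ids the user liked}
    Users without any relevant books are skipped.
    """
    users = [u for u in recommended if relevant.get(u)]
    if not users:
        return 0.0
    hits = sum(1 for u in users if set(recommended[u][:k]) & set(relevant[u]))
    return hits / len(users)


def rmse(predicted, actual):
    """Root mean squared error between paired predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Toy data, invented for illustration only.
recommended = {
    "u1": ["b1", "b2", "b3", "b4", "b5"],
    "u2": ["b6", "b7", "b8", "b9", "b10"],
}
relevant = {"u1": {"b3"}, "u2": {"b1"}}

print(hit_rate_at_k(recommended, relevant))  # 0.5: only u1 has a hit
print(rmse([8.0, 6.0], [10.0, 6.0]))         # sqrt(2), about 1.414
```

The hit rate rewards ranking quality and applies to both approaches, while RMSE needs predicted rating values and therefore only applies to the SVD model.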
@Shruti Pashine, @Shadi Farzankia, @Dharampal Singh
Dataset: Book-Crossing Dataset (Kaggle)