This is a simple recommender system for text documents built using Python. It loads a list of articles and recommends similar ones based on a user's reading history. The system uses cosine similarity to calculate article relevance and provides both highly similar and diverse recommendations to help users explore content beyond their immediate preferences.
- Random Article Selection: Initially, a random article is displayed to the user.
- Article Recommendations: After reading an article, the system generates a list of recommended articles based on similarity to the last read article.
- Similarity Calculation: Utilizes cosine similarity between document vectors to find similar articles.
- Diverse Recommendations: In addition to the most similar articles, a few less similar articles are recommended to encourage exploration.
- Customizable Parameters: The system supports configuration of the number of recommendations, minimum and maximum document frequency, and n-grams for better vectorization.
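As a rough illustration of the customizable parameters listed above, the sketch below maps them onto `TfidfVectorizer` arguments. The constant names (`N_RECOMMENDATIONS`, `MIN_DF`, `MAX_DF`, `NGRAM_RANGE`), their values, and the `build_vectorizer` helper are assumptions made for this example and may not match the names used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical settings; names and values are illustrative only.
N_RECOMMENDATIONS = 5   # how many articles to suggest per round
MIN_DF = 1              # keep terms that appear in at least 1 document
MAX_DF = 0.9            # drop terms that appear in more than 90% of documents
NGRAM_RANGE = (1, 2)    # use single words and two-word phrases


def build_vectorizer():
    """Create a TF-IDF vectorizer from the settings above."""
    return TfidfVectorizer(min_df=MIN_DF, max_df=MAX_DF, ngram_range=NGRAM_RANGE)


docs = [
    "machine learning for text",
    "deep learning for images",
    "cooking pasta at home",
]
vectorizer = build_vectorizer()
matrix = vectorizer.fit_transform(docs)  # one row per document
```

Raising `MIN_DF` or lowering `MAX_DF` shrinks the vocabulary, which can reduce noise on larger collections but may remove every term on very small ones.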
- Python
- scikit-learn: For vectorization and similarity calculation
- json, csv: For handling article data
- Clone the repository to your local machine.
- Open the project in PyCharm (Community Edition) or any Python IDE of your choice.
- Ensure you are using Python 3.7+ for the best compatibility with the dependencies.
- Install dependencies:
pip install scikit-learn
- Load Articles: Articles are loaded from a CSV or JSON file containing fields such as `title` and `text`.
- Vectorization: The text of each article is vectorized using `TfidfVectorizer` from `scikit-learn`. This transforms the raw text into numerical vectors suitable for similarity calculations.
- Cosine Similarity: The system calculates the cosine similarity between the vector of the last article the user read and all other articles in the dataset.
- Recommendations: Based on the similarity scores, a set of similar articles is recommended. To add diversity, a couple of less similar articles are included.
- User Interaction: The user can choose an article from the recommendations, after which the process repeats, showing more relevant articles based on their choice.
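The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `recommend` function, its parameters, and the choice to treat the least similar remaining articles as the "diverse" picks are all assumptions made for this example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(tfidf_matrix, last_read_idx, n_similar=3, n_diverse=2):
    """Return (similar, diverse) lists of article indices (hypothetical helper)."""
    # Similarity of the last-read article to every article in the dataset.
    scores = cosine_similarity(tfidf_matrix[[last_read_idx]], tfidf_matrix).ravel()
    # Rank all other articles from most to least similar.
    order = np.argsort(-scores, kind="stable")
    ranked = [int(i) for i in order if i != last_read_idx]
    similar = ranked[:n_similar]
    # Assumption: "diverse" means the least similar of the remaining articles.
    remaining = ranked[n_similar:]
    diverse = remaining[-n_diverse:] if n_diverse > 0 else []
    return similar, diverse


articles = [
    "python machine learning tutorial",
    "machine learning with python and scikit-learn",
    "deep learning tutorial in python",
    "baking sourdough bread at home",
    "sourdough starter care and bread baking tips",
    "marathon training plan for beginners",
]
matrix = TfidfVectorizer().fit_transform(articles)
similar, diverse = recommend(matrix, last_read_idx=0, n_similar=2, n_diverse=2)
print("Similar:", similar)
print("Diverse:", diverse)
```

In an interactive loop, the index the user picks from the combined list would become the next `last_read_idx`. Other ways of choosing the diverse picks, such as random sampling from the lower-ranked articles, would fit the same structure.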
To run the recommender system:
- Prepare a JSON or CSV file with your articles (change the filename in the project if necessary), or use the provided sample data to test.
- Run the script from the command line, or run the recommender.py file in PyCharm (Community Edition) or any Python IDE of your choice:
python recommender.py
- Follow the prompts to interact with the recommender system.
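For preparing the input file, the sketch below shows one way a loader for the two supported formats might look. The `load_articles` function and the exact file layout (a JSON array of objects, or a CSV with a header row, each with `title` and `text` fields) are assumptions for illustration; check the sample data for the layout the project actually expects.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_articles(path):
    """Load articles as a list of dicts with 'title' and 'text' keys (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            records = json.load(f)  # assumed: a JSON array of objects
    elif suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            records = list(csv.DictReader(f))  # assumed: header row with column names
    else:
        raise ValueError(f"Unsupported file type: {suffix}")
    return [{"title": r["title"], "text": r["text"]} for r in records]


# Usage: write a tiny JSON file to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "articles.json"
    sample.write_text(
        json.dumps([{"title": "Hello", "text": "First article."}]),
        encoding="utf-8",
    )
    loaded = load_articles(sample)
    print(loaded)
```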
This project is licensed under the MIT License - see the LICENSE file for details.
