Welcome to Feelms, a movie recommendation app that suggests films based on your emotions! This app leverages collaborative filtering and machine learning models to predict your favorite movies and ratings, all while interacting with a MySQL database hosted on AWS.
- Emotion-based Recommendations: Choose your current mood, and Feelms will recommend movies that match your emotions (e.g., Happy, Excited, Relaxed).
- User Authentication: Securely create or log into your account.
- Movie Ratings & Favorites: Rate your favorite movies and save them to a personalized favorites list.
- Machine Learning Models:
  - SVD (Collaborative Filtering): Provides personalized movie ratings based on user interaction.
  - Random Forest Classifier: Predicts whether a movie will be favorited by a user, based on features like duration and ratings.
- Streamlit UI: Simple, intuitive web interface powered by Streamlit for seamless interaction with the recommendation system.
- Movie Dataset (`imdb.csv`): The original dataset from Kaggle contains basic information such as title, director, cast, genres, and poster images. This data was processed in the notebook `imdb.ipynb` to produce two key outputs:
  - `imdb_clean.csv`: A refined version of the movie dataset with enhanced, ready-for-model features, including an additional emotion-based column.
  - Emotion Mapping: Using a custom dictionary, genres were mapped to specific emotions (e.g., Comedy to Happy, Horror to Scared), creating an emotion field for each movie, which enables tailored recommendations based on user-selected emotions.
- Generated Datasets for Interactions: To simulate realistic interaction data for the machine learning models, a data generation script produces the following datasets:
  - Users (`users.csv`): A simulated user base with sequential IDs, usernames, and passwords, representing 1,000 users split into active (20%) and less active (80%) groups.
  - Interactions (`interactions.csv`): Simulated interactions between users and movies with emotions selected based on weighted probabilities, representing different engagement types (view or shown).
  - Favorites (`favorites.csv`): Generated from a subset (30%) of viewed movies that were marked as favorites, with favorite dates occurring after the original view date.
  - Ratings (`ratings.csv`): Ratings assigned to 50% of favorites, influenced by the associated emotion, reflecting a realistic distribution of user preferences.
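The genre-to-emotion mapping described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual dictionary: only the Comedy and Horror pairs come from the text, while the other entries, the `Neutral` fallback, the comma-separated genre format, and the function name `map_emotions` are assumptions.

```python
# Hypothetical genre-to-emotion dictionary. Only Comedy -> Happy and
# Horror -> Scared come from the README; the rest are assumptions.
GENRE_TO_EMOTION = {
    "Comedy": "Happy",
    "Horror": "Scared",
    "Action": "Excited",   # assumed
    "Romance": "Relaxed",  # assumed
}


def map_emotions(genres: str, default: str = "Neutral") -> list:
    """Return the distinct emotions for a comma-separated genre string."""
    emotions = []
    for genre in genres.split(","):
        emotion = GENRE_TO_EMOTION.get(genre.strip())
        if emotion and emotion not in emotions:
            emotions.append(emotion)
    # Fall back to a default when no genre is in the dictionary.
    return emotions or [default]


# Toy rows standing in for the movie dataset.
movies = [
    {"title": "Movie A", "genres": "Comedy, Romance"},
    {"title": "Movie B", "genres": "Horror"},
    {"title": "Movie C", "genres": "Documentary"},
]
for movie in movies:
    # Keep the first matching emotion as the movie's emotion field.
    movie["emotion"] = map_emotions(movie["genres"])[0]
```

Taking the first matching emotion is one simple choice; a movie with several genres could just as well keep the full list.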
- The SVD model is used for collaborative filtering to predict user ratings on movies. Key hyperparameters (e.g., `n_factors = 100`, `lr_all = 0.005`) were tuned for recommendation accuracy.
- Cross-validation with metrics like RMSE and MAE was applied to evaluate and fine-tune the model's performance on interaction data.
- The Random Forest model predicts whether a movie will become a favorite based on features such as duration and rating. The model was tuned for accuracy, precision, and recall using cross-validation and hyperparameter tuning (e.g., `max_depth = 15`).
- The Random Forest model integrates into Streamlit as a predictive tool to suggest movies that users are likely to favorite.
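To make the roles of `n_factors` and `lr_all` concrete, here is a dependency-free sketch of the matrix factorization technique that recommender libraries commonly call SVD, trained with stochastic gradient descent. It is not the project's code in `ml.py`: the function names, the `reg_all` and `n_epochs` values, the 1 to 5 rating scale, and the toy ratings are all assumptions. For brevity, RMSE and MAE are computed on the training triples; a real evaluation would use held-out folds as in cross-validation.

```python
import math
import random


def train_sgd_mf(ratings, n_factors=100, lr_all=0.005, reg_all=0.02,
                 n_epochs=20, seed=0):
    """Fit biased matrix factorization with SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}  # user biases
    bi = {i: 0.0 for i in items}  # item biases
    # Latent factor vectors, initialized with small random values.
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(a * b for a, b in zip(pu[u], qi[i]))
            err = r - (mu + bu[u] + bi[i] + dot)
            # Gradient steps; lr_all is the shared learning rate.
            bu[u] += lr_all * (err - reg_all * bu[u])
            bi[i] += lr_all * (err - reg_all * bi[i])
            for f in range(n_factors):
                p_old, q_old = pu[u][f], qi[i][f]
                pu[u][f] += lr_all * (err * q_old - reg_all * p_old)
                qi[i][f] += lr_all * (err * p_old - reg_all * q_old)
    return {"mu": mu, "bu": bu, "bi": bi, "pu": pu, "qi": qi}


def predict(model, u, i):
    """Estimate a rating, clamped to an assumed 1-5 scale."""
    est = model["mu"] + model["bu"].get(u, 0.0) + model["bi"].get(i, 0.0)
    if u in model["pu"] and i in model["qi"]:
        est += sum(a * b for a, b in zip(model["pu"][u], model["qi"][i]))
    return min(5.0, max(1.0, est))


def rmse(model, ratings):
    return math.sqrt(sum((r - predict(model, u, i)) ** 2
                         for u, i, r in ratings) / len(ratings))


def mae(model, ratings):
    return sum(abs(r - predict(model, u, i)) for u, i, r in ratings) / len(ratings)


# Toy ratings (made up for illustration).
toy = [("u1", "m1", 5), ("u1", "m2", 4), ("u2", "m1", 4), ("u2", "m3", 1),
       ("u3", "m2", 5), ("u3", "m3", 2), ("u4", "m1", 5), ("u4", "m3", 1)]
model = train_sgd_mf(toy, n_factors=100, lr_all=0.005)
print(f"train RMSE={rmse(model, toy):.3f}  MAE={mae(model, toy):.3f}")
```

A larger `n_factors` gives each user and movie a richer representation at the cost of more parameters to fit, while `lr_all` trades convergence speed against stability.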
- RDS (Relational Database Service): A MySQL database on Amazon RDS stores user interactions, favorites, and ratings.
- EC2 (Elastic Compute Cloud): The app is hosted on an EC2 instance, which provides a scalable, high-availability environment.
- Environment Variables: Sensitive information like database credentials is stored securely in Streamlit's `secrets.toml` file.
The web app uses Streamlit for real-time interaction, displaying movie recommendations, ratings, and favorites through an intuitive UI. Pre-trained model files (rf_model.pkl, svd_model.pkl) are integrated directly into the Streamlit app for immediate use.
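Loading the pre-trained `.pkl` files might look like the sketch below. It assumes the models were saved with Python's standard `pickle` module, which the text does not confirm (they could equally have been written with another serializer), and `load_model` is a hypothetical helper name.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Unpickle a trained model from disk. Only load files you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Hypothetical usage inside the app, with paths taken from the project layout:
# rf_model = load_model("model/rf_model.pkl")
# svd_model = load_model("model/svd_model.pkl")
```

Unpickling requires the same libraries that defined the model classes to be installed, so the app's environment should match the one used for training.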
- `data/`: Contains CSV files, including the original movie dataset, cleaned data, and generated datasets (`favorites`, `interactions`, `ratings`, `users`, `imdb_clean`).
- `lib/`: Includes Python scripts for data processing and machine learning functions:
  - `data_analysis.py`: EDA for generated interaction data.
  - `data_generation.py`: Functions to generate users, interactions, favorites, and ratings.
  - `imdb_data_analysis.py`: EDA for the initial movie dataset.
  - `imdb_data_cleaning.py`: Cleaning and transformation functions for `imdb.csv`.
  - `ml.py`: Machine learning functions for SVD and Random Forest models.
- `model/`: Stores trained models as `.pkl` files (`rf_model.pkl`, `svd_model.pkl`) for Streamlit integration.
- `notebook/`: Jupyter notebooks executing each stage:
  - `data.ipynb`: Generates data and EDA on interactions.
  - `imdb.ipynb`: Performs cleaning and EDA on the movie dataset.
  - `ml.ipynb`: Executes and evaluates ML models.
- `app.py`: Streamlit application that connects to AWS, uses the trained models, and serves as the user interface.
- `README.md`: Documentation for the project.
- `requirements.txt`: Dependencies needed for the project.
To run this project locally, follow these steps:
- Clone the repository:

      git clone https://github.com/yourusername/feelms.git
      cd feelms

- Install the required dependencies:

      pip install -r requirements.txt

- Set up the environment variables: Streamlit's secrets manager is used to store sensitive information like the database credentials. Add the following to `.streamlit/secrets.toml`:

      [database]
      DB_HOST = "your-db-host"
      DB_USER = "your-db-user"
      DB_PASSWORD = "your-db-password"
      DB_NAME = "your-db-name"
      DB_PORT = "your-db-port"

- Run the Streamlit app:

      streamlit run app.py
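As a rough sketch of how the `[database]` secrets could be turned into connection settings, the helper below validates the keys and converts the port to an integer. The function name `connection_settings` and the output keyword names are assumptions (they follow the keywords that common MySQL client libraries accept); how `app.py` actually builds its connection is not shown in this README.

```python
from collections.abc import Mapping

# Keys expected in the [database] table of .streamlit/secrets.toml.
REQUIRED_KEYS = ("DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME", "DB_PORT")


def connection_settings(database: Mapping) -> dict:
    """Validate the [database] table and return normalized connection settings."""
    missing = [key for key in REQUIRED_KEYS if key not in database]
    if missing:
        raise KeyError(f"Missing secrets: {', '.join(missing)}")
    return {
        "host": database["DB_HOST"],
        "user": database["DB_USER"],
        "password": database["DB_PASSWORD"],
        "database": database["DB_NAME"],
        "port": int(database["DB_PORT"]),  # stored as a string in the TOML
    }


# Inside the Streamlit app, the mapping would come from st.secrets["database"].
# A plain dict with placeholder values stands in for it here.
example = connection_settings({
    "DB_HOST": "your-db-host",
    "DB_USER": "your-db-user",
    "DB_PASSWORD": "your-db-password",
    "DB_NAME": "your-db-name",
    "DB_PORT": "3306",
})
```

Failing early on a missing key gives a clearer error than a connection timeout later on.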
Once the app is running:
- Log in or register as a new user.
- Choose an emotion from the list provided (e.g., Happy, Excited, Relaxed).
- Browse through the movie recommendations.
- Rate movies and add them to your favorites.
- Enjoy your personalized movie experience!
- Emotion Weights: Positive emotions (e.g., Happy, Excited) were assigned higher weights in the generated interactions, increasing their frequency in recommendations to reflect user preferences for uplifting content.
- Active vs. Casual Users: About 20% of users are classified as active and engage with the platform frequently, while the other 80% are casual users, simulating real-world application usage patterns.
- Favorites and Ratings: Only about 30% of movies viewed are marked as favorites, and ratings are applied to 50% of those favorites. This setup models selective engagement, where users may choose to rate only the most impactful movies.
- Interaction Dates: Dates of interactions are randomly distributed within the last year, generating realistic activity trends over time.
- Validation through EDA: Exploratory Data Analysis was performed on both the initial movie dataset and the generated interaction data to ensure data quality, consistency, and appropriate distribution for modeling.
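The assumptions listed above can be combined into a compact simulation. The following is a sketch rather than the project's `data_generation.py`: the 20% active share, the 30% favorite rate, the 50% rating rate, the one-year date window, and the rule that favorites come after views are taken from the text, while the emotion weights, events per user, movie count, fixed end date, and rating rule are made-up values for illustration.

```python
import random
from datetime import datetime, timedelta

# Assumed weights: positive emotions are drawn more often.
EMOTION_WEIGHTS = {"Happy": 0.35, "Excited": 0.30, "Relaxed": 0.20,
                   "Sad": 0.10, "Scared": 0.05}
POSITIVE = {"Happy", "Excited", "Relaxed"}


def generate(n_users=1000, n_movies=500, end=datetime(2024, 1, 1), seed=42):
    """Simulate users, interactions, favorites, and ratings."""
    rng = random.Random(seed)
    # 20% of users are active, the remaining 80% are casual.
    active_ids = set(rng.sample(range(1, n_users + 1), k=n_users // 5))
    users = [{"user_id": uid, "active": uid in active_ids}
             for uid in range(1, n_users + 1)]

    interactions, favorites, ratings = [], [], []
    emotions, weights = zip(*EMOTION_WEIGHTS.items())
    for user in users:
        # Assumed activity levels: active users generate far more events.
        n_events = rng.randint(20, 50) if user["active"] else rng.randint(1, 5)
        for _ in range(n_events):
            event = {
                "user_id": user["user_id"],
                "movie_id": rng.randint(1, n_movies),
                "emotion": rng.choices(emotions, weights=weights)[0],
                "type": rng.choice(["view", "shown"]),
                # Random date within the year before `end`.
                "date": end - timedelta(days=rng.uniform(0, 365)),
            }
            interactions.append(event)
            # About 30% of views become favorites, dated after the view.
            if event["type"] == "view" and rng.random() < 0.30:
                delay = timedelta(days=rng.uniform(0, 7))
                favorite = {**event, "favorite_date": event["date"] + delay}
                favorites.append(favorite)
                # About 50% of favorites get a rating, nudged by emotion.
                if rng.random() < 0.50:
                    base = 4 if favorite["emotion"] in POSITIVE else 3
                    rating = min(5, max(1, base + rng.choice([-1, 0, 1])))
                    ratings.append({"user_id": favorite["user_id"],
                                    "movie_id": favorite["movie_id"],
                                    "rating": rating})
    return users, interactions, favorites, ratings


users, interactions, favorites, ratings = generate()
```

Fixing the seed and the end date keeps the generated data reproducible, which makes the follow-up EDA easier to compare across runs.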
- Incorporate additional machine learning models to improve recommendation accuracy.
- Extend functionality to include TV series and documentaries.
- Implement user-based filtering so that recommendations draw on users with similar preferences.
Experience Feelms in action and review our project presentation for a detailed overview of the development process, insights, and results:
- Feelms Streamlit Application: Try the app, select your mood, and receive personalized movie recommendations.
- Project Presentation: Dive into the project’s details, from data processing to model performance and future improvements.



