The Sentimentalizer is a machine learning project that performs sentiment analysis on text data, classifying inputs as positive, negative, or neutral.
Originally developed for a hackathon, the project demonstrates how natural language processing (NLP) and classical machine-learning models can be combined to build a fast, interpretable sentiment classifier.
This project includes a notebook showing the full workflow:
data loading → text preprocessing → vectorization → model training → evaluation → real-time predictions.
- End-to-end sentiment analysis pipeline
- Text cleaning & preprocessing (tokenization, stopwords, lemmatization)
- Feature extraction using TF-IDF
- Multiple sentiment classifiers tested, including:
  - Logistic Regression
  - Naive Bayes
  - Support Vector Machine (SVM)
- Model comparison & evaluation metrics included
- Real-time sentiment prediction examples
- Cleaned dataset (save.csv) included for reproducibility
Text entries are imported from the dataset and undergo preprocessing:
- Lowercasing
- Removing punctuation & special characters
- Tokenization
- Stopword removal
- Lemmatization/stemming
This ensures cleaner, more consistent input for the models.
The preprocessed text is vectorized using TF-IDF, which turns each document into a numerical feature vector: a term's weight grows with its frequency in the document and shrinks the more documents it appears in.
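To make the weighting concrete, here is a small sketch of the textbook TF-IDF formula, `tf * log(N / df)`, in plain Python. The `tfidf` function and the toy documents are hypothetical. Library implementations usually apply smoothing and normalization, so their numbers will differ from this unsmoothed variant.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy corpus, invented for illustration.
docs = [
    ["good", "movie"],
    ["bad", "movie"],
    ["good", "plot"],
    ["movie", "okay"],
]
for vector in tfidf(docs):
    print(vector)
```

In this corpus "movie" appears in three of the four documents, so it receives a lower weight than "good" within the first document, and a term present in every document would receive a weight of zero.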
Several classical machine-learning classifiers are trained and evaluated:
- Logistic Regression
- Multinomial Naive Bayes
- Linear SVM
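One way to train these three classifiers on TF-IDF features is sketched below. It assumes scikit-learn, which this README does not name explicitly, and uses a tiny invented dataset in place of save.csv. The hyperparameters are library defaults (apart from `max_iter`), not values taken from the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data, invented for illustration only.
texts = [
    "great film, loved it",
    "what a wonderful story",
    "really good acting",
    "terrible plot and bad acting",
    "awful, a waste of time",
    "I hated this movie",
    "it was okay",
    "nothing special, average film",
    "fine but forgettable",
]
labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
}

fitted = {}
for name, model in models.items():
    # Each pipeline owns its vectorizer, so vectorization and training stay in sync.
    pipeline = make_pipeline(TfidfVectorizer(), model)
    pipeline.fit(texts, labels)
    fitted[name] = pipeline
    print(name, pipeline.predict(["what a great film"]))
```

Wrapping the vectorizer and the classifier in a single pipeline means new text is transformed with the vocabulary learned during training, which avoids a common source of train/predict mismatch.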
Models are scored using:
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
The notebook includes complete comparisons to help determine the best-performing classifier.
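As a worked example of how these metrics relate, the sketch below derives accuracy and per-class precision, recall, and F1 from a confusion matrix in plain Python. The labels and predictions are invented, and `confusion_matrix` and `per_class_scores` are hypothetical helpers, not functions from the notebook.

```python
from collections import Counter

LABELS = ["positive", "negative", "neutral"]


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treating it as the positive class."""
    cm = confusion_matrix(y_true, y_pred)
    tp = cm[(label, label)]
    fp = sum(cm[(other, label)] for other in LABELS if other != label)
    fn = sum(cm[(label, other)] for other in LABELS if other != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented labels and predictions for illustration.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
for label in LABELS:
    print(label, per_class_scores(y_true, y_pred, label))
```

Here four of six predictions are correct. The negative class scores 1.0 on all three metrics, while positive and neutral each have one false positive and one false negative, giving 0.5 for precision, recall, and F1.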
Once trained, the model can classify any text input, for example:
- "That movie was surprisingly good!" → Positive
- "I expected better, it was disappointing." → Negative
- "It's okay, nothing special." → Neutral
The notebook includes multiple real-world examples.
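A prediction helper along these lines could look like the sketch below. It assumes scikit-learn and a TF-IDF + Linear SVM pipeline; the training sentences are invented and `predict_sentiment` is a hypothetical name. With so little training data the predicted labels are not meaningful, so no expected output is shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data, invented for illustration only.
train_texts = [
    "surprisingly good movie",
    "loved every minute",
    "disappointing, I expected better",
    "bad and boring",
    "it was okay",
    "nothing special",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_texts, train_labels)


def predict_sentiment(text: str) -> str:
    """Classify one string and return its label, capitalized for display."""
    return str(classifier.predict([text])[0]).capitalize()


for sample in ["That movie was surprisingly good!", "It's okay, nothing special."]:
    print(f"{sample!r} -> {predict_sentiment(sample)}")
```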
While results depend on the dataset used, typical performance from the notebook shows:
- TF-IDF + Linear SVM generally performs best
- Logistic Regression performs competitively
- Naive Bayes provides strong speed and baseline accuracy
The project emphasizes interpretability and ease of use rather than deep learning.
- Classifying customer reviews
- Social media sentiment monitoring
- Product feedback analysis
- Chatbots and automated response systems
- Quick prototyping for NLP hackathons
The hackathon.ipynb file contains:
- Full preprocessing pipeline
- Training and evaluation of multiple models
- Visual performance comparisons
- Interactive prediction cells
- All code required to reproduce the results
Created by Yasmin Modarai and Safwan Hasan for a machine learning hackathon.
Extended, organized, and documented for educational use and fast prototyping.
This project is open for personal and educational use. For commercial use, please credit the original authors.