💬 The Sentimentalizer

The Sentimentalizer is a machine learning project that performs sentiment analysis on text data, classifying inputs as positive, negative, or neutral.
Originally developed for a hackathon, the project demonstrates how natural language processing (NLP) and classical machine-learning models can be combined to build a fast, interpretable sentiment classifier.

This project includes a notebook showing the full workflow:
data loading → text preprocessing → vectorization → model training → evaluation → real-time predictions.

🚀 Features

End-to-end sentiment analysis pipeline
Text cleaning & preprocessing (tokenization, stopwords, lemmatization)
Feature extraction using TF-IDF
Multiple sentiment classifiers tested, including:
- Logistic Regression
- Naive Bayes
- Support Vector Machine (SVM)
Model comparison & evaluation metrics included
Real-time sentiment prediction examples
Cleaned dataset (save.csv) included for reproducibility

🧠 How It Works

1. Data Preparation

Text entries are imported from the dataset and undergo preprocessing:

Lowercasing
Removing punctuation & special characters
Tokenization
Stopword removal
Lemmatization/stemming

This ensures cleaner, more consistent input for the models.
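
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: the `preprocess` function and the tiny `STOPWORDS` set are hypothetical, and lemmatization is left out because it normally relies on an NLP library and its language data.

```python
import re

# Small illustrative stopword list (an assumption; a real pipeline would
# typically use a fuller list from an NLP library).
STOPWORDS = {"a", "an", "the", "is", "it", "was", "to", "of", "and", "i", "that"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("That movie was surprisingly good!"))
# → ['movie', 'surprisingly', 'good']
```

Negation words such as "not" are deliberately absent from the stopword set here, since dropping them can flip the sentiment of a sentence.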

2. Feature Engineering

The preprocessed text is vectorized using TF-IDF, which turns each document into a numerical feature vector: a term's weight rises with its frequency in that document and falls with the number of documents that contain it.
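
To make the weighting concrete, here is a hand-rolled sketch of the basic TF-IDF formula (term frequency times the log of inverse document frequency). The README does not say which implementation the notebook uses; library versions usually add smoothing and normalization on top of this basic form, so the numbers will differ. The `tf_idf` function and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute plain TF-IDF weights for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy tokenized documents (hypothetical, not from save.csv).
docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
for vec in tf_idf(docs):
    print(vec)
```

In this basic form, a term that appears in every document gets a weight of zero, which is why very common words contribute little to the classifier.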

3. Model Training

Several classical machine-learning classifiers are trained and evaluated:

Logistic Regression
Multinomial Naive Bayes
Linear SVM

Models are scored using:

Accuracy
Precision
Recall
F1 Score
Confusion Matrix

The notebook includes complete comparisons to help determine the best-performing classifier.
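
For reference, the metrics listed above can all be derived from the counts in a confusion matrix. The sketch below computes them in plain Python; the `evaluate` function and the toy labels are hypothetical and do not come from the notebook, which may well rely on a library for this.

```python
from collections import Counter


def evaluate(y_true: list[str], y_pred: list[str]) -> dict:
    """Return accuracy, per-class precision/recall/F1, and a confusion matrix."""
    labels = sorted(set(y_true) | set(y_pred))
    # confusion[(actual, predicted)] = number of examples
    confusion = Counter(zip(y_true, y_pred))
    accuracy = sum(confusion[(lab, lab)] for lab in labels) / len(y_true)

    per_class = {}
    for lab in labels:
        tp = confusion[(lab, lab)]
        predicted = sum(confusion[(other, lab)] for other in labels)
        actual = sum(confusion[(lab, other)] for other in labels)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[lab] = {"precision": precision, "recall": recall, "f1": f1}

    # Rows are actual labels, columns are predicted labels, both in `labels` order.
    matrix = [[confusion[(a, p)] for p in labels] for a in labels]
    return {"labels": labels, "accuracy": accuracy,
            "per_class": per_class, "confusion_matrix": matrix}


# Made-up labels purely for illustration.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]
result = evaluate(y_true, y_pred)
print(result["labels"])
print(result["confusion_matrix"])
```

Running the same function on each model's predictions for a held-out test set gives a like-for-like comparison between classifiers.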

4. Sentiment Prediction

Once trained, the model can classify any text input, for example:

"That movie was surprisingly good!" → Positive
"I expected better, it was disappointing." → Negative
"It's okay, nothing special." → Neutral

The notebook includes multiple real-world examples.
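
As a rough picture of what prediction involves, the sketch below trains a tiny Multinomial Naive Bayes classifier (one of the model families listed earlier) with add-one smoothing and classifies new token lists. Everything here is an assumption for illustration: the `TinyNaiveBayes` class, the six made-up training examples, and the use of raw token counts instead of TF-IDF features. It is not the notebook's model.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up, already-tokenized training examples (hypothetical).
train_docs = [
    ["surprisingly", "good"], ["great", "fun"],
    ["disappointing", "boring"], ["expected", "better"],
    ["okay", "nothing", "special"], ["fine", "average"],
]
train_labels = ["positive", "positive", "negative", "negative",
                "neutral", "neutral"]

model = TinyNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["movie", "good"]))  # → positive
```

With so little training data the classifier only recognizes words it has seen, so this is a demonstration of the mechanics and not of realistic accuracy.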

📊 Results

While results depend on the dataset used, typical performance from the notebook shows:

TF-IDF + Linear SVM generally performs best
Logistic Regression performs competitively
Naive Bayes is fast and provides a solid baseline accuracy

The project favors interpretable, easy-to-use classical models over deep learning.

🎯 Use Cases

Classifying customer reviews
Social media sentiment monitoring
Product feedback analysis
Chatbots and automated response systems
Quick prototyping for NLP hackathons

📓 Included Notebook

The hackathon.ipynb file contains:

Full preprocessing pipeline
Training and evaluation of multiple models
Visual performance comparisons
Interactive prediction cells
All code required to reproduce the results

✨ Author

Created by Yasmin Modarai and Safwan Hasan for a machine learning hackathon.
Extended, organized, and documented for educational use and fast prototyping.

📜 License

This project is open for personal and educational use. For commercial use, please attribute the original authors.
