ymodarai/The-Sentimentalizer

💬 The Sentimentalizer

The Sentimentalizer is a machine learning project that performs sentiment analysis on text data, classifying inputs as positive, negative, or neutral.
Originally developed for a hackathon, the project demonstrates how natural language processing (NLP) and classical machine-learning models can be combined to build a fast, interpretable sentiment classifier.

This project includes a notebook showing the full workflow:
data loading → text preprocessing → vectorization → model training → evaluation → real-time predictions.


🚀 Features

  • End-to-end sentiment analysis pipeline
  • Text cleaning & preprocessing (tokenization, stopwords, lemmatization)
  • Feature extraction using TF-IDF
  • Multiple sentiment classifiers tested, including:
    • Logistic Regression
    • Naive Bayes
    • Support Vector Machine (SVM)
  • Model comparison & evaluation metrics included
  • Real-time sentiment prediction examples
  • Cleaned dataset (save.csv) included for reproducibility

🧠 How It Works

1. Data Preparation

Text entries are imported from the dataset and undergo preprocessing:

  • Lowercasing
  • Removing punctuation & special characters
  • Tokenization
  • Stopword removal
  • Lemmatization/stemming

This ensures cleaner, more consistent input for the models.
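The steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the `preprocess` function name and the tiny `STOPWORDS` set are assumptions made for the example, and lemmatization is left out because it needs an external library such as NLTK or spaCy.

```python
import re
from typing import List

# Tiny illustrative stopword list (an assumption); the notebook's real list is not shown here.
STOPWORDS = {"a", "an", "the", "is", "it", "was", "i", "to", "and", "of", "that"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip punctuation, tokenize on whitespace, and drop stopwords."""
    text = text.lower()
    text = text.replace("'", "")              # keep contractions together: "it's" -> "its"
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation and special characters with spaces
    tokens = text.split()                     # simple whitespace tokenization
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("That movie was surprisingly good!"))
# → ['movie', 'surprisingly', 'good']
```

A real pipeline would usually swap in a full stopword list and add a lemmatizer after the stopword filter.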


2. Feature Engineering

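The preprocessed text is vectorized using TF-IDF (term frequency–inverse document frequency), which turns each text into a numerical feature vector. A word's weight rises with how often it appears in that text and falls with how many texts in the dataset contain it, so common filler words contribute little.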
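As a rough sketch of this step, the snippet below vectorizes three already-cleaned texts. It assumes scikit-learn's `TfidfVectorizer` with default settings; this README does not state which library or parameters the notebook uses, and the sample texts are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-preprocessed sample texts.
docs = [
    "movie surprisingly good",
    "expected better disappointing",
    "okay nothing special",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per text, one column per vocabulary term

print(X.shape)                         # (number of texts, vocabulary size)
print(sorted(vectorizer.vocabulary_))  # the terms that became feature columns
```

The fitted `vectorizer` has to be reused (via `transform`) on any new text, so that training and prediction share the same vocabulary and weights.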


3. Model Training

Several classical NLP models are trained and evaluated:

  • Logistic Regression
  • Multinomial Naive Bayes
  • Linear SVM
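One way to train the three models side by side is sketched below. It assumes scikit-learn and a toy six-example dataset invented for the illustration; the notebook's actual data split, hyperparameters, and variable names are not shown in this README and may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (hypothetical), two examples per class.
texts = [
    "great movie loved it",
    "surprisingly good and fun",
    "terrible plot very disappointing",
    "bad acting expected better",
    "it was okay nothing special",
    "average film neither good nor bad",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),
}

fitted = {}
for name, clf in models.items():
    # Each pipeline bundles its own TF-IDF vectorizer with the classifier.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(texts, labels)
    fitted[name] = pipeline
    print(name, "->", pipeline.predict(["the film was great"])[0])
```

On a real dataset the texts would first be split into training and test sets (for example with `train_test_split`) so that the scores reflect unseen data.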

Models are scored using:

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Confusion Matrix

The notebook includes complete comparisons to help determine the best-performing classifier.
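The metrics listed above could be computed as in the following sketch, which assumes scikit-learn's `sklearn.metrics` module and uses made-up true and predicted labels rather than results from the notebook. Macro averaging is an assumption here; the README does not say how per-class scores are averaged.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical labels for six test examples (not results from the notebook).
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]

class_order = ["negative", "neutral", "positive"]

accuracy = accuracy_score(y_true, y_pred)

# Macro average: compute each metric per class, then take the unweighted mean.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=class_order, average="macro", zero_division=0
)

# Rows are true classes, columns are predicted classes, both in class_order.
cm = confusion_matrix(y_true, y_pred, labels=class_order)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(cm)
```

Reading the confusion matrix row by row shows which classes get mistaken for which, something the single-number metrics hide.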


4. Sentiment Prediction

Once trained, the model can classify any text input, e.g.:

  • "That movie was surprisingly good!" → Positive
  • "I expected better, it was disappointing." → Negative
  • "It's okay, nothing special." → Neutral

The notebook includes multiple real-world examples.
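A prediction helper might look like the sketch below. The `predict_sentiment` function, the toy training data, and the choice of a scikit-learn TF-IDF + Linear SVM pipeline are all assumptions for illustration; the notebook's own prediction cells may be structured differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (hypothetical); a real model would be fit on the full dataset.
train_texts = [
    "great movie loved it",
    "surprisingly good and fun",
    "terrible plot very disappointing",
    "bad acting expected better",
    "it was okay nothing special",
    "average film neither good nor bad",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)


def predict_sentiment(text: str) -> str:
    """Return 'Positive', 'Negative', or 'Neutral' for a single input string."""
    label = model.predict([text])[0]  # predict expects a list of texts
    return label.capitalize()


print(predict_sentiment("That movie was surprisingly good!"))
```

Because the vectorizer lives inside the pipeline, raw strings can be passed straight to `predict_sentiment` without a separate transform step.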


📊 Results

While results depend on the dataset used, typical performance from the notebook shows:

  • TF-IDF + Linear SVM generally performs best
  • Logistic Regression performs competitively
  • Naive Bayes is fast to train and gives a solid baseline accuracy

The project emphasizes interpretability and ease of use rather than deep learning.


🎯 Use Cases

  • Classifying customer reviews
  • Social media sentiment monitoring
  • Product feedback analysis
  • Chatbots and automated response systems
  • Quick prototyping for NLP hackathons

📓 Included Notebook

The hackathon.ipynb file contains:

  • Full preprocessing pipeline
  • Training and evaluation of multiple models
  • Visual performance comparisons
  • Interactive prediction cells
  • All code required to reproduce the results

✨ Authors

Created by Yasmin Modarai and Safwan Hasan for a machine learning hackathon.
Extended, organized, and documented for educational use and fast prototyping.


📜 License

This project is open for personal and educational use. For commercial use, please attribute the original authors.

About

This project is able to accurately predict whether tweets are positive or negative, using an AI program.
