A machine learning project to classify SMS messages as spam or ham (not spam) using natural language processing (NLP) techniques.
This project aims to build an SMS spam classifier using traditional NLP techniques and machine learning algorithms. The model learns to distinguish spam messages from legitimate ones using a labeled dataset of SMS messages.
- Source: UCI SMS Spam Collection Dataset
- Format: CSV file with two columns:
  - label: spam or ham
  - message: text content of the SMS
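
As a rough sketch of how a file in this format could be loaded with pandas: the inline sample rows, the header row, and the `label_num` column are assumptions made for illustration, since the README does not give the actual file name or encoding.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; the actual file name and
# header row are not specified in this README.
SAMPLE_CSV = """label,message
ham,Are we still meeting for lunch today?
spam,WINNER!! Claim your free prize now by calling 0800 000 000
ham,I'll call you later tonight
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Map the text labels to integers (an assumed convention: ham=0, spam=1).
df["label_num"] = df["label"].map({"ham": 0, "spam": 1})

print(df.shape)
print(df["label"].value_counts())
```

With the real dataset, `io.StringIO(SAMPLE_CSV)` would be replaced by the path to the CSV file.
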
- Language: Python
- Libraries: pandas, scikit-learn, nltk, matplotlib, seaborn
- Modeling: Naive Bayes, Logistic Regression, SVM, etc.
- Notebook: Jupyter Notebook (sms_spam_detection.ipynb)
- Lowercasing
- Punctuation removal
- Stopword filtering
- Stemming
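
The four preprocessing steps above (lowercasing, punctuation removal, stopword filtering, stemming) could be chained roughly as follows. This is a minimal sketch, not necessarily the notebook's code: the `preprocess` helper and the tiny `STOPWORDS` set are hypothetical, and the choice of NLTK's `PorterStemmer` is an assumption.

```python
import string

from nltk.stem import PorterStemmer

# Small illustrative stopword list (an assumption). The project may instead
# use nltk.corpus.stopwords, which requires nltk.download("stopwords").
STOPWORDS = {"a", "an", "the", "is", "are", "to", "for", "you", "your", "and", "of", "in"}

_stemmer = PorterStemmer()
_punct_table = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, drop stopwords, and stem each token."""
    text = text.lower()                     # 1. lowercasing
    text = text.translate(_punct_table)     # 2. punctuation removal
    tokens = [t for t in text.split() if t not in STOPWORDS]  # 3. stopwords
    return " ".join(_stemmer.stem(t) for t in tokens)         # 4. stemming


cleaned = preprocess("Congratulations! You are selected for FREE prizes.")
print(cleaned)
```

Applied to a DataFrame, this would typically look like `df["message"].apply(preprocess)`.
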
- Spam vs Ham distribution
- Common word frequencies
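
Both exploratory checks above can be sketched in a few lines. The sample rows and the `top_words` helper here are made up for illustration; the real notebook works on the full dataset.

```python
from collections import Counter

import pandas as pd

# Tiny made-up sample standing in for the real dataset.
df = pd.DataFrame(
    {
        "label": ["ham", "spam", "ham", "spam", "ham"],
        "message": [
            "see you at lunch",
            "free prize call now",
            "call me after lunch",
            "win a free prize now",
            "ok see you soon",
        ],
    }
)

# Spam vs ham distribution.
class_counts = df["label"].value_counts()
print(class_counts)


def top_words(messages, n=3):
    """Return the n most frequent whitespace-separated words."""
    counter = Counter(word for msg in messages for word in msg.split())
    return counter.most_common(n)


# Common word frequencies, computed separately for each class.
spam_top = top_words(df.loc[df["label"] == "spam", "message"])
ham_top = top_words(df.loc[df["label"] == "ham", "message"])
print("spam:", spam_top)
print("ham:", ham_top)
```

The counts can then be visualized, for example with `class_counts.plot(kind="bar")`, which draws through matplotlib.
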
- Using TF-IDF and CountVectorizer
- Tested models: Naive Bayes, Logistic Regression, SVM
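
One way the three model families could be compared is to wrap each in the same TF-IDF pipeline. This is a sketch under assumptions: the specific estimators (`MultinomialNB`, `LogisticRegression`, `LinearSVC`), their settings, and the toy data are illustrative choices, not confirmed details of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training data for illustration only.
messages = [
    "free prize call now",
    "win cash prize today",
    "claim your free reward now",
    "see you at lunch",
    "call me after work",
    "are you coming home tonight",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Assumed estimator choices for the three model families.
models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}

fitted = {}
for name, clf in models.items():
    # Each model gets its own vectorizer so the pipelines stay independent.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(messages, labels)
    fitted[name] = pipeline
    print(name, pipeline.predict(["free prize waiting, call now"]))
```

On real data, the messages would first be split into training and test sets (for example with `sklearn.model_selection.train_test_split`) so that each model is scored on unseen messages.
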
- Accuracy, Precision, Recall, F1-score
- Confusion Matrix
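
The metrics listed above can be computed with scikit-learn as in this small sketch. The labels and predictions are hypothetical values chosen for illustration, not results from this project, and treating spam as the positive class is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical true labels and predictions, for illustration only.
y_true = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "ham"]
y_pred = ["ham", "ham", "spam", "ham", "spam", "ham", "spam", "ham"]

accuracy = accuracy_score(y_true, y_pred)

# Treat "spam" as the positive class (an assumption).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label="spam", average="binary"
)

# Rows are true classes, columns are predicted classes, in the order given.
cm = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(cm)
```

Because spam is usually the minority class, precision and recall on the spam class tend to say more about a classifier than accuracy alone.
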
We're always open to feedback, suggestions, and collaboration on similar NLP or machine learning projects.
Connect with us on LinkedIn:
Feel free to reach out — let’s build something cool together! 🚀