Spam Classification of Emails

Models Used

Logistic Regression
Decision Trees
Naive Bayes

Requirements

Python 3.6.10
NumPy 1.18.4
Scikit-learn 0.23.1
SciPy 1.4.1

Steps to run:

Run the preprocessing script to extract the text from each email, clean it by removing stopwords and punctuation, split the emails into training and testing sets, and upload them to Elasticsearch (ES).

python preprocessing.py
--dirpath
--labels
--index
--seed 4
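
The cleaning and splitting described in this step can be sketched roughly as below. This is a minimal illustration using only the standard library, not the contents of `preprocessing.py`: the stopword list, the function names `clean_email` and `train_test_split_ids`, and the 20% test fraction are all assumptions, and the upload to Elasticsearch is left out.

```python
import random
import string
from email import message_from_string

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "is", "in", "for", "you"}


def clean_email(raw):
    """Return the lowercased body text with punctuation and stopwords removed."""
    msg = message_from_string(raw)
    # This sketch only handles simple, non-multipart messages.
    body = "" if msg.is_multipart() else msg.get_payload()
    body = body.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in body.split() if word not in STOPWORDS)


def train_test_split_ids(ids, seed=4, test_fraction=0.2):
    """Shuffle ids reproducibly and return (train_ids, test_ids).

    test_fraction is an assumed value; the source does not state the ratio.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Seeding a local `random.Random` instance keeps the split reproducible across runs, which is presumably the purpose of the `--seed` flag.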

Get the unigrams from Elasticsearch.

python getUnigrams.py
--index
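
As a rough picture of what a unigram vocabulary is, the sketch below collects distinct tokens from already-cleaned texts held in memory. It does not query Elasticsearch and is not the code in `getUnigrams.py`; the function name `collect_unigrams` and the `min_doc_freq` threshold are hypothetical.

```python
from collections import Counter


def collect_unigrams(docs, min_doc_freq=1):
    """Return the sorted unigrams that occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each token once per document (document frequency).
        doc_freq.update(set(doc.split()))
    return sorted(term for term, n in doc_freq.items() if n >= min_doc_freq)
```

Filtering by document frequency is one common way to keep the vocabulary small; whether the project filters this way is not stated in the source.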

Build the feature matrix and run the models.

python spamClassifier.py
--index
--labels
--features
--cutoff
--result
--model <one of: reg, logit (default), tree, nb (Naive Bayes)>
--sparse (use only when you want to build a sparse matrix)
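
The matrix-building and model-selection step might look something like the following sketch, which uses SciPy and scikit-learn from the requirements list. It is an assumption-laden illustration rather than the logic of `spamClassifier.py`: `build_matrix`, `train`, and the `MODELS` mapping are hypothetical names, and the `reg` option is omitted because the source does not say which model it refers to.

```python
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical mapping from --model values to scikit-learn estimators.
MODELS = {
    "logit": lambda: LogisticRegression(max_iter=1000),
    "tree": lambda: DecisionTreeClassifier(random_state=0),
    "nb": lambda: MultinomialNB(),
}


def build_matrix(docs, vocab):
    """Build a sparse document-term count matrix (rows: docs, columns: vocab)."""
    index = {term: j for j, term in enumerate(vocab)}
    rows, cols, vals = [], [], []
    for i, doc in enumerate(docs):
        for token in doc.split():
            j = index.get(token)
            if j is not None:  # skip tokens outside the vocabulary
                rows.append(i)
                cols.append(j)
                vals.append(1)
    # Repeated (row, col) pairs are summed, which yields term counts.
    return csr_matrix((vals, (rows, cols)), shape=(len(docs), len(vocab)))


def train(X, y, model_name="logit"):
    """Fit the estimator selected by model_name and return it."""
    model = MODELS[model_name]()
    model.fit(X, y)
    return model


# Toy usage with made-up documents and labels (1 = spam, 0 = not spam).
toy_docs = ["click free offer", "meeting agenda notes"]
toy_vocab = ["agenda", "click", "free", "meeting", "notes", "offer"]
toy_model = train(build_matrix(toy_docs, toy_vocab), [1, 0], "nb")
```

A sparse matrix stores only the non-zero counts, which matters for unigram features because each email contains a small fraction of the vocabulary.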

Results

Top 10 words after running Logistic Regression on the sparse unigram matrix:

('freebsd', 1.1172201336086656).
('click', 1.1067116575308757).
('antivir', 1.079915881679438).
('penis', 1.0705047080248018).
('opt', 1.0648006849173453).
('girl', 1.0265354745168107).
('adf', 1.0224262305744003).
('website', 0.9591053071640367).
('products', 0.8749953047612927).
('remove', 0.8748610351597089).
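
A ranking like the one above can be produced by pairing each unigram with its learned weight and sorting in descending order. The sketch below assumes the weights are the logistic regression coefficients (with scikit-learn these would typically come from `model.coef_[0]` for a binary classifier); the helper name `top_k_features` is hypothetical.

```python
def top_k_features(vocab, coefficients, k=10):
    """Return the k (word, weight) pairs with the largest weights."""
    pairs = zip(vocab, coefficients)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Three of the weights listed above, in arbitrary order.
words = ["click", "remove", "freebsd"]
weights = [1.1067116575308757, 0.8748610351597089, 1.1172201336086656]
top_two = top_k_features(words, weights, k=2)
```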
