Spam Classification of Emails

Models Used

Logistic Regression
Decision Trees
Naive Bayes

Requirements

Python 3.6.10
NumPy 1.18.4
scikit-learn 0.23.1
SciPy 1.4.1

Steps to run:

Run the preprocessing script to extract the text from each email, clean it by removing stopwords and punctuation, split the data into training and testing sets, and upload it to Elasticsearch (ES).

python preprocessing.py
--dirpath
--labels
--index
--seed 4
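
As a rough illustration of the cleaning and splitting described above, here is a minimal sketch using only the standard library. It is not the code in preprocessing.py: the stopword list, the function names clean_text and assign_split, and the 20% test fraction are all assumptions made for the example. Only the seed value 4 is taken from the command above.

```python
import random
import string
from email import message_from_string

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "is", "in", "for", "you"}


def clean_text(raw_email):
    """Return lowercase body tokens with punctuation and stopwords removed."""
    msg = message_from_string(raw_email)
    if msg.is_multipart():
        # For multipart messages, keep only the text/plain parts.
        parts = [p.get_payload() for p in msg.walk()
                 if p.get_content_type() == "text/plain"]
        body = " ".join(parts)
    else:
        body = msg.get_payload()
    # Replace every punctuation character with a space, then split on whitespace.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = body.lower().translate(table).split()
    return [t for t in tokens if t not in STOPWORDS]


def assign_split(doc_ids, seed=4, test_fraction=0.2):
    """Assign each document id to 'train' or 'test' reproducibly."""
    rng = random.Random(seed)
    ids = sorted(doc_ids)      # fixed starting order, so the seed alone decides the split
    rng.shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    test_ids = set(ids[:n_test])
    return {d: ("test" if d in test_ids else "train") for d in ids}


raw = "Subject: hi\n\nClick here to WIN, the prize!"
print(clean_text(raw))  # ['click', 'here', 'win', 'prize']
```

Seeding the shuffle keeps the train/test assignment identical between runs, which is presumably why the script takes a --seed argument.
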

Get the unigrams from Elasticsearch.

python getUnigrams.py
--index
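
getUnigrams.py reads the unigrams from the ES index. The sketch below does not call Elasticsearch; it is a local stand-in that shows what a unigram vocabulary could look like when built from already-cleaned token lists. The function names and the min_df parameter are hypothetical and are not taken from the script.

```python
from collections import Counter


def unigram_document_frequencies(docs):
    """Map each unigram to the number of documents that contain it.

    `docs` is a hypothetical list of token lists (one per email), standing in
    for the term data that would otherwise come from the ES index.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each unigram at most once per document
    return df


def build_vocabulary(docs, min_df=1):
    """Return unigrams seen in at least `min_df` documents, sorted for a stable order."""
    df = unigram_document_frequencies(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


docs = [["click", "here", "win"], ["click", "website"], ["meeting", "here"]]
print(build_vocabulary(docs, min_df=2))  # ['click', 'here']
```

Sorting the vocabulary gives every unigram a fixed position, which is convenient when the unigrams later become matrix columns.
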

Build the matrix and run the models.

python spamClassifier.py
--index
--labels
--features
--cutoff
--result
--model <model type: reg, logit (default), tree, nb (Naive Bayes)>
--sparse (use only when you want to create a sparse matrix)
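
To make the idea behind --sparse concrete, here is a minimal sketch of a document-term matrix stored in compressed sparse row (CSR) form, written in plain Python. It is an illustration under assumptions, not the code in spamClassifier.py: the function name build_csr and the toy inputs are hypothetical.

```python
from collections import Counter


def build_csr(docs, vocabulary):
    """Build CSR-style arrays (data, indices, indptr) of unigram counts.

    Row i corresponds to docs[i]; column j corresponds to vocabulary[j].
    Only non-zero counts are stored.
    """
    column = {term: j for j, term in enumerate(vocabulary)}
    data, indices, indptr = [], [], [0]
    for tokens in docs:
        counts = Counter(t for t in tokens if t in column)
        # Emit the non-zero entries of this row in increasing column order.
        for term in sorted(counts, key=column.get):
            indices.append(column[term])
            data.append(counts[term])
        indptr.append(len(indices))  # marks where the next row starts
    return data, indices, indptr


vocabulary = ["click", "here", "win"]
docs = [["click", "click", "win"], ["here"]]
print(build_csr(docs, vocabulary))  # ([2, 1, 1], [0, 2, 1], [0, 2, 3])
```

The three lists have the same layout that scipy.sparse.csr_matrix accepts as (data, indices, indptr). Since most emails contain only a small fraction of the vocabulary, storing just the non-zero counts can save a lot of memory compared with a dense matrix.
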

Results

Top 10 words after running Logistic Regression on the unigram sparse matrix:

('freebsd', 1.1172201336086656)
('click', 1.1067116575308757)
('antivir', 1.079915881679438)
('penis', 1.0705047080248018)
('opt', 1.0648006849173453)
('girl', 1.0265354745168107)
('adf', 1.0224262305744003)
('website', 0.9591053071640367)
('products', 0.8749953047612927)
('remove', 0.8748610351597089)
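
A ranking like the one above can be produced by pairing each unigram with its learned weight and sorting. The sketch below is an assumption about how such a list might be built, not the repository's code: top_features is a hypothetical helper, and with scikit-learn the weights would typically be read from the coef_ attribute of a fitted LogisticRegression. The three sample values are copied from the results above.

```python
def top_features(vocabulary, weights, k=10):
    """Return the k (word, weight) pairs with the largest weights."""
    pairs = zip(vocabulary, weights)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


vocabulary = ["remove", "freebsd", "click"]
weights = [0.8748610351597089, 1.1172201336086656, 1.1067116575308757]
for word, weight in top_features(vocabulary, weights, k=2):
    print((word, weight))
```
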
