thundersparkf edited this page May 6, 2020 · 1 revision

Welcome to the Tweet Classifier repository!

This repository contains source code that extracts the tweets of a specified user and uses an LSTM model to predict whether those tweets contain manipulated information or are reliable.

Deep Learning Model used: LSTM

Twitter Authorisation: OAuth 1.0

FILES USED

Source files related to modelling: pre_proc, pre_model, model.

pre_proc: Preprocessing/Cleaning the text.

pre_model: Loading the pre-trained GloVe embeddings (in this code, 6B tokens, 200d vectors) and converting text into tokens.

model: Model construction and execution. (Computationally expensive; the code was executed on a VM with 16 vCPUs and 60 GB RAM.)
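
The cleaning, tokenising and embedding steps described for pre_proc and pre_model can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the repository's code: the cleaning rules, the function names (`clean_tweet`, `load_glove`, `build_vocab`, `encode`, `embedding_matrix`) and the padding/unknown-token ids are all assumptions. The only format detail relied on is that a GloVe text file stores one word per line followed by its space-separated vector components.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical cleaning rules; the actual rules in pre_proc may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"^\s*rt\b")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")

# Assumed reserved ids: 0 for padding, 1 for out-of-vocabulary words.
PAD_ID, UNK_ID = 0, 1


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, the RT marker and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RETWEET_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text lines of the form 'word v1 v2 ... vd'."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_vocab(texts: Iterable[str]) -> Dict[str, int]:
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert cleaned text to a fixed-length id sequence (truncate, then right-pad)."""
    ids = [vocab.get(token, UNK_ID) for token in text.split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def embedding_matrix(vocab: Dict[str, int],
                     vectors: Dict[str, List[float]],
                     dim: int) -> List[List[float]]:
    """Build one row per vocabulary id; words without a GloVe vector stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix
```

With the 200-dimensional 6B vectors, `load_glove` would be given an open file handle (for example `glove.6B.200d.txt`, assuming the standard GloVe download) and `dim` would be 200. The resulting matrix is the kind of table typically used to initialise the embedding layer in front of an LSTM.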

Source files related to Twitter API: CHIRP, Integrator

CHIRP: OAuth authorization, plus methods to extract tweets based on name or keywords and to retrieve trends based on location.

Integrator: Loading the saved model and predicting the probability of credibility of the extracted tweets.
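
As a rough illustration of what the OAuth 1.0 authorization handled by CHIRP involves, the sketch below computes an HMAC-SHA1 request signature in the way RFC 5849 describes. It is not the repository's code, which may well delegate this to a client library. The function names (`percent_encode`, `signature_base_string`, `sign`) are hypothetical, and any endpoint or credential values passed in are placeholders.

```python
import base64
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def percent_encode(value: str) -> str:
    """Percent-encode everything except the RFC 3986 unreserved characters."""
    return quote(value, safe="-._~")


def signature_base_string(method: str, url: str, params: Dict[str, str]) -> str:
    """Join the HTTP method, the base URL and the sorted, encoded parameters."""
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), percent_encode(url), percent_encode(param_string)])


def sign(method: str, url: str, params: Dict[str, str],
         consumer_secret: str, token_secret: str = "") -> str:
    """Return the base64-encoded HMAC-SHA1 signature for the request."""
    # The signing key is the two encoded secrets joined by '&'.
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode("utf-8"), base.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

In a real request, `params` would hold both the query parameters (such as a screen name) and the `oauth_*` fields (consumer key, token, nonce, timestamp, signature method, version). The returned value would then be sent as `oauth_signature` in the `Authorization` header.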

METRICS

The trained model has the following metrics:

Total Size of Dataset: 65,698 records

Training Data: 45,988 records

           Accuracy Score  = 0.9911
           F1 Score        = 0.9908
           Precision Score = 0.9923
           Recall Score    = 0.9894

Testing Data: 19,710 records

           Accuracy Score  = 0.9702
           F1 Score        = 0.9689
           Precision Score = 0.9763
           Recall Score    = 0.9616
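
For readers reproducing these numbers on their own data, the four scores relate to each other as in the sketch below. It assumes a binary classifier with a single positive class, which this page does not state explicitly, and the function names are hypothetical. F1 is the harmonic mean of precision and recall, so the reported F1 values can be cross-checked from the reported precision and recall alone.

```python
from typing import Dict


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Cross-check of the reported scores (values copied from the tables above).
train_f1 = round(f1_from(0.9923, 0.9894), 4)  # matches the reported 0.9908
test_f1 = round(f1_from(0.9763, 0.9616), 4)   # matches the reported 0.9689
```

The record counts are also consistent with a roughly 70/30 train/test split: 45,988 of 65,698 records is about 70%.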

To run this source code and create your own model from your own dataset, read the pre_proc.py file and modify it to suit your data.

Dataset sources

For News articles:

Kaggle

For WOEIDs (Where On Earth IDs, used for location-based trends):

https://codebeautify.org/jsonviewer/f83352