Home
Welcome to the Tweet Classifier repository!
This repository contains source code that extracts the tweets of a specified user and uses an LSTM model to predict whether those tweets contain manipulated information or are reliable.
Deep learning model used: LSTM
Twitter authorisation: OAuth 1.0
pre_proc: Preprocessing/Cleaning the text.
pre_model: Loading the pre-trained GloVe embeddings (in this code, 6B tokens, 200d vectors) and converting text into tokens.
model: Model construction and execution. Computationally expensive; the code was executed on a VM with 16 vCPUs and 60 GB RAM.
CHIRP: OAuth Authorization and methods to extract tweets based on name or keywords and trends based on location.
Integrator: Loading the saved model and predicting the probability that the extracted tweets are credible.
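As a rough illustration of what the pre_proc and pre_model steps involve, here is a minimal sketch using only the Python standard library. It is not the repository's code: the function names (`clean_tweet`, `parse_glove_lines`, `to_indices`), the specific cleaning rules, and the choice of index 0 for unknown words are all assumptions made for this example. The only thing taken as given is the plain-text GloVe layout, where each line holds a word followed by its vector components separated by spaces.

```python
import re

# Hypothetical cleaning rules; the actual pre_proc.py may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)  # digits and punctuation become spaces
    return " ".join(text.split())       # collapse repeated whitespace


def parse_glove_lines(lines):
    """Parse GloVe text-format lines ("word v1 v2 ... vN") into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def to_indices(text, vocab, unknown_index=0):
    """Map a cleaned tweet to integer token indices (assumed: 0 = unknown)."""
    return [vocab.get(tok, unknown_index) for tok in clean_tweet(text).split()]


# Tiny in-memory stand-in for a GloVe file (3-d vectors instead of 200-d).
sample_lines = ["the 0.1 -0.2 0.3\n", "cat 0.5 0.5 0.5\n"]
vectors = parse_glove_lines(sample_lines)
vocab = {word: i + 1 for i, word in enumerate(vectors)}  # reserve 0 for unknown
```

With a real embedding file, the same parser could be fed an open file handle, for example `parse_glove_lines(open(path, encoding="utf-8"))`, where `path` points at the downloaded 200d vectors. The resulting index sequences would then typically be padded to a fixed length before being passed to the LSTM.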
The trained model has the following metrics:
Total size of dataset: 65,698 records
Training data: 45,988 records
Accuracy Score = 0.9911
F1 Score = 0.9908
Precision Score = 0.9923
Recall Score = 0.9894
Testing data: 19,710 records
Accuracy Score = 0.9702
F1 Score = 0.9689
Precision Score = 0.9763
Recall Score = 0.9616
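For readers who want to see how these four scores relate, the sketch below computes them from confusion-matrix counts and checks that the reported F1 values agree with the reported precision and recall. The function names and the example counts are illustrative assumptions, not taken from the repository; only the reported scores and record counts come from the figures above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts.

    Assumes the denominators are non-zero (at least one predicted positive
    and at least one actual positive).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Made-up counts, purely to show the call shape.
example = classification_metrics(tp=8, fp=2, fn=1, tn=9)

# Consistency checks against the figures reported above.
train_f1 = round(f1_from(0.9923, 0.9894), 4)  # reported: 0.9908
test_f1 = round(f1_from(0.9763, 0.9616), 4)   # reported: 0.9689
split_total = 45988 + 19710                   # reported dataset size: 65698
```

The training and testing record counts sum to the stated dataset size, which corresponds to roughly a 70/30 split.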
To run this source code and create your own model from your own dataset, read the pre_proc.py file and modify it to suit your data.
For News articles:
Kaggle
- https://www.kaggle.com/c/fake-news/data
- https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset
WOE Id