Data is from the Kaggle "Real or Not? NLP with Disaster Tweets" competition.
- Load the datasets and required packages (Keras, TensorFlow, Clean-Text, matplotlib, NumPy, pandas).
- Clean the text using the loaded packages (convert to lower case, remove line breaks, remove punctuation).
- Tokenize the text and pad the sequences to a fixed length.
- Download GloVe embeddings and build an embedding matrix from the downloaded weights.
- Create the network architecture using a Sequential model with bidirectional LSTM and GRU layers (the architecture was arrived at after many iterations).
- Fit the model (I used the TPUs provided by Kaggle).
- Predict on test data.
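The cleaning and tokenization steps above can be sketched without the heavier dependencies. The actual pipeline uses Clean-Text and Keras's `Tokenizer`/`pad_sequences`; this is a minimal hand-rolled version using only `re`, and the function names (`clean_text`, `build_vocab`, `texts_to_padded`) are my own, not from the notebook.

```python
import re

def clean_text(text):
    # Lower-case, drop line breaks, strip punctuation (keep word chars and spaces).
    text = text.lower().replace("\n", " ").replace("\r", " ")
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def build_vocab(texts):
    # Map each word to an integer id; 0 is reserved for padding/unknown words.
    vocab = {}
    for t in texts:
        for w in t.split():
            vocab.setdefault(w, len(vocab) + 1)
    return vocab

def texts_to_padded(texts, vocab, maxlen):
    # Convert texts to id sequences, truncate to maxlen, then post-pad with 0.
    out = []
    for t in texts:
        seq = [vocab.get(w, 0) for w in t.split()][:maxlen]
        out.append(seq + [0] * (maxlen - len(seq)))
    return out
```

Keras's `Tokenizer` additionally tracks word frequencies and caps the vocabulary size, but the id-and-pad behavior is the same idea.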
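Building the embedding matrix from GloVe can be sketched as below: each line of a GloVe file is a word followed by its vector components, and row `i` of the matrix holds the vector for the word with tokenizer id `i`. The helper names and the two-dimensional toy vectors here are illustrative, not from the notebook.

```python
import numpy as np

def load_glove(lines):
    # Parse GloVe-format lines ("word v1 v2 ...") into a word -> vector dict.
    index = {}
    for line in lines:
        parts = line.strip().split()
        index[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return index

def build_embedding_matrix(vocab, glove_index, dim):
    # Row i holds the GloVe vector for the word with id i; words missing
    # from GloVe (and the padding id 0) stay all-zero.
    matrix = np.zeros((len(vocab) + 1, dim), dtype="float32")
    for word, i in vocab.items():
        vec = glove_index.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix
```

With a real download you would stream the file (e.g. `glove.6B.100d.txt`) line by line into `load_glove` instead of passing a list.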
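A Keras sketch of the described architecture (Sequential model, bidirectional LSTM feeding a bidirectional GRU, frozen GloVe embeddings) might look like this. The vocabulary size, embedding dimension, sequence length, and unit counts are assumptions; the notebook arrived at its exact configuration after many iterations, so treat these numbers as placeholders.

```python
import numpy as np
from tensorflow.keras import Sequential, layers

VOCAB_SIZE = 10000  # assumed: tokenizer vocabulary size + 1 for padding
EMBED_DIM = 100     # assumed: 100-d GloVe vectors
MAXLEN = 50         # assumed: padded sequence length

# Stand-in for the real GloVe matrix built in the previous step.
embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")

model = Sequential([
    layers.Input(shape=(MAXLEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, trainable=False),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.GRU(32)),
    layers.Dense(1, activation="sigmoid"),  # binary: disaster vs. not
])
model.layers[0].set_weights([embedding_matrix])  # load GloVe weights, kept frozen
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

`trainable=False` keeps the pretrained GloVe vectors fixed during `model.fit`; on Kaggle the same model definition would be wrapped in a TPU distribution strategy scope.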
- The network with GloVe embeddings gave much better accuracy than networks with randomly initialized embeddings.
- Cleaning the text has a noticeable impact on accuracy.
- Bidirectional LSTM and GRU layers gave better results than unidirectional ones.