A comparison of two RNN architectures, a bidirectional GRU and a bidirectional LSTM, on how they train on a text-categorization task.
Built with: PyTorch, Natural Language ToolKit (NLTK), Scikit-learn, Pandas, NumPy, Netron.app, Matplotlib
LSTM (https://netron.app/?url=https://github.com/Asymmetric-OG/NewsClass/raw/refs/heads/master/lstm.onnx)
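The two architectures differ only in the recurrent cell, so they can be sketched with a single classifier class. The sketch below is a minimal, hypothetical reconstruction (the real layer sizes, vocabulary, and number of categories live in classifier.ipynb and are assumptions here): an embedding layer feeds a bidirectional GRU or LSTM, and the concatenated forward/backward states of the last timestep feed a linear classification head.

```python
import torch
import torch.nn as nn

class BiRNNClassifier(nn.Module):
    """Sketch of the compared architectures: the same head on top of
    either a bidirectional GRU or a bidirectional LSTM encoder.
    All dimensions below are illustrative assumptions."""

    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=64,
                 num_classes=10, cell="lstm"):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
        self.rnn = rnn_cls(embed_dim, hidden_dim,
                           batch_first=True, bidirectional=True)
        # Bidirectional: forward and backward states are concatenated,
        # so the head sees 2 * hidden_dim features.
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):
        emb = self.embedding(x)        # (batch, seq, embed_dim)
        out, _ = self.rnn(emb)         # (batch, seq, 2 * hidden_dim)
        return self.fc(out[:, -1, :])  # logits from the last timestep

# Dummy batch of token ids: 4 sequences of length 20.
batch = torch.randint(0, 10000, (4, 20))
for cell in ("gru", "lstm"):
    logits = BiRNNClassifier(cell=cell)(batch)
    print(cell, tuple(logits.shape))  # both produce (4, 10) logits
```

Swapping `nn.GRU` for `nn.LSTM` is the only structural change between the two runs, which keeps the comparison fair.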
Evidently, the GRU overfits early and heavily and suffers from the vanishing-gradient problem, whereas the LSTM trains more stably owing to its better performance on longer sequences of text.
- Peak GRU validation accuracy: 66.2%
- Peak LSTM validation accuracy: 71.5%
- GRU validation accuracy over training: 60+% (overfits)
- LSTM validation accuracy over training: 60+% (generalises well)
This highlights the LSTM's ability to keep its gradients well-behaved over 50 epochs, whereas the GRU's gradients explode or vanish.
- Dataset.json: News Category Classification dataset.
- classifier.ipynb: The entire workflow.
- grumodel.onnx: Post-training GRU model for visualisation.
- lstm.onnx: Post-training LSTM model for visualisation.

