01_eduge_classification

Eduge News Classification

This is a classification task on the Eduge dataset, a collection of about 70,000 Mongolian online news articles labeled by news category. The project uses Fast.ai.

After a language model is trained on the Eduge dataset, a classifier built on it reaches 93.5% accuracy after 10 epochs. Training the classifier without the language model gives roughly 90.5% accuracy after 10 epochs.
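The gap between the two results is easier to judge as a change in error rate. Here is a short worked check that uses only the two accuracies quoted above and treats the approximate 90.5% as exact. The helper name is illustrative:

```python
def error_reduction(acc_baseline: float, acc_improved: float) -> tuple[float, float]:
    """Return (absolute accuracy gain in percentage points, relative drop in error rate)."""
    err_base = 1.0 - acc_baseline   # error rate without the pretrained language model
    err_new = 1.0 - acc_improved    # error rate with the pretrained language model
    gain_points = (acc_improved - acc_baseline) * 100
    relative_drop = (err_base - err_new) / err_base
    return gain_points, relative_drop


gain, rel = error_reduction(0.905, 0.935)
print(f"{gain:.1f} points, {rel:.1%} fewer errors")  # 3.0 points, 31.6% fewer errors
```

Under that reading, the pretrained language model removes roughly a third of the classifier's errors.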

Notebooks:

01 - Mongolian Language Model: A language model is trained on the Eduge dataset. It reaches an accuracy of 33% at predicting the next token.
02 - Eduge Classifier: Using the vocabulary and the encoder of the language model from notebook 01, a classifier is trained to predict the news category. It reaches 93.5% accuracy after 10 epochs.
03 - Eduge Classifier w_out Pretrained LM: This notebook shows the difference in classifier accuracy with and without a pretrained language model. Without the pretrained language model, the accuracy flattens out at around 90-91% after several epochs.
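The 33% figure for the language model is an accuracy over next-token predictions. At each position the model guesses the following token, and a guess counts only if it matches exactly. Below is a minimal pure-Python sketch of that metric using hypothetical token IDs. Fastai applies the same idea to tensors of model outputs, so this illustrates the metric and is not the notebook's code:

```python
from typing import Sequence


def next_token_accuracy(predicted: Sequence[int], tokens: Sequence[int]) -> float:
    """Fraction of positions where the guess for token i+1 equals the actual token i+1.

    predicted[i] is the model's guess for tokens[i + 1], so the targets are the
    token sequence shifted left by one.
    """
    targets = tokens[1:]
    if not targets:
        raise ValueError("need at least two tokens")
    if len(predicted) != len(targets):
        raise ValueError("need exactly one prediction per target token")
    hits = sum(p == t for p, t in zip(predicted, targets))
    return hits / len(targets)


# Hypothetical token IDs: only the first of three guesses (8) is correct.
tokens = [5, 8, 2, 9]
predicted = [8, 3, 4]
print(next_token_accuracy(predicted, tokens))  # 0.3333333333333333
```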

Data:

Data is in the data folder (Google Drive link).

news.csv: This is the Eduge news dataset as downloaded from the source. Used to train the language model and classifier.
eduge_clean.csv: A cleaned version of the Eduge dataset, not used in the notebooks. The cleaned data gives slightly better accuracy, but the raw dataset is used so that results remain comparable.

Outputs:

Outputs are in the models folder (Google Drive link).

mn_eduge_lm.pth: The complete language model including encoder and decoder.
mn_eduge_lm_encoder.pth: The encoder saved from the language model learner. It is loaded into the classifier learner in notebook 02.
mn_eduge_vocab.pkl: The vocabulary corresponding to the language model. Notebook 02 also needs it to create the classifier's DataLoaders object.
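The vocabulary file matters because the encoder's embedding rows are indexed by vocabulary position. The classifier therefore has to map tokens to the same IDs the language model saw during training. The sketch below saves, reloads, and applies such a vocabulary. It assumes the .pkl holds a plain list of token strings and that unknown tokens map to fastai's `xxunk` token. The helper names and the toy vocabulary are hypothetical:

```python
import pickle
import tempfile
from pathlib import Path

UNK = "xxunk"  # fastai's default unknown-token string (assumed present in the saved vocab)


def save_vocab(vocab: list[str], path: Path) -> None:
    """Pickle a list of token strings (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(vocab, f)


def load_vocab(path: Path) -> list[str]:
    """Load a pickled list of token strings (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def numericalize(tokens: list[str], vocab: list[str]) -> list[int]:
    """Map tokens to vocabulary indices, sending unseen tokens to the UNK index.

    Each index selects a row of the encoder's embedding matrix, so these must be
    the same indices the language model was trained with.
    """
    stoi = {tok: i for i, tok in enumerate(vocab)}
    unk_id = stoi[UNK]
    return [stoi.get(tok, unk_id) for tok in tokens]


# Toy vocabulary for illustration only; the real one comes from notebook 01.
toy_vocab = ["xxunk", "xxpad", "мэдээ", "спорт"]
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = Path(tmp) / "vocab.pkl"
    save_vocab(toy_vocab, vocab_path)
    reloaded = load_vocab(vocab_path)

print(numericalize(["спорт", "улс", "мэдээ"], reloaded))  # [3, 0, 2]
```

Building the classifier's vocabulary from scratch would assign different indices, and the loaded encoder weights would then no longer line up with the tokens.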

Files:

01 - Mongolian Language Model.ipynb
02 - Eduge Classifier.ipynb
03 - Eduge Classifier w_out Pretrained LM.ipynb
README.md
history_eduge_classification.csv
history_eduge_classification_nolm.csv
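The two history files appear to be per-epoch training logs for the classifier with and without the pretrained language model. If they follow the layout of fastai's `CSVLogger` callback (a header row with `epoch`, loss, metric, and `time` columns), a small helper can find where validation accuracy peaks. That layout is an assumption, and the function name and toy log below are illustrative only, not the repository's real numbers:

```python
import csv
import io
from typing import TextIO


def best_epoch(log: TextIO, metric: str = "accuracy") -> tuple[int, float]:
    """Return (epoch, value) for the row with the highest `metric` in a per-epoch CSV log.

    Assumes a header row containing an `epoch` column and a numeric `metric` column.
    """
    rows = list(csv.DictReader(log))
    if not rows:
        raise ValueError("log contains no epoch rows")
    best = max(rows, key=lambda row: float(row[metric]))
    return int(best["epoch"]), float(best[metric])


# Toy log with made-up values, for illustration only.
toy_log = io.StringIO(
    "epoch,train_loss,valid_loss,accuracy,time\n"
    "0,1.00,0.90,0.50,00:10\n"
    "1,0.80,0.70,0.60,00:10\n"
    "2,0.70,0.75,0.55,00:10\n"
)
print(best_epoch(toy_log))  # (1, 0.6)
```

Against the real files, this would be called inside `with open("history_eduge_classification_nolm.csv", newline="") as f:` as `best_epoch(f)`, provided the column names match the assumption.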
