CourseProject (Text Classification)

Overview

The project I chose was the sarcasm detection competition.

I beat the baseline by training a PyTorch BERT model using the Hugging Face transformers library. I first researched which types of models suit this kind of text classification and found that Hugging Face (https://huggingface.co/) offers multiple pretrained models for this purpose. Minor data cleanup and preprocessing of the training and test data sets, along with concatenating the immediate context with the response, were enough to beat the baseline.
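The concatenation step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the field names `context` and `response`, the assumption that `context` is a list ordered from oldest to newest turn, the helper name `build_input`, and the literal ` [SEP] ` separator are all assumptions made for the example.

```python
import json


def build_input(record, sep=" [SEP] "):
    """Join the most recent context turn with the response into one string.

    Assumes (hypothetically) that record["context"] is a list of earlier
    turns, oldest first, and record["response"] is the text to classify.
    """
    context = record.get("context") or []
    immediate = context[-1] if context else ""
    response = record.get("response", "")
    # Fall back to the response alone when no context is available.
    return f"{immediate}{sep}{response}" if immediate else response


# One made-up line in the assumed JSONL layout.
line = (
    '{"label": "SARCASM", "response": "Oh great, another Monday.", '
    '"context": ["How was your weekend?", "Back to work tomorrow."]}'
)
print(build_input(json.loads(line)))
```

Hugging Face tokenizers also accept a pair of texts directly, which would let the tokenizer insert its own separator token instead of a literal string; either approach may work, and which one the notebook uses is not stated here.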

Libraries/Languages

Python
PyTorch
Hugging Face transformers
pandas
scikit-learn

Implementation

I developed the entire project within a Google Colab (https://colab.research.google.com/) notebook. I used the free tier for all development and for the final model training. I recommend opening CS410Project.ipynb in Colab so the code can be run without configuring any of the libraries locally.

Usage

Open CS410Project.ipynb in Google Colab (https://colab.research.google.com/)
The data storage for a Google Colab workspace is ephemeral, so the train.jsonl and test.jsonl files must be uploaded to the workspace storage each time a new session is started.
(Optional) For faster model training, click Edit -> Notebook Settings, then set the Hardware Accelerator to 'GPU' inside the popup. This was available for me using the free tier of Colab during development. The code will default to CPU computation if this is not done and the model will take longer to train.
Run all code blocks in order. The file answer.txt will be created in the workspace for download. Training for 5 epochs generated predictions that beat the baseline.
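The CPU fallback mentioned in the GPU step is commonly written as below. This is a sketch of the usual PyTorch pattern rather than the notebook's exact code; the helper name `pick_device` is hypothetical, and the import is guarded only so the snippet also runs where PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when a GPU is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch  # preinstalled on Colab; guarded for other environments
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training will run on: {device}")
```

With PyTorch available, the model and each batch of tensors would then be moved to that device with `.to(device)` before training.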

Presentation

I did not have time to do a presentation.
