Inspiration
Recently, I read an article about a political topic and believed it. Coincidentally, when I got to school, our history class was discussing the same topic, so I decided to add to the conversation. But when I started talking, everyone just looked confused, and I couldn't have felt more embarrassed. Every day, millions of people are tricked into believing politically and factually inaccurate information, which can have a genuinely negative effect on society and on the individuals themselves.
What it does
My project is a Google Chrome extension that uses a custom natural language processing model to detect whether a highlighted section of an article is politically true or not. If the highlighted text is opinionated, the model returns "uncertain," since there is no right or wrong opinion.
How we built it
- I used a combination of a public political truth dataset and another dataset I custom scraped from different online news websites.
- I joined the two datasets together, combined the title of the article and the content into one string, shuffled the dataset, and then saved the new combined dataset into a CSV file.
- I split the data into 80% train and 20% test.
- To tokenize the data, I loaded a pre-trained BertTokenizer, converted the training and test sets into Hugging Face dataset objects, applied the tokenizer to both, and formatted the tokenized data as PyTorch tensors.
- I loaded a pre-trained BertForSequenceClassification model.
- I coded a training loop to train the model.
- Finally, I evaluated the model's performance.
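The data-preparation steps above (joining the two datasets, merging title and content into one string, shuffling, saving to CSV, and an 80/20 split) can be sketched roughly as follows. This is a minimal illustration with made-up rows; the column names (`title`, `content`, `label`) and file name are assumptions, not the project's actual schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the public dataset and the custom-scraped one;
# the rows and column names here are illustrative only.
public_df = pd.DataFrame({
    "title": ["Claim A", "Claim B"],
    "content": ["Body of claim A.", "Body of claim B."],
    "label": [1, 0],
})
scraped_df = pd.DataFrame({
    "title": ["Claim C", "Claim D"],
    "content": ["Body of claim C.", "Body of claim D."],
    "label": [0, 1],
})

# Join the two datasets and merge title + content into a single text field.
df = pd.concat([public_df, scraped_df], ignore_index=True)
df["text"] = df["title"] + " " + df["content"]

# Shuffle, then save the combined dataset to a CSV file.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
df.to_csv("combined_dataset.csv", index=False)

# 80% train / 20% test split.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```

From here, the `text` column would be tokenized with the pre-trained BertTokenizer and fed to BertForSequenceClassification, as described above.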
Challenges we ran into
While making this project, I ran into a lot of difficult challenges. A major one was the data: I needed a lot of it to train an effective model, but the public dataset I found only contained news up to 2021. To fill the gap, I created a bot to scrape various trustworthy news websites, such as Politifact, while excluding opinionated articles by filtering on keywords such as "opinion". Another major difficulty was publishing my project on the Chrome Web Store; it was my first time, so it took a lot of research and trial and error.
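The keyword-based filtering described above can be sketched with BeautifulSoup (listed under Built With). The HTML snippet, CSS class names, and function name below are all made up for illustration; the real scraper's selectors would depend on each site's markup.

```python
from bs4 import BeautifulSoup

# A tiny inline page standing in for a scraped news index;
# the markup and class names are illustrative, not any real site's.
HTML = """
<html><body>
  <a class="article" href="/fact-check-1">Fact-check: statement on taxes</a>
  <a class="article" href="/opinion-piece">Opinion: why I disagree</a>
  <a class="article" href="/fact-check-2">Fact-check: statement on jobs</a>
</body></html>
"""

# Keywords used to skip opinionated articles, as described above.
EXCLUDE_KEYWORDS = ("opinion",)

def extract_factual_links(html: str) -> list[str]:
    """Return hrefs of article links whose titles contain no excluded keyword."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", class_="article"):
        title = a.get_text().lower()
        if not any(kw in title for kw in EXCLUDE_KEYWORDS):
            links.append(a["href"])
    return links
```

Here `extract_factual_links(HTML)` keeps only the two fact-check links and drops the opinion piece.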
Accomplishments that we're proud of
In the end, I gathered enough data to effectively train the model, and I successfully published my project on the Chrome Web Store.
What we learned
I have had previous experience with machine learning and convolutional neural networks, but this was my first time working with natural language processing and transformers, so I learned a lot about both. I also learned a lot about creating a Google Chrome extension and publishing it.
What's next for TruthChecker
I plan to regularly scrape data so I can maintain a more comprehensive and up-to-date dataset, and to retrain my model regularly so it can recognize new news. I also plan to upgrade the model now that I've learned more about natural language processing and transformers.
Built With
- beautiful-soup
- css
- html5
- javascript
- json
- kaggle
- numpy
- pandas
- python
- pytorch
- sklearn
- transformer