Fake news is now widespread, and misinformation spreads quickly across social media and news platforms, making it harder to tell fact from fiction. This dataset addresses that problem by providing a collection of labelled news articles for developing and testing models that detect fake news.
- Python 3
- Pandas
- NumPy
- Seaborn
- Matplotlib
- NLTK
- scikit-learn (sklearn)
- mlxtend
The Fake-News-Detector Dataset is a curated collection of news articles, each labelled as either "Fake" or "Real", for developing and evaluating fake news detection models. It is designed to support machine learning and natural language processing (NLP) tasks that distinguish genuine from deceptive news content.
The dataset is composed of two separate Excel files:
- Fake.xlsx: Contains the news articles labelled "Fake".
- Real.xlsx: Contains the news articles labelled "Real".

Each file contains the following columns:

- Title: Title of the news article
- Text: Body text of the news article
- Subject: Subject category of the news article
- Date: Publish date of the news article
These files have been combined into a single dataset for comprehensive analysis and model testing.
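The combining step described above might look something like the following minimal sketch. The `label` column name, the `combine_labelled` helper, and the shuffling step are assumptions made for illustration; the source does not say how the files were actually merged.

```python
import pandas as pd


def combine_labelled(fake_df: pd.DataFrame, real_df: pd.DataFrame) -> pd.DataFrame:
    """Tag each frame with its class and stack both into one shuffled dataset."""
    # "label" is a hypothetical column name, not one taken from the files.
    fake = fake_df.assign(label="Fake")
    real = real_df.assign(label="Real")
    combined = pd.concat([fake, real], ignore_index=True)
    # Shuffle so the two classes are interleaved rather than stored back to back.
    return combined.sample(frac=1, random_state=42).reset_index(drop=True)


# Tiny in-memory stand-ins for the two Excel files, using the documented columns.
fake_sample = pd.DataFrame({
    "Title": ["Made-up headline"],
    "Text": ["Made-up body text"],
    "Subject": ["News"],
    "Date": ["2017-01-01"],
})
real_sample = pd.DataFrame({
    "Title": ["Genuine headline"],
    "Text": ["Genuine body text"],
    "Subject": ["politicsNews"],
    "Date": ["2017-01-02"],
})
news = combine_labelled(fake_sample, real_sample)
```

With the real files, the two sample frames would be replaced by `pd.read_excel("Fake.xlsx")` and `pd.read_excel("Real.xlsx")`, which typically requires the `openpyxl` package to be installed.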
We evaluated several machine learning algorithms and NLP techniques for fake news detection. Of the models tested, the Passive Aggressive Classifier performed best, reaching an accuracy of 99.65%. Its incremental (online) learning approach and scalability make it well suited to distinguishing fake from genuine news.
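As a rough illustration of how such a model can be trained and scored, here is a minimal sketch using scikit-learn. The TF-IDF features, the hyperparameters, the `build_detector` helper, and the toy corpus are all assumptions for demonstration; the source does not specify the feature extraction or settings behind the reported accuracy, and this toy example will not reproduce that figure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def build_detector():
    """Return a TF-IDF + Passive Aggressive pipeline (assumed setup, not the author's)."""
    return make_pipeline(
        TfidfVectorizer(stop_words="english"),
        PassiveAggressiveClassifier(max_iter=1000, random_state=42),
    )


# Hypothetical toy corpus standing in for the "Text" column of the combined dataset.
toy_texts = [
    "aliens secretly control parliament claims anonymous blogger",
    "miracle pill cures every disease overnight doctors stunned",
    "celebrity clone spotted shopping insiders reveal shocking truth",
    "moon landing footage filmed underwater leaked memo claims",
    "central bank raises interest rates quarter point",
    "senate committee approves annual budget proposal",
    "city council votes fund bridge repairs",
    "health ministry publishes vaccination statistics report",
]
toy_labels = ["Fake"] * 4 + ["Real"] * 4

# Hold out a stratified test split so both classes appear in training and testing.
X_train, X_test, y_train, y_test = train_test_split(
    toy_texts, toy_labels, test_size=0.25, stratify=toy_labels, random_state=42
)

detector = build_detector()
detector.fit(X_train, y_train)
accuracy = accuracy_score(y_test, detector.predict(X_test))
print(f"Held-out accuracy on the toy corpus: {accuracy:.2f}")
```

The incremental learning mentioned above corresponds to the classifier's `partial_fit` method, which updates the model one batch at a time. Using it would require a vectorizer that does not need to see the whole corpus first, so the pipeline above would need to be adapted.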
These results are promising, but they cover only part of what is possible in fake news detection. Newer algorithms and ongoing advances leave room for further improvement.
