This project aims to filter fake news from Twitter using:
- The rtweet package (a wrapper for the Twitter REST and streaming APIs)
- The Hoaxy API for fake news
- Shiny
- rtweet package
- newsanchor package
- Hoaxy
- Hoaxy API integration in R
- text2vec for text vectorization
- Obtain the training set: the Hoaxy API for fake news, rtweet for real news.
- Use tidytext to transform the data for analysis, anti-join with stop words to remove them, and stem the words.
- Produce a series of visualizations comparing Real and Fake news.
- Train a random forest model with [insert number here] trees
- Test the model with a series of visualizations.
- Apply the model to user input and classify it.
- Apply sentiment analysis?
- What words are most common in Real and Fake news?
- Rate words by impurity (TF-IDF?).
- What words have the highest TF-IDF score in Real and Fake news?
- What is the difference in average sentiment between Real and Fake news?
- Rank words by impurity within the Random Forest model.
- What words are most common in Real News and Fake News, and do they appear in this user's tweets?
- Words ranked by impurity in the random forest model.
- Distribution of tweet sentiment across this user, real news, and fake news.
- Result.
- Input a Twitter handle.
- Obtain a series of visualizations followed by an outcome (Fake or Real).
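
The stop-word removal and TF-IDF steps listed above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's R stack (tidytext, text2vec), and it is not the project's implementation. The tiny stop-word list, the two example tweets, and the names `tokenize` and `tf_idf_by_class` are all hypothetical and exist only for this example. Stemming is left out. Each class (real or fake) is treated as one document, with tf taken as the within-class term frequency and idf as the natural log of the number of classes divided by the number of classes containing the word.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a full lexicon.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tf_idf_by_class(texts_by_class):
    """Return {class: {word: tf-idf}}, treating each class as one document."""
    counts = {
        label: Counter(w for text in texts for w in tokenize(text))
        for label, texts in texts_by_class.items()
    }
    n_classes = len(counts)
    # Number of classes in which each word appears at least once.
    doc_freq = Counter(w for c in counts.values() for w in c)
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores[label] = {
            w: (n / total) * math.log(n_classes / doc_freq[w])
            for w, n in c.items()
        }
    return scores


# Made-up example tweets, for illustration only.
corpus = {
    "real": ["The council approved the budget on Monday"],
    "fake": ["Shocking secret budget exposed"],
}
scores = tf_idf_by_class(corpus)
top_fake = max(scores["fake"], key=scores["fake"].get)
print(top_fake, round(scores["fake"][top_fake], 3))
```

With only two classes, a word that occurs in both scores zero, so the highest scores single out words that are distinctive to one class. That is the property the "highest TF-IDF score" question relies on.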