wallstreetbets-sentiment-analysis

This project collects posts from Reddit's r/wallstreetbets daily discussion thread and displays analytical results in an interactive dashboard.

Preview:

preview.wsb.dashboad.mp4

Link to the app: https://share.streamlit.io/ybangaru/wallstreetbets-sentiment-analysis/app.py

About the data collection and the app: A free-tier AWS EC2 instance collects the data using Pushshift and PRAW. Pushshift can fetch Reddit posts by datetime (PRAW can't), while PRAW is used to collect the comments. The collected data is first saved as compressed CSVs in S3 buckets. Call this the ELT approach: when I started, I wasn't sure exactly what I wanted to derive from the data, so loading it as-is into S3 keeps the research flexible. I believe ETL makes more sense when we know exactly what we want to work on.

The data pipeline finds the last file upload date on S3 and fetches the Reddit data starting from the following day, which keeps the collected data continuous. Files from the S3 bucket are then used to populate a table that was already created in PostgreSQL (hosted on AWS RDS). A Streamlit app visualizes the collected data. Further upgrades to the app are on the way ;)
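The "resume from the day after the last upload" step can be sketched as below. This is a minimal illustration, not the code in fetch_reddit_data.py: the function names `next_fetch_date` and `days_to_fetch` are hypothetical, and the upload timestamps are passed in as plain datetimes (with boto3 they would typically come from the `LastModified` field of the objects returned by `list_objects_v2`).

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def next_fetch_date(upload_times: Iterable[datetime], default_start: date) -> date:
    """Return the day after the most recent upload, or default_start if the bucket is empty."""
    latest = max(upload_times, default=None)
    if latest is None:
        return default_start
    return latest.date() + timedelta(days=1)


def days_to_fetch(start: date, today: date) -> List[date]:
    """List every day from start up to, but not including, today."""
    # A negative difference (start after today) produces an empty list.
    return [start + timedelta(days=i) for i in range((today - start).days)]
```

Fetching one day at a time from `days_to_fetch(...)` means a missed run is caught up automatically on the next run, which is what keeps the data continuous.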

fetch_reddit_data.py runs daily via a cron job. Run "crontab -e" in the terminal and add a line like the example below at the end of the file to schedule the job on your instance. Remember to change the file paths accordingly, and create crontest.log first by running "touch crontest.log" in the project folder or anywhere else you prefer. Logs are generally kept outside the project folder.

Eg: "0 5 * * * /home/ubuntu/wallstreetbets-sentiment-analysis/runcron-env.py >> /home/ubuntu/wallstreetbets-sentiment-analysis/crontest.log 2>&1" To experiment with cron schedule expressions, check out https://crontab.guru/. Mine is set up to run at 5 AM every day.
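To make the five fields of a schedule such as "0 5 * * *" concrete (minute, hour, day of month, month, day of week), here is a small sketch that checks whether a given moment matches an expression. It is a deliberately simplified model: the helper names are hypothetical, only "*" and plain integers are supported (no ranges, lists, or steps), and all fields are combined with AND, which differs from real cron when both the day-of-month and day-of-week fields are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports only '*' and plain integers."""
    return field == "*" or int(field) == value


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if moment falls on the schedule described by a 5-field expression."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0; Python's isoweekday() counts Sunday as 7.
    cron_weekday = moment.isoweekday() % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )
```

With this model, "0 5 * * *" matches exactly one minute per day, 05:00 in the instance's local time zone.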

database-wsb.py has some examples of how you can interact with the PostgreSQL DB.
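As a rough illustration of populating a table from one of the compressed CSV files, the sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs without a database server. It is not taken from database-wsb.py: the table name `comments` and the columns `id`, `created_utc`, and `body` are assumptions, and against PostgreSQL the SQL would differ slightly (for example `ON CONFLICT DO NOTHING` instead of `INSERT OR IGNORE`, and `%s` placeholders with psycopg2).

```python
import csv
import gzip
import io
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) comments table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments "
        "(id TEXT PRIMARY KEY, created_utc INTEGER, body TEXT)"
    )


def load_compressed_csv(conn: sqlite3.Connection, payload: bytes) -> int:
    """Insert the rows of a gzip-compressed CSV and return the table's row count."""
    text = gzip.decompress(payload).decode("utf-8")
    rows = [
        (row["id"], int(row["created_utc"]), row["body"])
        for row in csv.DictReader(io.StringIO(text))
    ]
    with conn:  # commits on success, rolls back on error
        # Rows whose id already exists are skipped, so re-loading a file is harmless.
        conn.executemany("INSERT OR IGNORE INTO comments VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

Skipping rows that already exist keeps the load step idempotent, which is convenient when the same S3 file is processed more than once.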

runcron-env.py is a simple bash script that runs the fetch_reddit_data.py file.

test_data.csv contains the data collected over the last few months, in case someone would like to use it directly!

To run the script on your local machine/EC2

Pull the project from GitHub. Ideally, create and activate a new virtual environment before installing the necessary modules and running the script, to avoid messing up the OS. How to use virtual environments on Ubuntu: https://stackoverflow.com/a/35017905/4197386

First, install the modules listed in requirements.txt using the following command:

"pip install -r requirements.txt"

To run the app, use the following command (see the comment under call_api() in app.py to read the CSV file directly instead of pulling data from the DB):

"streamlit run app.py"
