TwiPA

TwiPA dashboard

Inspiration

Twitter is a service that we both use and love. It lets users connect with friends, industry professionals, and companies, and follow the latest news.

However, like all social media, Twitter can have significant impacts on the mental health and worldviews of its users.

Wouldn’t it be nice to have more data about a Twitter user before you follow them? That way, you have more agency over how your Twitter feed is curated and can ensure that the content you see every day is truly what you want to see.

What it does

TwiPA performs data analytics on recent tweets of a user you are interested in. After scraping a specified number of tweets from a user of your choice, TwiPA performs sentiment analysis and clustering of tweets and displays this data in interactive graphs. This data can be used to make a more informed decision about whether or not you would like to follow that account.

How we built it

TwiPA is written entirely in Python. We used snscrape and tweepy to scrape the tweets of a specific public Twitter user. We used TextBlob and NLTK (Natural Language Toolkit) to perform sentiment analysis and generate the positivity and subjectivity metrics (depicted in the first two graphs on our UI).

We used scikit-learn to vectorize the scraped tweets, Principal Component Analysis (PCA) to reduce these vectors to three dimensions so they can be rendered in 3D space, and the K-Means algorithm to cluster similar tweets together. The outputs of these algorithms are depicted in the last graph.
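
The vectorize, reduce, and cluster steps can be sketched with scikit-learn as follows. This is a sketch under stated assumptions rather than TwiPA's implementation: the writeup does not say which vectorizer was used (TF-IDF is assumed here), whether K-Means ran on the full vectors or on the 3D projection (the projection is assumed), or how many clusters were chosen. The function name `cluster_tweets` and the sample tweets are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_tweets(tweets, n_clusters=2, random_state=0):
    """Vectorize tweets, project them to 3D with PCA, and cluster with K-Means.

    Hypothetical helper; TF-IDF and clustering on the 3D projection are
    assumptions, not details confirmed by the writeup.
    """
    # Turn each tweet into a sparse TF-IDF vector.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(tweets)

    # PCA needs a dense array; reduce to 3 components for a 3D scatter plot.
    coords = PCA(n_components=3, random_state=random_state).fit_transform(
        vectors.toarray()
    )

    # Assign each projected tweet to one of n_clusters groups.
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(coords)
    return coords, labels


# Made-up sample tweets covering two rough topics.
sample_tweets = [
    "The new graphics card renders games at high frame rates",
    "Benchmarking this graphics card on the latest games",
    "Upgraded my gaming laptop with faster memory",
    "Baked sourdough bread with a crispy crust this morning",
    "My sourdough starter finally makes good bread",
    "Trying a new recipe for homemade pizza dough",
]

coords, labels = cluster_tweets(sample_tweets, n_clusters=2)
for label, text in zip(labels, sample_tweets):
    print(label, text)
```

Each row of `coords` gives the x, y, and z position of a tweet, and `labels` can be used to color the points by cluster. PCA with three components needs at least three tweets and three distinct terms.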

We used the Plotly Dash framework to render our interactive graphs and create an interactive UI.

Challenges we ran into

Neither of us had experience with Plotly Dash, the data visualization framework we used to create our application, prior to the hackathon. Because of this, we had some trouble getting the front-end portion of our application to look exactly how we wanted it to.

We also had issues with Twitter API limits, which did not let us scrape an arbitrary number of tweets. We overcame this by using snscrape, a Python library that scrapes tweets directly instead of relying on the API.

Accomplishments that we're proud of

Our interactive graphs: The graphs are interactive and respond to the user’s actions. Altering any of a graph’s parameters results in a seamless transition as new data populates TwiPA’s views.

Clustering: Upon inspection of results from several accounts, our clustering algorithm appears to successfully group tweets by topic. This doesn’t work perfectly for every account, but when it does, it’s super cool :).

What we learned

Coming into TwiPA, we didn’t have a lot of web development experience. We learned a lot about using HTML and CSS to present content in an aesthetically pleasing way. We also learned a lot about Dash, Plotly’s data visualization tool.

What's next for TwiPA

The first step would be deploying TwiPA to a web hosting service so that others can access and use it. Furthermore, TwiPA currently provides a lot of data to its users. For data geeks, this is amazing, but it may overwhelm people who are not familiar with some of the terminology we use. Therefore, we plan to add a simple view to TwiPA that presents its data in a more user-friendly manner. We also want to fine-tune the sentiment analysis and clustering algorithms for better results.