Inspiration
We decided we wanted to build a program involving web scraping, so we looked at different websites we could potentially use as a data source.
What it does
The main feature of this program is a list of Twitter users who are currently canceled. The data is scraped from trending tweets on Twitter, and each tweet is evaluated for its sentiment value. The homepage of the website ranks the users by how canceled they are and displays the most canceled user on a jumbotron, whose background tiles repeating copies of that user's profile picture. Other features include a page that displays random tweets from around the world.
How we built it
To get our data from Twitter, we used Selenium to scrape currently trending tweets. The data was stored in a SQL database with four columns: the person's name, the number of tweets they appeared in, their sentiment value, and a list of associated Twitter handles. A Python script then sorted the data by sentiment value to determine the most canceled Twitter user. We used the Flask framework to connect the backend to the frontend, sending lists and strings to the HTML templates for display. We built the frontend with HTML/CSS and the Bootstrap framework. We used the Twitter API to fetch the profile picture of the most canceled user, which is displayed as the background of the jumbotron. Finally, we used crontab to update the database in the background.
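The storage and ranking step described above can be sketched with Python's built-in sqlite3 module. The table layout mirrors the four columns named in the writeup; the names, handles, and scores below are made-up placeholders, and the real project may have used a different SQL engine and schema:

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mentions (
        name TEXT,            -- name of the person
        tweet_count INTEGER,  -- number of tweets they showed up in
        sentiment REAL,       -- sentiment value (more negative = more canceled)
        handles TEXT          -- comma-separated Twitter handles
    )"""
)
rows = [
    ("Alice Example", 40, -0.62, "@alice"),
    ("Bob Example", 25, 0.31, "@bob"),
    ("Carol Example", 55, -0.80, "@carol,@carol_alt"),
]
conn.executemany("INSERT INTO mentions VALUES (?, ?, ?, ?)", rows)

def ranked_by_cancellation(conn):
    """Return names ordered from most to least canceled (lowest sentiment first)."""
    cur = conn.execute("SELECT name FROM mentions ORDER BY sentiment ASC")
    return [name for (name,) in cur]

ranking = ranked_by_cancellation(conn)
```

The first entry of `ranking` is what the homepage jumbotron would feature.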
Challenges we ran into
Hosting the server on a different machine.
Twitter accounts aren’t only made up of people; many organizations also have accounts on Twitter. However, since the goal of our program was to find the most canceled person, we needed to filter out other entities. We ended up using natural language processing through the SpaCy library: we could pass it a list of names and filter out the ones that weren’t people.
To determine whether a person was canceled, we checked current trending tweets for people who were mentioned. However, getting mentioned isn’t necessarily a negative thing, so we applied sentiment analysis with VADER to determine both the sentiment of each tweet and its intensity. That gave us a value for comparing different people and determining who was more canceled.
Database creation.
Using the Twitter API: as we continued to test our program, we kept hitting a “too many connections” error and were no longer able to get data from the Twitter API.
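A common way to cope with that kind of connection-limit error is to back off and retry instead of hammering the API. A minimal sketch, where `fake_twitter_call` is a hypothetical stand-in for the real API request:

```python
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch(); on a connection error, wait exponentially longer and retry."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of retries, let the caller handle it
            time.sleep(base_delay * 2 ** attempt)

# Hypothetical flaky API call: fails twice, then succeeds.
calls = {"n": 0}
def fake_twitter_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("too many connections")
    return {"ok": True}

result = fetch_with_backoff(fake_twitter_call, base_delay=0.01)
```

Spacing out requests this way also plays well with a crontab-driven updater, since each scheduled run makes fewer simultaneous connections.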
Accomplishments that we're proud of
Our team was made up of people with a wide range of experience (a freshman, a sophomore, a junior, and a senior), but we were still able to work well together and find tasks for everyone.
What we learned
One of our team members had never worked with HTML, CSS, or Python prior to the hackathon. Yet he picked things up quickly and contributed a page to the frontend of the program. We learned to use Flask to connect our backend to our HTML templates. We also learned to use libraries like nltk to apply natural language processing to the data we got from Twitter.
What's next for Cancel Me
We will be up on Google by 2030!