Inspiration
Our inspiration for this project came from the course selection process: there's usually a lot of juggling between looking at grade distributions, scouring the r/aggies subreddit, and checking Rate My Professor for every single professor of every single class you plan to take. To expedite this, we created { class overflow }, which does this work for students through our own web scraping of these sites.
What it does
It takes data from the anex.us grade distributions and the Rate My Professor site to generate a comprehensive "difficulty score" for both the class and the professor. Posts from the r/aggies subreddit are scraped and pipelined through an NLP process to draw further conclusions about a class's difficulty from real statements made by Aggies on the subreddit.
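As a rough sketch of how three signals like these could be blended into one score (the weights, scales, and function names here are illustrative assumptions, not our exact algorithm):

```python
def difficulty_score(avg_gpa, rmp_rating, reddit_sentiment,
                     weights=(0.5, 0.3, 0.2)):
    """Blend three signals into a 0-10 difficulty score (higher = harder).

    avg_gpa          -- mean GPA from grade distributions (0.0-4.0, higher = easier)
    rmp_rating       -- Rate My Professor quality rating (1.0-5.0, higher = easier)
    reddit_sentiment -- NLP sentiment of subreddit comments (-1.0 to 1.0,
                        higher = more positive, i.e. easier)
    """
    # Normalize each signal to a 0 (easy) .. 1 (hard) difficulty value.
    gpa_difficulty = (4.0 - avg_gpa) / 4.0
    rmp_difficulty = (5.0 - rmp_rating) / 4.0
    reddit_difficulty = (1.0 - reddit_sentiment) / 2.0

    w_gpa, w_rmp, w_reddit = weights
    blended = (w_gpa * gpa_difficulty
               + w_rmp * rmp_difficulty
               + w_reddit * reddit_difficulty)
    return round(blended * 10, 2)
```

For example, a class with a 3.2 average GPA, a 4.0 RMP rating, and mildly positive Reddit sentiment would land near the easy end of the scale, while a low GPA or harsh ratings pushes the score up.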
How we built it
We separated the project into two teams, front-end and back-end. The back-end team focused on developing the web scrapers to extract data from the three sites, as well as figuring out how to process the Reddit data and generate real, helpful conclusions from it. We also created a local Flask API so that the front-end could communicate with our back-end. The front-end team focused on creating a simple yet powerful design that accurately captures all the data needed by the custom back-end API, and was also responsible for verifying that data submitted by users was valid input our back-end could process.
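A minimal sketch of what such a Flask endpoint might look like, including the kind of input validation described above (the route name, payload fields, and placeholder score are illustrative assumptions, not our exact API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/score", methods=["POST"])
def score():
    # Validate the payload before touching the scrapers, mirroring the
    # front-end validation described above.
    data = request.get_json(silent=True) or {}
    course = data.get("course", "").strip().upper()
    professor = data.get("professor", "").strip()
    if not course or not professor:
        return jsonify({"error": "course and professor are required"}), 400

    # In the real app this would call the scraping / scoring pipeline;
    # here a fixed value stands in for the computed result.
    return jsonify({"course": course, "professor": professor,
                    "difficulty": 5.0})
```

In development, an app like this runs locally (e.g. via `flask run`), and the front-end posts JSON to it and renders the returned scores.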
Challenges we ran into
A big challenge we ran into was the web scraping itself. Because the sites we chose either had a ridiculously expensive API (Reddit) or no API at all (anex.us, Rate My Professor), we had to develop our own web scraping functions from scratch. With the BeautifulSoup Python library and a few rough hours sorting through some funky outputs, we were finally able to extract all the important data from these sites. Another challenge was quantifying how certain aspects influence the difficulty of a class, and how a professor makes a class harder or easier. We had to create our own algorithm to calculate a "difficulty score" and a "professor score," which now serve as the prime focus of our app. A big challenge we are still facing is that the RMP scraper is, for some reason, really slow, and is the main time bottleneck of our app. Once we tackle this, we think our app could be published.
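The parsing half of that work looks roughly like this with BeautifulSoup (the HTML below is a made-up stand-in for a grade-distribution table, not the real markup of any of these sites):

```python
from bs4 import BeautifulSoup

# Stand-in for a fetched grade-distribution page (illustrative markup only).
html = """
<table id="grades">
  <tr><td class="prof">SMITH J</td><td class="gpa">3.12</td></tr>
  <tr><td class="prof">DOE A</td><td class="gpa">2.87</td></tr>
</table>
"""

def parse_gpas(page_html):
    """Extract (professor, average GPA) pairs from a grades table."""
    soup = BeautifulSoup(page_html, "html.parser")
    rows = []
    for row in soup.select("#grades tr"):
        prof = row.find("td", class_="prof").get_text(strip=True)
        gpa = float(row.find("td", class_="gpa").get_text(strip=True))
        rows.append((prof, gpa))
    return rows
```

Most of the "funky outputs" come from real pages not matching the structure you expect, so in practice each selector needs guards for missing or malformed cells.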
Accomplishments that we're proud of
We're proud of having created an app that seems accurate at gauging the difficulty of certain professors: trying it out on our past professors confirmed that, yes, that CSCE120 professor really was THAT bad. We're also proud that we chose to weigh not only objective stats such as grade distributions, but also subjective ones such as the ratings on RMP. The rough time we spent developing the Reddit scraper also paid off, since it let us pull specific information about different classes rather than drawing conclusions from the numbers alone.
What we learned
We definitely learned a lot about web scraping and how hard it is to extract clean data from complex websites. We also had to do some upfront research to figure out how we were going to apply NLP to the Reddit text. After that, we had to connect the front-end and back-end, which of course never happens without a bunch of issues popping up. Those issues were good to debug, though: they surfaced a lot of important information and let us catch bugs in other parts of the program that we would have missed without the difficulty of tying everything together.
What's next for { class overflow }
What's next for { class overflow } is solving the bottleneck created by the RMP scraper. Once we figure out how to substantially speed it up, users will see their results much faster.