CDC 2023: Miles Murphy, Leon Tran, Cal Courbois, Justin Rivera

Carolina Data Challenge 2023 - Social Science Track

QUESTION:

In our project, we utilized Spotify's user API to scrape data on over 11,000 songs that were relased between 2000 and 2010. Through our research, we aimed to answer the question- What makes a song nostalgic? Why are we drawn to listen to certain throwback hits over others?

METHODOLOGY:

To collect our data for this project, we first had to create an application that accessed and utilized the Spotify API. After doing this, we used the Spotipy library to scrape every playlist ID from Spotify that was released between the year 2000 and 2010. Using these playlist ID's, we then created an array that stored the data on every singular track released in each of these albums. Data wrangling and cleaning this array allowed us to neatly display our findings in a data frame using the Pandas library.

DATA:

To answer our question we focused on seven main data points. Album release date, artist name, song name, available markets, duration (in seconds), explicit rating, and Spotify's 'popularity' value- a hidden value assigned to each song on Spotify, calculated based on how consistent a song's streaming numbers have been since it was released.

RESULTS:

Our research led us to some interesting conclusions on Spotify user listening habits and what variables help a song maintain its popularity over time.

As expected, the average popularity score of songs decreased each year after the song was released (as seen in Diagram 1). The explicit variable and average duration variable, however, gave much more interesting results. When the average 'popularity' value was calculated for songs labeled explicit vs. songs were not, songs that were explicit had an average score almost 15 points higher (as seen in Diagram 2). This led us to the conclusion that explicit songs have a substantially higher sustained popularity over time. Similarly to the findings with the explicit variable, duration (in seconds) showed valuable results. As can be seen in Figure 3, there is a direct correlation between a songs duration and its 'popularity' score. Our findings demonstrate that the shorter a song is, the more sustained popularity it seems to hold.

CHALLENGES:

It took a while to figure out how to access the api, then utilize spotipy. We also struggled with building the data frame (figuring out how to access the variables inside the playlists returned from spotipy). Eventually, we had our data frame and it was easier from there. However, we then realized we wanted even more data so we had to almost revamp our entire data set and wait a really long time to gather it. Also, everyone had their own lives and difficulties, one of our teammates had a family member die so he was no longer able to help.

CONCLUSION:

Overall, really proud of what we've done. We hopped in here with no data experience, just what we've learned in college so far and from our few solo fun projects. Hopefully we keep improving and learning more.

Built With

Share this project:

Updates