WillsFilms/US-Top10-boxoffice

Box Office Analysis

This project takes Box Office Mojo data on the top 10 highest-grossing films at the US box office for each year from 2000 to 2024 and analyses several aspects of them. It also looks at the overall US box office figures, i.e. total gross per year. The two input datasets are:

  • Box Office = Box_Office_top_10_2000_2024
  • Gross = Box_Office_gross_2000_2024

Data cleaning

Although the data are mostly clean as collected, a few issues need resolving. One studio changed its name during the period covered by these data. I therefore wrote this line of code to rename all occurrences of the old studio name to the new name, to avoid any misrepresentation in my analysis.

image
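As the original code is only shown as an image, here is a minimal sketch of how such a rename could be done with pandas. The column name `Distributor` appears later in this README; the studio names and sample rows are placeholders, not the actual values in the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real data come from the Box Office CSV.
box_office = pd.DataFrame({
    "Year": [2015, 2021],
    "Title": ["Film A", "Film B"],
    "Distributor": ["Old Studio Name", "New Studio Name"],
})

# Map the old studio name to the new one; all other values are left untouched.
box_office["Distributor"] = box_office["Distributor"].replace(
    {"Old Studio Name": "New Studio Name"}
)

# After the rename only one spelling of the studio should remain.
distributor_names = sorted(box_office["Distributor"].unique())
```

Using a dictionary with `replace()` keeps the mapping explicit and makes it easy to add further renames later.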

Charts

I have set up a Jupyter notebook that processes the input CSV data into the predefined charts listed below:

1. Box office budgets for highest grossing film

For this I created a filtered DataFrame containing only the number one ranked film at the box office for each year: image
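A filter like this might look as follows. This is a sketch under assumptions: the column names `Rank`, `Title` and `Budget` and the sample values are made up for illustration and may differ from the real dataset.

```python
import pandas as pd

# Hypothetical sample with two films per year.
box_office = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Rank": [1, 2, 1, 2],
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Budget": [100.0, 80.0, 120.0, 90.0],
})

# Keep only the top-ranked film of each year, ordered by year for plotting.
top_ranked = (
    box_office[box_office["Rank"] == 1]
    .sort_values("Year")
    .reset_index(drop=True)
)
```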

These data are then plotted within a line chart:

image

2. Average budgets for box office top 10

For this chart I calculated a new DataFrame holding the average budget of the films in the original DataFrame for each year: image
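One way to compute a per-year average with pandas is sketched below. The `Budget` column name and the numbers are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample with two films per year.
box_office = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Budget": [100.0, 80.0, 120.0, 90.0],
})

# Average the budget within each year; as_index=False keeps Year as a column.
avg_budget = box_office.groupby("Year", as_index=False)["Budget"].mean()
```

Keeping `Year` as a regular column means the result can be passed straight to a plotting call as x and y values.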

These data are then plotted within a line chart:

image

3. The top 5 distributors in the top 10

For this chart I used the groupby() method to return a count of films for each distributor in the dataset. I then sorted the counts in descending order and applied head(5), as I only wanted to see the 5 most common distributors: image
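The groupby, sort and head(5) steps described above can be sketched like this. The studio names and counts are invented placeholders.

```python
import pandas as pd

# Hypothetical sample: Studio A appears 6 times, Studio B 5 times, and so on.
names = (
    ["Studio A"] * 6 + ["Studio B"] * 5 + ["Studio C"] * 4
    + ["Studio D"] * 3 + ["Studio E"] * 2 + ["Studio F"] * 1
)
box_office = pd.DataFrame({"Distributor": names})

# Count films per distributor, sort descending, keep the five largest.
top_distributors = (
    box_office.groupby("Distributor")
    .size()
    .sort_values(ascending=False)
    .head(5)
)
```

`box_office["Distributor"].value_counts().head(5)` would give an equivalent result, since `value_counts()` already sorts in descending order.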

These data are plotted in a horizontal bar chart:

image

4. Overall box office gross over time

This chart takes data from the Box_Office_gross_2000_2024 CSV and plots total gross against year in a line chart: image

5. Density plot showing total films in the top 10 by distributor over time

For this chart, I used the groupby() method to group the DataFrame by Year and Distributor and return the size of each group, i.e. the number of films per distributor per year: image
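A minimal sketch of this grouping is shown below. The helper name `count_by_year` and the output column name `Films` are hypothetical, and the sample rows are placeholders.

```python
import pandas as pd

def count_by_year(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count rows per (Year, column) pair and return a flat table."""
    return df.groupby(["Year", column]).size().reset_index(name="Films")

# Hypothetical sample rows.
box_office = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001],
    "Distributor": ["Studio A", "Studio A", "Studio B", "Studio A"],
})

counts = count_by_year(box_office, "Distributor")
```

The `Films` column could then drive both marker size and colour in a scatter plot. The same helper would work for the genre chart by passing `"Genre"` instead of `"Distributor"`.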

These data were then plotted within a scatter plot using a variable size and colourscale to indicate density:

image

6. Density plot showing total films in the top 10 by genre over time

For this chart, I used the groupby() method to group the DataFrame by Year and Genre and return the size of each group, i.e. the number of films per genre per year: image

These data were then plotted within a scatter plot using a variable size and colourscale to indicate density:

image

Future work

I will look to improve the insights I can gain from these data. This will include introducing an inflation adjustment for the budget and gross figures, so that values from different years can be compared on a like-for-like basis.
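As a rough sketch of what such an adjustment might involve, a nominal value can be rescaled by the ratio of a price index in a chosen base year to the index in the film's year. Everything here is an assumption: the function name, the column names, and above all the index values, which are made-up placeholders rather than real CPI figures.

```python
import pandas as pd

def adjust_for_inflation(df, value_col, price_index, base_year):
    """Rescale value_col to base_year prices: value * index[base] / index[year]."""
    factor = price_index[base_year] / df["Year"].map(price_index)
    return df[value_col] * factor

# Placeholder index values, NOT real CPI data.
price_index = {2000: 100.0, 2024: 200.0}

# Hypothetical sample rows.
box_office = pd.DataFrame({
    "Year": [2000, 2024],
    "Budget": [50.0, 80.0],
})

box_office["Budget_adjusted"] = adjust_for_inflation(
    box_office, "Budget", price_index, base_year=2024
)
```

With these placeholder values, the 2000 budget doubles when expressed in 2024 prices while the 2024 budget is unchanged.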

Credits

Data for this project are taken from Box Office Mojo - https://www.boxofficemojo.com/

About

Code used to produce analysis outputs on my Top 10 US Box Office Kaggle dataset
