This project takes Box Office Mojo data on the ten highest-grossing films at the US box office for each year from 2000 to 2024 and analyses various aspects of them. It also looks at overall US box office figures, i.e., total gross by year.
- Box Office = Box_Office_top_10_2000_2024
- Gross = Box_Office_gross_2000_2024
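A minimal sketch of loading either dataset with pandas. It assumes the files are CSVs named after the datasets above (for example `Box_Office_gross_2000_2024.csv`) and that each has a `Year` column. The `load_box_office` helper is hypothetical, and the sample rows are placeholders rather than real figures.

```python
import io

import pandas as pd


def load_box_office(source) -> pd.DataFrame:
    """Read one of the project CSVs (a path or file-like object) and sort it by Year."""
    df = pd.read_csv(source)
    # Sort chronologically so that line charts join the points in year order.
    return df.sort_values("Year").reset_index(drop=True)


# In the notebook the argument would be a path such as "Box_Office_gross_2000_2024.csv";
# this in-memory CSV with placeholder values stands in for the real file.
sample = io.StringIO("Year,Gross\n2001,100\n2000,90\n")
gross = load_box_office(sample)
```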
Although the data I collected are mostly clean, some issues need resolving. One studio changed its name during the period these data cover, so I wrote a line of code that renames every occurrence of the old studio name to the new one, avoiding any misrepresentation in the analysis.
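A sketch of that rename in pandas, assuming studio names are stored in the `Distributor` column used by the charts below. The README does not name the studio, so `OLD_NAME` and `NEW_NAME` are hypothetical placeholders, as is the `rename_distributor` helper.

```python
import pandas as pd

# Hypothetical placeholder names: the README does not say which studio was renamed.
OLD_NAME = "Old Studio Name"
NEW_NAME = "New Studio Name"


def rename_distributor(df: pd.DataFrame, old: str, new: str) -> pd.DataFrame:
    """Return a copy of df with exact matches of `old` in Distributor replaced by `new`."""
    out = df.copy()
    out["Distributor"] = out["Distributor"].replace(old, new)
    return out


# Placeholder rows, not real Box Office Mojo data.
films = pd.DataFrame({"Year": [2005, 2020], "Distributor": [OLD_NAME, NEW_NAME]})
films = rename_distributor(films, OLD_NAME, NEW_NAME)
```

Because `Series.replace` matches whole values, a name that only contains the old studio name as a substring is left untouched.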
I have set up a Jupyter notebook that processes the input CSV data into a number of pre-defined charts, described below:
For this chart, I created a filtered dataframe containing only the number-one-ranked film at the box office for each year:
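A sketch of this filter, assuming the top-10 file has a `Rank` column (an assumed name) holding each film's position within its year. The rows are placeholders.

```python
import pandas as pd

# Placeholder rows; "Rank" is an assumed column name for each film's position within its year.
top10 = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Rank": [1, 2, 1, 2],
    "Title": ["Film A", "Film B", "Film C", "Film D"],
    "Gross": [250, 200, 300, 150],
})

# Keep only the number-one film for each year.
number_ones = top10[top10["Rank"] == 1].reset_index(drop=True)

# Alternative if there is no rank column: pick the highest-grossing row in each year.
by_gross = top10.loc[top10.groupby("Year")["Gross"].idxmax()].reset_index(drop=True)
```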

These data are then plotted as a line chart:
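One way to draw this with matplotlib (the notebook's actual plotting library is not stated), assuming `Year` and `Gross` columns. The values are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; in Jupyter the figure renders inline instead
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data standing in for the filtered number-one films.
number_ones = pd.DataFrame({"Year": [2000, 2001, 2002], "Gross": [250, 300, 280]})

fig, ax = plt.subplots()
ax.plot(number_ones["Year"], number_ones["Gross"], marker="o")  # one point per year
ax.set_xlabel("Year")
ax.set_ylabel("Gross (USD)")
ax.set_title("Gross of the number-one film by year")
```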
For this chart, I created a new dataframe holding the average budget for each year in the original dataframe:
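A sketch of the per-year average, assuming a numeric `Budget` column (an assumed name). The rows are placeholders, with one missing budget to show how gaps are handled.

```python
import pandas as pd

# Placeholder rows; "Budget" is an assumed column name.
top10 = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Budget": [100, 50, 200, None],
})

# Mean budget per year; pandas skips missing budgets (NaN) by default.
avg_budget = top10.groupby("Year", as_index=False)["Budget"].mean()
```

Skipping missing values means a year's average is computed only over films with a reported budget, which is worth keeping in mind if budgets are often unpublished.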

These data are then plotted as a line chart:
For this chart, I used the groupby() method to count the films for each distributor in the dataset. I then sorted the counts in descending order and applied head(5), as I only wanted to see the five most common distributors:
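The same steps in pandas, on placeholder distributor names:

```python
import pandas as pd

# Placeholder data: one row per film, six distributors with distinct counts.
names = (["Studio A"] * 6 + ["Studio B"] * 5 + ["Studio C"] * 4
         + ["Studio D"] * 3 + ["Studio E"] * 2 + ["Studio F"] * 1)
top10 = pd.DataFrame({"Distributor": names})

top_distributors = (
    top10.groupby("Distributor")
    .size()                        # number of films per distributor
    .sort_values(ascending=False)  # most frequent first
    .head(5)                       # keep the five most common
)
```

`top10["Distributor"].value_counts().head(5)` gives the same result in one call, since `value_counts` already sorts in descending order. With tied counts, which distributor lands in fifth place is not guaranteed by either approach.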

These data are plotted in a horizontal bar chart:
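A matplotlib sketch of the horizontal bar chart, assuming a Series of counts indexed by distributor, as produced by the groupby, sort and head steps described above. The values are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; in Jupyter the figure renders inline instead
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder counts, already sorted in descending order.
top_distributors = pd.Series({"Studio A": 6, "Studio B": 5, "Studio C": 4, "Studio D": 3, "Studio E": 2})

fig, ax = plt.subplots()
# barh draws the first category at the bottom, so reverse to put the most common at the top.
ax.barh(top_distributors.index[::-1], top_distributors.values[::-1])
ax.set_xlabel("Number of top-10 films")
ax.set_title("Five most common distributors")
```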
This chart takes data from the Box_Office_gross_2000_2024 CSV and plots total gross against year as a line chart.
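A sketch with matplotlib, assuming the gross CSV has `Year` and `Gross` columns with gross in dollars. The billions tick formatter is an addition for readability, not something this README describes, and the values are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; in Jupyter the figure renders inline instead
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Placeholder totals in dollars, not real Box Office Mojo figures.
gross = pd.DataFrame({"Year": [2000, 2001, 2002], "Gross": [7.0e9, 8.0e9, 9.0e9]})

fig, ax = plt.subplots()
ax.plot(gross["Year"], gross["Gross"], marker="o")
# Label the y-axis in billions of dollars, e.g. 9000000000 -> "$9.0B".
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, _pos: f"${value / 1e9:.1f}B"))
ax.set_xlabel("Year")
ax.set_ylabel("Total gross")
ax.set_title("Total US box office gross by year")
```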

For this chart, I used the groupby() method to group the dataframe by Year and Distributor and return the size of each group, i.e., the number of films from each distributor in each year:
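In pandas this grouping can be written as follows (placeholder rows). `reset_index(name="Count")` turns the resulting Series back into a dataframe for plotting, and swapping `"Distributor"` for `"Genre"` gives the data for the genre chart.

```python
import pandas as pd

# Placeholder rows: one per film.
top10 = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001],
    "Distributor": ["Studio A", "Studio A", "Studio B", "Studio A", "Studio C"],
})

# One row per (Year, Distributor) pair, with the number of films in "Count".
by_year_distributor = (
    top10.groupby(["Year", "Distributor"]).size().reset_index(name="Count")
)
```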

These data were then plotted as a scatter plot, with marker size and a colour scale indicating density:
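A matplotlib sketch of such a density scatter, assuming a dataframe with `Year`, `Distributor` and `Count` columns like the grouped data described above. The notebook may use a different plotting library; the size multiplier of 60 and the `viridis` colour map are arbitrary choices, and the values are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; in Jupyter the figure renders inline instead
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder grouped counts.
counts = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Distributor": ["Studio A", "Studio B", "Studio A", "Studio C"],
    "Count": [2, 1, 1, 1],
})

fig, ax = plt.subplots()
points = ax.scatter(
    counts["Year"],
    counts["Distributor"],   # string values become categorical y positions
    s=counts["Count"] * 60,  # marker area grows with the number of films
    c=counts["Count"],       # colour encodes the same count
    cmap="viridis",
)
fig.colorbar(points, ax=ax, label="Number of films")
ax.set_xlabel("Year")
```

Encoding the count twice, in both size and colour, keeps dense cells visible even where markers overlap.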
For this chart, I used the groupby() method to group the dataframe by Year and Genre and return the size of each group, i.e., the number of films of each genre in each year:

These data were then plotted as a scatter plot, with marker size and a colour scale indicating density:
I plan to extend the insights I can draw from these data. This will include an inflation adjustment for the budget and gross figures, so that amounts from different years can be compared more reliably.
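As a sketch of what that adjustment could look like: each nominal figure is rescaled by the ratio of a price index in a chosen base year to the index in the film's release year, i.e. real = nominal × CPI(base) / CPI(year). The index values below are hypothetical placeholders; real ones would come from a published series such as the US Bureau of Labor Statistics CPI. `to_base_year_dollars` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical index values for illustration only, not real CPI figures.
CPI = {2000: 100.0, 2012: 140.0, 2024: 180.0}
BASE_YEAR = 2024


def to_base_year_dollars(df: pd.DataFrame, column: str) -> pd.Series:
    """Re-express df[column] in BASE_YEAR dollars: nominal * CPI[base] / CPI[year]."""
    return df[column] * CPI[BASE_YEAR] / df["Year"].map(CPI)


# Placeholder budgets in nominal dollars.
films = pd.DataFrame({"Year": [2000, 2012, 2024], "Budget": [50.0, 70.0, 90.0]})
films["Budget (2024 $)"] = to_base_year_dollars(films, "Budget")
```

Any year missing from the index dictionary maps to NaN, so the adjusted column makes gaps in the price series visible rather than silently using a wrong factor.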
Data for this project are taken from Box Office Mojo - https://www.boxofficemojo.com/





