Box Office Analysis

This project takes data from Box Office Mojo on the 10 highest-grossing films at the US box office for each year from 2000 to 2024 and analyses various aspects of them. It also looks at overall US box office numbers, i.e. total gross.

Box Office = Box_Office_top_10_2000_2024
Gross = Box_Office_gross_2000_2024

Data cleaning

Although the data were mostly clean as collected, a few issues needed resolving. One studio changed its name during the period covered by these data. I therefore wrote a line of code to rename all occurrences of the old studio name to the new one, to avoid any misrepresentation in my analysis.
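A minimal sketch of that kind of rename, assuming the studio is stored in a Distributor column. The studio names and sample rows below are placeholders, not values from the real dataset:

```python
import pandas as pd

# Hypothetical sample rows; the real data come from Box_Office_top_10_2000_2024.csv.
box_office = pd.DataFrame({
    "Year": [2000, 2001, 2024],
    "Distributor": ["Old Studio Name", "Another Studio", "New Studio Name"],
})

# Placeholder names: substitute the studio's actual old and new names.
studio_renames = {"Old Studio Name": "New Studio Name"}

# Series.replace swaps exact matches and leaves every other value unchanged.
box_office["Distributor"] = box_office["Distributor"].replace(studio_renames)

print(box_office["Distributor"].value_counts())
```

Keeping the mapping in a dictionary makes it easy to add further renames later without touching the replace call.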

Charts

I have set up a Jupyter notebook that processes the input CSV data into a number of pre-defined charts, as listed below:

1. Box office budgets for highest grossing film

For this chart I created a filtered dataframe containing only the number-one-ranked film at the box office for each year:
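One way to build such a filter is a boolean mask on the rank. This is a sketch only: the Rank, Title and Budget column names and the sample rows are assumptions and may differ from the real CSV:

```python
import pandas as pd

# Hypothetical rows; "Rank", "Title" and "Budget" are assumed column names.
box_office = pd.DataFrame({
    "Year": [2001, 2000, 2000, 2001],
    "Rank": [1, 1, 2, 2],
    "Title": ["Film C", "Film A", "Film B", "Film D"],
    "Budget": [120, 100, 80, 90],
})

# Keep only the top-ranked film per year, ordered by year ready for a line chart.
top_ranked = (
    box_office[box_office["Rank"] == 1]
    .sort_values("Year")
    .reset_index(drop=True)
)

print(top_ranked)
```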

These data are then plotted within a line chart:

2. Average budgets for box office top 10

For this chart I calculated a new dataframe that had the average budget for each year within the original dataframe:
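The yearly average can be sketched with a groupby on Year. The Budget and Average_Budget column names and the figures are illustrative assumptions:

```python
import pandas as pd

# Hypothetical rows; "Budget" is an assumed column name and the values are made up.
box_office = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Budget": [100, 80, 120, 90],
})

# Mean budget per year, returned as a two-column dataframe.
average_budget = (
    box_office.groupby("Year", as_index=False)["Budget"]
    .mean()
    .rename(columns={"Budget": "Average_Budget"})
)

print(average_budget)
```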

These data are then plotted within a line chart:

3. The top 5 distributors in the top 10

For this chart I used the groupby() method to count how often each distributor appears in the dataset. I then sorted the counts in descending order and applied head(5), as I only wanted to see the 5 most common distributors:
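The count, sort and head(5) steps could look roughly like this. The studio names and counts are invented for illustration:

```python
import pandas as pd

# Hypothetical data: six made-up studios with 6, 5, 4, 3, 2 and 1 films respectively.
names = ["Studio A", "Studio B", "Studio C", "Studio D", "Studio E", "Studio F"]
distributors = [name for name, n in zip(names, [6, 5, 4, 3, 2, 1]) for _ in range(n)]
box_office = pd.DataFrame({"Distributor": distributors})

# Count films per distributor, largest first, and keep the five most common.
top_distributors = (
    box_office.groupby("Distributor")
    .size()
    .sort_values(ascending=False)
    .head(5)
)

print(top_distributors)
```

When plotting this as a horizontal bar chart, reversing the order first (for example with `top_distributors.iloc[::-1]`) usually puts the largest bar at the top.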

These data are plotted in a horizontal bar chart:

4. Overall box office gross over time

This chart takes data from the Box_Office_gross_2000_2024 CSV file and plots total gross against year in a line chart.
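As a rough sketch of the loading step, the snippet below reads a small in-memory stand-in for the gross CSV and orders it by year. The Year and Total_Gross column names and the values are assumptions, not taken from the real file:

```python
import io

import pandas as pd

# In-memory stand-in for Box_Office_gross_2000_2024.csv; values are made up.
csv_text = "Year,Total_Gross\n2001,8100\n2000,7500\n2002,9100\n"

# With the real file this would be pd.read_csv("Box_Office_gross_2000_2024.csv").
gross = pd.read_csv(io.StringIO(csv_text))

# Sort by year so a line chart is drawn left to right in time order.
gross = gross.sort_values("Year").reset_index(drop=True)

print(gross)
```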

5. Density plot showing total films in the top 10 by distributor over time

For this chart, I used the groupby() method to group the dataframe by Year and Distributor and return the size of each group:
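A sketch of the grouping step, written as a small helper so the second column can be swapped. The helper name count_by_year, the Count column and the sample rows are all hypothetical:

```python
import pandas as pd

# Hypothetical rows; the real dataframe has one row per film per year.
box_office = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001],
    "Distributor": ["Studio A", "Studio A", "Studio B", "Studio A", "Studio B"],
    "Genre": ["Action", "Comedy", "Action", "Action", "Action"],
})


def count_by_year(df, column):
    """Return one row per (Year, column) pair with the number of films in it."""
    return df.groupby(["Year", column]).size().reset_index(name="Count")


distributor_density = count_by_year(box_office, "Distributor")
print(distributor_density)
```

The Count column can then drive both the marker size and the colour of each point. Calling the same helper with "Genre" instead of "Distributor" gives the equivalent table for the genre chart.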

These data are then plotted in a scatter plot, using variable marker size and a colour scale to indicate density:

6. Density plot showing total films in the top 10 by genre over time

For this chart, I used the groupby() method to group the dataframe by Year and Genre and return the size of each group:

These data are then plotted in a scatter plot, using variable marker size and a colour scale to indicate density:

Future work

I will look to draw more insight from these data. This will include introducing an inflation adjustment for the budget and gross figures, so that values from different years can be compared more reliably.

Credits

Data for this project are taken from Box Office Mojo - https://www.boxofficemojo.com/

