IMDb Movie Analysis Project

Group Members

Alejandro Figueroa
Raynard Flores

Project Overview

This project analyzes IMDb movie data to provide insights into the movie industry. Our primary focus is on understanding which movies viewers value most within specific genres, as measured by user ratings and vote counts. We also explore the relationship between a movie's gross earnings and its IMDb rating to see whether higher ratings are indicative of higher earnings. Finally, we determine which genre (out of Action, Animation, Horror, and History) receives the highest average number of user votes and which has the longest average runtime.

Problem Statement

As cinephiles, we were curious to discover significant films within specific genres based on IMDb ratings and vote counts. As data analysts in training, we developed three hypotheses about these top-rated movies in order to apply the techniques we have learned so far: data cleaning, data wrangling, EDA, and data visualization.

Specific Goals:

Determine the highest-rated movies within specific genres based on user ratings and vote counts. The genres to be analysed are:
- Action
- Animation
- Horror
- History
Analyze the correlation between a movie's gross earnings and its IMDb rating to understand the factors contributing to a movie's financial and critical success.
Among the highest-rated movies in the specified genres, determine which genre has the highest average number of votes.
Among the highest-rated movies in the specified genres, determine which genre has the highest average runtime.

Data Description

Our main dataset comes from Kaggle and is a collection of CSV files, each containing information about movies in a specific genre.

Kaggle Dataset: https://www.kaggle.com/datasets/rajugc/imdb-movies-dataset-based-on-genre

The dataset contains the following columns:

movie_id: IMDb Movie ID
movie_name: Name of the movie
year: Release year
certificate: Movie certificate rating
run_time: Total runtime of the movie
genre: Genre of the movie
rating: IMDb rating of the movie
description: Description of the movie
director: Director of the movie
director_id: IMDb ID of the director
star: Star of the movie
star_id: IMDb ID of the star
votes: Number of votes the movie received on IMDb
gross: Gross box office revenue of the movie in dollars
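As a minimal sketch of how these columns might be cleaned before analysis, the snippet below parses a few made-up rows. It assumes that run_time is stored as text such as "152 min" and that gross may be empty; the sample rows, the clean_movies helper, and those format assumptions are illustrative and not taken from the actual Kaggle files.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for one of the genre CSV files.
SAMPLE_CSV = """movie_id,movie_name,year,run_time,genre,rating,votes,gross
tt0000001,Example A,2008,152 min,"Action, Crime",9.0,2500000,534858444
tt0000002,Example B,1995,81 min,"Animation, Comedy",8.3,1000000,191796233
tt0000003,Example C,1980,146 min,"Horror, Drama",8.4,1000000,
tt0000003,Example C,1980,146 min,"Horror, Drama",8.4,1000000,
"""


def clean_movies(df):
    """Drop duplicate movies and convert the numeric columns to numbers."""
    out = df.drop_duplicates(subset="movie_id").copy()
    # Assumption: run_time looks like "152 min"; keep only the digits.
    minutes = out["run_time"].astype(str).str.extract(r"(\d+)", expand=False)
    out["run_time"] = pd.to_numeric(minutes, errors="coerce")
    for col in ("rating", "votes", "gross"):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Rating and votes are needed by every hypothesis, so require both.
    return out.dropna(subset=["rating", "votes"]).reset_index(drop=True)


movies = clean_movies(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(movies[["movie_name", "run_time", "rating", "votes", "gross"]])
```

Rows with a missing gross value are kept here, since only the rating-versus-gross comparison needs that column.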

Our second data source is the Movie Database API, which collects movie information from IMDb and is freely hosted on RapidAPI.

Movie Database API: https://rapidapi.com/SAdrian/api/moviesdatabase
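A request to this API could be sketched roughly as follows. The host name, the /titles/{id} path, the RAPIDAPI_KEY environment variable, and both helper functions are assumptions for illustration; check the API page on RapidAPI for the actual endpoints. The key is read from the environment, which load_dotenv() can populate from a local .env file.

```python
import os

import requests

# Assumed host for the Movie Database API on RapidAPI.
API_HOST = "moviesdatabase.p.rapidapi.com"


def build_request(title_id, api_key):
    """Return the URL and headers for a single-title lookup (assumed endpoint)."""
    url = f"https://{API_HOST}/titles/{title_id}"
    headers = {
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": API_HOST,
    }
    return url, headers


def fetch_title(title_id, api_key=None, timeout=10):
    """Fetch one title as parsed JSON; raises on HTTP errors."""
    # RAPIDAPI_KEY is a hypothetical variable name for this sketch.
    api_key = api_key or os.environ["RAPIDAPI_KEY"]
    url, headers = build_request(title_id, api_key)
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Calling fetch_title with an IMDb movie ID from the movie_id column would then return the API's JSON payload for that title, provided a valid key is set.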

Hypotheses

Here, we will outline our initial hypotheses based on our problem statement. These hypotheses will guide our analysis and help us focus on specific relationships within the data.

Hypothesis 1: There is a correlation between a movie's IMDb rating and its worldwide gross.
Hypothesis 2: Among the selected common genres—Action, Horror, Animation, and History—we hypothesize that the Action genre generates the highest average number of votes.
Hypothesis 3: Among the selected genres, we hypothesize that the Action genre has the highest average runtime.
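One way to check these hypotheses with pandas is sketched below: a Pearson correlation for Hypothesis 1 and per-genre means for Hypotheses 2 and 3. The numbers are made up purely to show the mechanics and say nothing about the real data; the primary_genre column is a hypothetical single-genre label, not a column in the dataset.

```python
import pandas as pd

# Made-up values for illustration only (votes in thousands, gross in millions).
movies = pd.DataFrame(
    {
        "primary_genre": ["Action", "Action", "Animation", "Animation",
                          "Horror", "Horror", "History", "History"],
        "rating": [8.5, 8.0, 8.2, 7.9, 7.5, 7.8, 8.1, 7.7],
        "votes": [900, 700, 400, 300, 200, 250, 150, 100],
        "run_time": [130, 120, 90, 95, 100, 105, 160, 150],
        "gross": [500, 300, 350, 200, 80, 120, 60, 40],
    }
)

# Hypothesis 1: Pearson correlation between rating and gross.
# Series.corr skips pairs where either value is missing.
rating_gross_corr = movies["rating"].corr(movies["gross"])

# Hypotheses 2 and 3: mean votes and mean runtime per genre.
by_genre = movies.groupby("primary_genre")[["votes", "run_time"]].mean()
top_votes_genre = by_genre["votes"].idxmax()
top_runtime_genre = by_genre["run_time"].idxmax()

print(f"rating vs gross correlation: {rating_gross_corr:.2f}")
print(by_genre)
print("most votes on average:", top_votes_genre)
print("longest average runtime:", top_runtime_genre)
```

A correlation coefficient alone does not establish that higher ratings cause higher earnings, so a scatter plot of the two columns is a useful companion to this number.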

Analysis Methodology

We will employ various data analytics techniques including data visualization, EDA, and data wrangling to explore the dataset and validate our hypotheses.

About the DataSet

To perform a similar analysis, follow these steps:

Download the dataset (you will find it at https://www.kaggle.com/datasets/rajugc/imdb-movies-dataset-based-on-genre)
Install the dependencies listed below in your notebook environment
Run the code to explore and analyze the data
Draw a conclusion from your findings

Dependencies

You will need to import the following:

Pandas --> import pandas as pd
Requests --> import requests
Pyplot --> import matplotlib.pyplot as plt
DotEnv --> from dotenv import load_dotenv
Path --> from pathlib import Path
OS --> import os

How to Contribute

Contributions to this project are welcome. You can contribute by:

Extending the analysis to include additional movie metrics.
Refining the visualizations and interpretations.

Please refer to the contribution guidelines before making a contribution.

Presentation Slides

Link to presentation
