AlejandroJFR/week_3_project

IMDb Movie Analysis Project

Group Members

  • Alejandro Figueroa
  • Raynard Flores

Project Overview

This project analyzes IMDb movie data to provide insights into the movie industry. Our primary focus is on understanding which movies viewers value most within specific genres, as measured by user ratings and vote counts. We also explore the relationship between a movie's gross earnings and its IMDb rating to see whether higher ratings are indicative of higher earnings. Finally, we determine which genre (out of Action, Animation, Horror, and History) draws the highest average number of user votes and which has the longest average runtime.

Problem Statement

As cinephiles, we were curious to discover significant films within specific genres based on IMDb ratings and vote counts. As data analysts in training, we developed three hypotheses about these top-rated movies so that we could apply the techniques we have learned so far: data cleaning, data wrangling, exploratory data analysis (EDA), and data visualization.

Specific Goals:

  1. Determine the highest-rated movies within specific genres based on user ratings and vote counts. The genres analyzed are:

    • Action
    • Animation
    • Horror
    • History
  2. Analyze the correlation between a movie’s gross earnings and its IMDb rating to understand the factors contributing to a movie's financial and critical success.

  3. Among the highest-rated movies in the specified genres, determine which genre has the highest average number of votes.

  4. Among the highest-rated movies in the specified genres, determine which genre has the highest average runtime.

Data Description

Our main dataset comes from Kaggle and is a combination of CSV files containing information about movies in specific genres.

The dataset contains the following columns:

  • movie_id: IMDb Movie ID
  • movie_name: Name of the movie
  • year: Release year
  • certificate: Movie certificate rating
  • run_time: Total runtime of the movie
  • genre: Genre of the movie
  • rating: IMDb rating of the movie
  • description: Description of the movie
  • director: Director of the movie
  • director_id: IMDb ID of the director
  • star: Star of the movie
  • star_id: IMDb ID of the star
  • votes: Number of votes the movie received on IMDb
  • gross: Gross box office revenue of the movie in dollars
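As a minimal sketch of how several CSV files with these columns could be combined and prepared for analysis with pandas: the two in-memory files, their rows, and the assumption that there is one file per genre are hypothetical stand-ins, not the actual Kaggle data. With the real download you would pass file paths to `pd.read_csv`, and columns stored as text (for example with units or currency symbols) may need extra cleaning before the numeric conversion.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the downloaded CSV files.
# The rows below are invented placeholders, not real IMDb data.
action_csv = io.StringIO(
    "movie_id,movie_name,genre,rating,votes,gross,run_time\n"
    "tt001,Example A,Action,8.1,1200,1000000,120\n"
    "tt002,Example B,Action,,300,,95\n"
)
horror_csv = io.StringIO(
    "movie_id,movie_name,genre,rating,votes,gross,run_time\n"
    "tt003,Example C,Horror,7.4,800,500000,101\n"
)

# Read each file and stack the rows into a single DataFrame.
frames = [pd.read_csv(f) for f in (action_csv, horror_csv)]
movies = pd.concat(frames, ignore_index=True)

# Make sure the numeric columns are numeric; unparseable values become NaN.
for col in ["rating", "votes", "gross", "run_time"]:
    movies[col] = pd.to_numeric(movies[col], errors="coerce")

# Keep only movies that actually have a rating.
rated = movies.dropna(subset=["rating"])
print(rated[["movie_name", "genre", "rating", "votes"]])
```

Rows with a missing `gross` are kept here on purpose, since they can still contribute to the vote and runtime comparisons.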

Our second data source is the Movie Database API, which collects movie information from IMDb and is freely hosted on RapidAPI.

Hypotheses

Here, we will outline our initial hypotheses based on our problem statement. These hypotheses will guide our analysis and help us focus on specific relationships within the data.

  • Hypothesis 1: There is a correlation between a movie's IMDb rating and its worldwide gross.
  • Hypothesis 2: Among the selected common genres—Action, Horror, Animation, and History—we hypothesize that the Action genre generates the highest average number of votes.
  • Hypothesis 3: Among the selected genres, we hypothesize that the Action genre has the highest average runtime.
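One way the three hypotheses could be checked with pandas is sketched below. The sample rows are made up purely for illustration and say nothing about the project's actual findings. The column names follow the dataset description, and using the Pearson correlation for Hypothesis 1 is an assumption about the method.

```python
import pandas as pd

# Made-up sample rows for illustration only; these are not project results.
movies = pd.DataFrame({
    "genre": ["Action", "Action", "Horror", "Horror", "Animation", "History"],
    "rating": [8.0, 6.0, 7.0, 5.0, 7.5, 7.0],
    "votes": [1000, 3000, 500, 1500, 800, 400],
    "gross": [200.0, 100.0, 50.0, 30.0, 120.0, 40.0],
    "run_time": [130, 110, 95, 105, 90, 150],
})

# Hypothesis 1: Pearson correlation between rating and gross.
rating_gross_corr = movies["rating"].corr(movies["gross"])

# Hypotheses 2 and 3: average votes and runtime per genre.
genre_means = movies.groupby("genre")[["votes", "run_time"]].mean()
top_votes_genre = genre_means["votes"].idxmax()
top_runtime_genre = genre_means["run_time"].idxmax()

print(f"rating vs. gross correlation: {rating_gross_corr:.2f}")
print(genre_means)
print("highest average votes:", top_votes_genre)
print("highest average runtime:", top_runtime_genre)
```

A correlation coefficient only describes a linear association; it does not show that higher ratings cause higher earnings.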

Analysis Methodology

We will employ various data analytics techniques including data visualization, EDA, and data wrangling to explore the dataset and validate our hypotheses.

About the DataSet

To perform a similar analysis, follow these simple steps:

  1. Download the dataset from Kaggle (see Data Description)
  2. Install the dependencies in your coding notebook
  3. Run code to explore and analyze the data
  4. Draw conclusions from your findings

Dependencies

You will need to import the following:

  1. Pandas --> import pandas as pd
  2. Requests --> import requests
  3. Pyplot --> import matplotlib.pyplot as plt
  4. DotEnv --> from dotenv import load_dotenv
  5. Path --> from pathlib import Path
  6. OS --> import os
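The `os` and DotEnv imports suggest that the RapidAPI key is read from the environment rather than hard-coded. A minimal sketch of that pattern follows; the function name `build_rapidapi_headers`, the variable name `RAPIDAPI_KEY`, and the host value are assumptions for illustration, not taken from this project. In a notebook you would typically call `load_dotenv()` first so that values from a local `.env` file are available through `os.getenv`.

```python
import os


def build_rapidapi_headers(host: str, key_var: str = "RAPIDAPI_KEY") -> dict:
    """Build RapidAPI request headers from a key stored in the environment.

    `key_var` is a hypothetical variable name; use whatever name your
    .env file actually defines.
    """
    api_key = os.getenv(key_var)
    if not api_key:
        raise RuntimeError(f"Environment variable {key_var} is not set")
    return {
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": host,
    }
```

The returned dictionary can be passed as the `headers` argument of `requests.get`, together with the endpoint URL shown on the API's RapidAPI page.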

How to Contribute

Contributions to this project are welcome. You can contribute by:

  • Extending the analysis to include additional movie metrics.
  • Refining the visualizations and interpretations.

Please refer to the contribution guidelines before making a contribution.

Presentation Slides

Link to presentation

About

Project documentation for Alejandro Figueroa and Raynard Flores
