Quantifying individual performance in cricket is essential for team selection and for understanding how players contribute to team success. This project applies concepts from social network analysis (SNA) to analyze the performance of batsmen and bowlers using ball-by-ball match data. The dataset comprises four parts: (1) batsmen facing specific bowlers, (2) overall batting averages for batsmen, (3) bowlers bowling to specific batsmen, and (4) overall bowling averages for bowlers. Building on previous research, the project constructs weighted, directed networks of interactions between batsmen and bowlers. For batsmen, a performance index is calculated from the runs scored against each bowler relative to that bowler's career bowling average. Similarly, a quality index is calculated for bowlers from their dismissals of batsmen relative to those batsmen's career batting averages. A one-mode projected network is then generated to compare the relative importance of players, and the PageRank algorithm is applied to score each player in the network. Using these network analysis techniques, the project aims to provide insight into individual player performance and its contribution to team success. Our results show that Virat Kohli and Lasith Malinga are the most successful batsman and bowler, respectively, in the history of the Indian Premier League (IPL).
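To make the ranking step concrete, here is a minimal sketch of weighted PageRank computed by power iteration on a small directed network. It is not the project's implementation: the function name `pagerank`, the damping factor of 0.85, the edge direction (an edge from A to C meaning that C performed well against A), and the toy weights are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    edges: dict mapping (source, target) -> positive weight.
    Returns a dict mapping node -> score; the scores sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)

    # Total outgoing weight per node, used to normalise edge weights.
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}

        # Each node passes its rank along its edges, in proportion to weight.
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]

        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network with made-up weights (not real player data).
toy_edges = {("A", "C"): 2.0, ("B", "C"): 1.0, ("C", "A"): 0.5}
toy_scores = pagerank(toy_edges)
for player in sorted(toy_scores, key=toy_scores.get, reverse=True):
    print(player, round(toy_scores[player], 4))
```

In this toy network, C receives weight from both A and B and therefore ranks highest, while B receives nothing and ranks lowest.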
Before running the project, you'll need to activate the virtual environment and install the required libraries.
On Windows:
- Open a terminal or command prompt and navigate to your project directory.
- Activate the virtual environment. In Git Bash, use:
source env/Scripts/activate
In Command Prompt, use:
env\Scripts\activate
- Once activated, the terminal prompt will likely change to indicate you're working within the environment.
- Install the required libraries listed in the requirements.txt file using:
pip install -r requirements.txt
This section outlines the steps involved in extracting cricket ball-by-ball data, processing it, and generating relevant statistics. It assumes the virtual environment and required libraries are set up as detailed in the "Setting Up the Environment" section.
- Download Ball-by-Ball Data: Download the data from https://cricsheet.org/ in YAML format. This data will be used for subsequent processing.
- Create Folders: Organize your project with the following directory structure in the root directory:
  - tests: Stores the downloaded ball-by-ball data in YAML format (one file per match).
  - processed: Holds the intermediate processed data for each match (four files per match).
  - teamwise: Contains compiled data for all matches across teams.
  - stats: Houses additional statistics derived from the processed data.
  - results (new folder): Stores the output generated by analysis scripts.
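The folder layout above can also be created programmatically. The following is a small sketch using only the standard library; the helper name `create_project_folders` is hypothetical and not part of the project's scripts.

```python
from pathlib import Path

# Folder names taken from the directory structure described above.
FOLDERS = ("tests", "processed", "teamwise", "stats", "results")


def create_project_folders(root="."):
    """Create the project folders under root, skipping any that already exist."""
    created = []
    for name in FOLDERS:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project_folders()` from the project root creates any missing folders and leaves existing ones untouched.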
The following scripts work together to process the data and generate statistics:
- extract.py: Parses a YAML file containing ball-by-ball data for a single match and extracts the required data.
- process.py: Acts as the driver for extract.py. It iterates over files in the tests folder, processes each match, handles errors, and logs results. Output is placed in the processed folder.
- compile.py: Compiles data from all matches into a consolidated format for each team. It also generates a general_stats.log file summarizing the data. Output goes into the teamwise folder.
- realise_pib_qib.py (Optional): Calculates additional data required for building specific network models (PIB and QIB). This script takes data from the teamwise folder as input and writes the results to the stats folder.
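The driver pattern described for process.py (iterate over match files, process each one, handle errors, log results) might look roughly like the sketch below. This is an illustration under stated assumptions, not the actual script: `process_all` is a hypothetical name, `extract_match` is a hypothetical callable standing in for whatever extract.py exposes, and the `.yaml` file extension is assumed.

```python
import logging
from pathlib import Path


def process_all(extract_match, source_dir="tests", output_dir="processed"):
    """Run extract_match(path, out_dir) on every YAML file in source_dir.

    extract_match is a hypothetical stand-in for the extraction logic.
    Returns two lists of file names: (succeeded, failed).
    """
    log = logging.getLogger("process")
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    succeeded, failed = [], []
    for path in sorted(Path(source_dir).glob("*.yaml")):
        try:
            extract_match(path, out_dir)
        except Exception:
            # One bad match file should not stop the whole run.
            log.exception("Failed to process %s", path.name)
            failed.append(path.name)
        else:
            log.info("Processed %s", path.name)
            succeeded.append(path.name)
    return succeeded, failed
```

Catching exceptions per file keeps a single malformed match from aborting the batch, and the returned lists make it easy to report how many matches were skipped.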
Once the data is extracted and processed, you can perform further analysis using the provided Python scripts in the project directory. These scripts typically take data from the processed, teamwise, or stats folders and generate visualizations or additional statistics stored in the results folder.
Here's an overview of some analysis scripts:
- batting_analysis.py: Analyzes batting performance metrics.
- bowling_analysis.py: Analyzes bowling performance metrics.
- pib_qib_convergence_plot.py (Optional): Creates convergence plots for PIB and QIB networks (if applicable).
- linear_regression_batting.py: Performs linear regression analysis on batting data.
- linear_regression_bowling.py: Performs linear regression analysis on bowling data.
- batsmen_subgraph.py (Optional): Creates a subgraph focusing on specific batsmen (if applicable).
Note: The specific analysis performed by each script depends on the project's goals. Refer to the script documentation (if available) for detailed information.
The results generated by these scripts will be saved in the results folder. You can then explore these results to gain insights into player performance, team dynamics, and other aspects of cricket data.
