PrathikI/DRIBBLE

D.R.I.B.B.L.E

Data-Driven Insights for Basketball Location & Efficiency

This project uses K-Nearest Neighbors (KNN) classifiers, with several distance metrics and ensemble methods, to predict NBA shot outcomes from historical shot-location data. The predictions are meant to support decision-making by showing where on the court shots are most likely to succeed.

Project Structure

The project is organized into several key modules, each responsible for a specific aspect of the workflow:

1) data_loader.py

This script loads the shot logs and other relevant datasets from specified file paths. It uses pandas to read CSV files, so that the shot logs, player statistics, and game schedules are in memory for later steps. It is the start of the data pipeline: every preprocessing and analysis step works on the raw data loaded here.
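As a rough illustration of the loading step, the sketch below reads one CSV with pandas and checks that the expected columns are present. The function name `load_csv`, the column names, and the sample rows are assumptions made for this example, not taken from data_loader.py.

```python
import io

import pandas as pd


def load_csv(source, required_columns=()):
    """Read one CSV into a DataFrame and fail early if columns are missing.

    Hypothetical helper: the real data_loader.py may be organized differently.
    """
    df = pd.read_csv(source)
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# In-memory stand-in for a shot-log file (made-up rows and column names).
sample_csv = io.StringIO(
    "player,loc_x,loc_y,shot_made\n"
    "A,10,5,1\n"
    "B,-120,200,0\n"
)
shots = load_csv(sample_csv, required_columns=("loc_x", "loc_y", "shot_made"))
```

In practice `source` would be a file path; `pd.read_csv` accepts both paths and file-like objects, which keeps the helper easy to test.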

2) data_cleaning.py

This file implements the preprocessing routines that clean and prepare the datasets for analysis. These include handling missing values, correcting data types, and possibly filtering out irrelevant data points to slim down the datasets. The aim is to keep the data consistent and ready for modeling, since the accuracy of the predictions depends on the quality of the input.
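A minimal sketch of this kind of cleaning, assuming hypothetical columns `loc_x`, `loc_y`, and `shot_made`: values that cannot be parsed as numbers become missing, rows with missing values are dropped, and the label is cast to an integer. The actual rules in data_cleaning.py may differ.

```python
import pandas as pd


def clean_shots(df):
    """Coerce key columns to numeric, drop incomplete rows, fix the label type.

    Column names are assumptions for illustration only.
    """
    out = df.copy()
    key_columns = ["loc_x", "loc_y", "shot_made"]
    for col in key_columns:
        # Unparseable entries (e.g. "bad") become NaN instead of raising.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=key_columns)
    out["shot_made"] = out["shot_made"].astype(int)
    return out.reset_index(drop=True)


# Made-up raw data with one unparseable value and one missing value.
raw = pd.DataFrame({
    "loc_x": ["10", "bad", "-120", None],
    "loc_y": [5, 7, 200, 30],
    "shot_made": [1, 0, 0, 1],
})
cleaned = clean_shots(raw)
```

Dropping rows is only one way to handle missing values; imputing them is an alternative when too much data would otherwise be lost.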

3) data_analysis.py

This script performs exploratory data analysis (EDA) and feature engineering on the cleaned data. It covers statistical analysis of the feature distributions, the creation of new features from existing data (e.g., shot efficiency by player and location), and the selection of the features used to train the machine learning models. This is where insights are drawn from the data and where the data takes the form the model is trained on.
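To make the feature-engineering idea concrete, here is a small sketch that derives two features from shot coordinates: the distance from the origin and each player's make rate. The column names, the sample values, and the choice of the origin as the reference point are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Made-up shots; coordinates are relative to an assumed origin at the basket.
shots = pd.DataFrame({
    "player": ["A", "A", "B", "B"],
    "loc_x": [0.0, 30.0, 0.0, -40.0],
    "loc_y": [0.0, 40.0, 10.0, 30.0],
    "shot_made": [1, 0, 1, 1],
})

# Straight-line distance of each shot from the origin.
shots["shot_distance"] = np.hypot(shots["loc_x"], shots["loc_y"])

# Per-player make rate, broadcast back onto every row of that player.
shots["player_fg_pct"] = shots.groupby("player")["shot_made"].transform("mean")
```

When an aggregate like `player_fg_pct` is used as a model input, it should be computed from the training rows only; otherwise the label leaks into the feature.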

4) model.py

This module contains the core machine learning implementation, built on the K-Nearest Neighbors (KNN) algorithm. It evaluates several KNN configurations: different distance metrics (Euclidean, Manhattan, and the general Minkowski distance, of which the first two are the p = 2 and p = 1 cases) and more advanced variants such as weighted KNN and bagging, all aimed at improving prediction accuracy. The script trains the model on the preprocessed data, validates its performance using accuracy metrics, and outputs the model’s predictions. This file is central to the project, as it handles the creation, training, and evaluation of the predictive model.
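The comparison described above could look roughly like the following scikit-learn sketch. The synthetic data, the value of `n_neighbors`, and the dictionary of candidates are assumptions for illustration; the configurations actually tried in model.py are not shown in this README.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for shot locations: label 1 if the point is near the origin.
rng = np.random.default_rng(0)
X = rng.uniform(-250, 250, size=(300, 2))
y = (np.hypot(X[:, 0], X[:, 1]) < 150).astype(int)

k = 7  # assumed neighborhood size
candidates = {
    "euclidean": KNeighborsClassifier(n_neighbors=k, metric="euclidean"),
    "manhattan": KNeighborsClassifier(n_neighbors=k, metric="manhattan"),
    "minkowski_p3": KNeighborsClassifier(n_neighbors=k, metric="minkowski", p=3),
    # Weighted KNN: closer neighbors get a larger vote.
    "weighted": KNeighborsClassifier(n_neighbors=k, weights="distance"),
    # Bagging: average several KNN models fit on bootstrap samples.
    "bagged": BaggingClassifier(
        KNeighborsClassifier(n_neighbors=k), n_estimators=10, random_state=0
    ),
}

# Mean 5-fold cross-validated accuracy for each configuration.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
```

Because KNN depends on distances, features on very different scales are usually standardized first; the two coordinates here share a scale, so that step is omitted.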

5) output.py

The output script formats the model's results for users or saves them to a file. It may include functions that arrange the model's outputs into readable reports or dashboards and that export data to CSV files or databases for further use or presentation. This makes the insights produced by the model accessible to end users.
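One possible shape for the export step, as a sketch only: turn a mapping of model names to accuracies into a sorted table and write it as CSV. The function `results_to_csv` and the placeholder values are hypothetical and do not come from output.py.

```python
import io

import pandas as pd


def results_to_csv(results, target):
    """Write {model name: accuracy} as a CSV table, best model first.

    Hypothetical helper; `target` may be a file path or a file-like object.
    """
    rows = sorted(results.items(), key=lambda item: item[1], reverse=True)
    table = pd.DataFrame(rows, columns=["model", "accuracy"])
    table["accuracy"] = table["accuracy"].round(4)
    table.to_csv(target, index=False)
    return table


# Placeholder values, written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
report = results_to_csv({"model_a": 0.61234, "model_b": 0.65432}, buffer)
```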

Running the Project

To execute the full workflow, run the following commands:

  1. Run the backend process:
    python main.py
    
  2. Run the frontend process:
    streamlit run website.py
    

Model Accuracies

  • Best cross-validated score: 67.61%
  • Test set accuracy: 67.78%
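For readers unfamiliar with the two figures, the sketch below shows one common way such numbers are produced: a grid search reports the best mean cross-validated accuracy on the training split, and the refit model is then scored once on a held-out test split. This is an assumption about the general workflow, run on synthetic data; it is not the project's code and does not reproduce the scores listed above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (not the NBA shot logs).
rng = np.random.default_rng(1)
X = rng.uniform(-250, 250, size=(400, 2))
y = (np.hypot(X[:, 0], X[:, 1]) < 150).astype(int)

# Hold out 20% of the rows; they are never seen during the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 7, 11], "weights": ["uniform", "distance"]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_cv_score = search.best_score_          # best mean accuracy across the CV folds
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model on held-out data
```

A test accuracy close to the cross-validated score, as in the figures above, suggests the model selection did not overfit the training folds.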
