Data Dingers

Inspiration

The most exciting moment in a baseball game is always when the batter hits an absolute dinger and the ball soars into the crowd. However, depending on what the pitcher throws, swinging for a hard hit is not always optimal. We wanted a data-backed answer to how a batter should swing.

What it does

Data Dingers uses several machine learning models to predict whiff and hard-hit probabilities and extract actionable information on how to hit dingers. We use a random forest to determine the optimal bat speed and swing length in several scenarios, depending on how heavily a hard hit is weighted against avoiding a whiff.
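To make the weighting idea concrete, here is a minimal sketch of how a score grid over bat speed and swing length could be computed. It is not our actual notebook code: the data is synthetic, the value ranges and the score formula `w_hard * P(hard hit) - w_whiff * P(whiff)` are assumptions, and names such as `best_swing` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the project's data): bat speed in mph,
# swing length in feet. The relationships below are invented so the
# models have something to learn.
n = 2000
bat_speed = rng.uniform(60, 85, n)
swing_length = rng.uniform(5.5, 9.0, n)
X = np.column_stack([bat_speed, swing_length])

p_whiff_true = 1 / (1 + np.exp(-(swing_length - 7.5)))
whiff = (rng.random(n) < p_whiff_true).astype(int)
p_hard_true = 1 / (1 + np.exp(-(bat_speed - 74) / 3))
hard_hit = ((whiff == 0) & (rng.random(n) < p_hard_true)).astype(int)

# One random forest per outcome.
whiff_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, whiff)
hard_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, hard_hit)

# Evaluate both models on a grid of candidate swings.
speeds = np.linspace(60, 85, 26)
lengths = np.linspace(5.5, 9.0, 15)
grid = np.array([[s, l] for s in speeds for l in lengths])
shape = (len(speeds), len(lengths))
p_whiff_grid = whiff_model.predict_proba(grid)[:, 1].reshape(shape)
p_hard_grid = hard_model.predict_proba(grid)[:, 1].reshape(shape)


def best_swing(w_hard, w_whiff):
    """Return (bat speed, swing length, score grid) for the given weights.

    The score rewards hard-hit probability and penalizes whiff probability;
    this formula is an assumption made for illustration.
    """
    score = w_hard * p_hard_grid - w_whiff * p_whiff_grid
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return speeds[i], lengths[j], score


speed, length, score = best_swing(w_hard=1.0, w_whiff=1.0)
print(f"best bat speed: {speed:.1f} mph, best swing length: {length:.2f} ft")
```

Changing the two weights gives a different scenario, for example `best_swing(2.0, 1.0)` for a batter who values power more than contact. A 2-D array like `score` is also the kind of input a Matplotlib heatmap call such as `plt.imshow` would take.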

How we built it

We used Jupyter Notebook and several Python libraries to manipulate most of our data. We first parsed the data and removed columns that weren't relevant. Then we split the data into a series of subsets and chose the features to feed into our machine learning models. From there, we created a heatmap of our results, with a score defined by the weights we assigned to whiffs and hard hits.
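The cleaning and splitting steps above could look roughly like the following Pandas sketch. Everything specific here is an assumption made for illustration: the column names, the tiny hand-made sample, splitting by pitch type, and the 95 mph exit-velocity cutoff for a hard hit are not taken from our notebook.

```python
import pandas as pd

# Tiny hand-made sample; column names are hypothetical, not the project's schema.
raw = pd.DataFrame({
    "pitch_type": ["FF", "SL", "FF", "CH", "SL", "FF"],
    "bat_speed": [72.1, 68.4, None, 75.0, 70.2, 77.3],
    "swing_length": [7.2, 6.8, 7.5, 7.9, 7.0, 8.1],
    "launch_speed": [98.3, None, 88.0, 101.2, None, 96.0],
    "description": [
        "hit_into_play", "swinging_strike", "hit_into_play",
        "hit_into_play", "swinging_strike", "hit_into_play",
    ],
    "stadium": ["A", "B", "A", "C", "B", "A"],
})

FEATURES = ["bat_speed", "swing_length"]

# 1. Drop columns that are not relevant to the swing models.
df = raw.drop(columns=["stadium"])

# 2. Drop rows where a model feature is missing.
df = df.dropna(subset=FEATURES).copy()

# 3. Derive the two binary labels the models predict.
df["whiff"] = df["description"].eq("swinging_strike")
# A missing launch speed compares as False, so whiffs are never hard hits.
df["hard_hit"] = df["launch_speed"].ge(95.0)

# 4. Split into subsets (here: one per pitch type, an assumed choice).
subsets = {
    pitch: group[FEATURES + ["whiff", "hard_hit"]]
    for pitch, group in df.groupby("pitch_type")
}

for pitch, subset in subsets.items():
    print(pitch, len(subset), "swings")
```

Each entry in `subsets` is then a small table of features and labels that can be passed to a classifier on its own.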

What we used:

Data processing and presentation

Python
Jupyter Notebook
Pandas package
Matplotlib package

Machine learning

XGBoost
Logistic regression
Random Forest

Challenges we ran into

A challenge we ran into was deciding which machine learning model to use. We first narrowed our choices down to three that made sense: logistic regression, random forest, and XGBoost. We ran calibration tests on all of our models and found that the baseline random forest had the highest accuracy. We were thus able to decide empirically to focus on random forest as our final machine learning model.
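A comparison like this can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions, not our actual test: the data is synthetic, the metrics (accuracy, Brier score, and a simple binned calibration gap) are our choice for the example, and XGBoost is left out so the block depends only on scikit-learn. An `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way, since it follows the same fit/predict_proba interface.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data (NOT the project's data): bat speed and swing
# length as features, with an invented whiff probability.
n = 3000
X = np.column_stack([rng.uniform(60, 85, n), rng.uniform(5.5, 9.0, n)])
y = (rng.random(n) < 1 / (1 + np.exp(-(X[:, 1] - 7.5)))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    # Observed positive rate vs. mean predicted probability in each bin.
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=5)
    results[name] = {
        "accuracy": accuracy_score(y_test, (prob >= 0.5).astype(int)),
        "brier": brier_score_loss(y_test, prob),  # lower is better
        "calibration_gap": float(np.mean(np.abs(frac_pos - mean_pred))),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Accuracy only checks the thresholded prediction, while the Brier score and calibration gap check whether the predicted probabilities themselves can be trusted, which matters when the probabilities feed a weighted score.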

Accomplishments that we're proud of

Using tools that were new to us, such as Jupyter Notebook and Python, we were able to parse the data very well. We expected to run into a lot more technical issues and spend a lot of time debugging, but our code ran relatively smoothly. We are also proud to have implemented a machine learning model, as that was a goal most of our team members had going into this datajam.