Inspiration
The most exciting moment in a baseball game is always when the batter hits an absolute dinger and the ball soars into the crowd. However, depending on what the pitcher throws, going for hard hits is not always optimal. We wanted to derive a data-backed answer on how to hit balls.
What it does
Data Dingers uses several machine learning models to predict whiff and hard-hit probabilities and extract actionable information on how to hit dingers. We use a random forest to determine the optimal bat speed and swing length across several scenarios, each defined by how heavily hitting the ball hard is weighted against avoiding a whiff.
How we built it
We used Jupyter Notebook and several Python libraries to manipulate most of our data. We first parsed our data, removing columns that weren't relevant. Then, we split the data into a series of subsets and chose the features to feed into our machine learning models. From there, we created a heatmap of our results, with the score determined by the weights we assigned to whiffs and hard hits.
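As a rough illustration of the weighted scoring idea, here is a minimal sketch. The exact formula we used is not spelled out here, so the linear form (reward hard-hit probability, penalize whiff probability), the function names, and the `toy_predict` stand-in are all assumptions made for illustration; the toy function is not a trained model and is not fitted to any real data.

```python
from itertools import product


def weighted_score(p_hard_hit, p_whiff, w_hard_hit=1.0, w_whiff=1.0):
    """Assumed linear score: reward hard hits, penalize whiffs."""
    return w_hard_hit * p_hard_hit - w_whiff * p_whiff


def best_swing(predict, bat_speeds, swing_lengths, w_hard_hit=1.0, w_whiff=1.0):
    """Scan a grid and return (score, bat_speed, swing_length) for the top score.

    `predict(bat_speed, swing_length)` must return (p_hard_hit, p_whiff).
    Ties keep the first grid point encountered.
    """
    best = None
    for speed, length in product(bat_speeds, swing_lengths):
        p_hard, p_whiff = predict(speed, length)
        score = weighted_score(p_hard, p_whiff, w_hard_hit, w_whiff)
        if best is None or score > best[0]:
            best = (score, speed, length)
    return best


def toy_predict(bat_speed, swing_length):
    """Hypothetical stand-in for model predictions (illustration only)."""
    p_hard = min(1.0, max(0.0, (bat_speed - 60) / 40))
    p_whiff = min(1.0, max(0.0, (bat_speed - 60) / 80 + (swing_length - 6) / 10))
    return p_hard, p_whiff


if __name__ == "__main__":
    speeds = [60, 70, 80]
    lengths = [6, 7, 8]
    # Compare a power-focused scenario with a contact-focused one.
    print(best_swing(toy_predict, speeds, lengths, w_hard_hit=1.0, w_whiff=0.0))
    print(best_swing(toy_predict, speeds, lengths, w_hard_hit=0.0, w_whiff=1.0))
```

Evaluating the same score at every grid point, instead of keeping only the maximum, gives the values that a heatmap over bat speed and swing length would display.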
What we used:
Data processing and presentation
- Python
- Jupyter notebook
- Pandas package
- Matplotlib package
Machine learning
- XGBoost
- Logistic regression
- Random Forest
Challenges we ran into
A challenge we ran into was deciding which machine learning model to use. We first narrowed our choices down to three that made sense: logistic regression, random forest, and XGBoost. We ran calibration tests on all three models and found that the base random forest had the highest accuracy. We were thus able to decide empirically to focus on random forest as our final machine learning model.
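A comparison like this can be sketched in a few lines. The snippet below is only an illustration, not our actual evaluation code: it assumes each model has already produced predicted probabilities on a held-out set, and it uses the Brier score as one possible calibration-style metric alongside accuracy. The labels, probabilities, and model names are made-up toy values.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples where the thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def compare_models(y_true, probs_by_model):
    """Return accuracy and Brier score for each model's predicted probabilities."""
    return {
        name: {
            "accuracy": accuracy(y_true, probs),
            "brier": brier_score(y_true, probs),
        }
        for name, probs in probs_by_model.items()
    }


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely for illustration.
    y_true = [1, 0, 1, 0]
    probs_by_model = {
        "model_a": [0.9, 0.1, 0.8, 0.2],
        "model_b": [0.6, 0.6, 0.4, 0.4],
    }
    for name, metrics in compare_models(y_true, probs_by_model).items():
        print(name, metrics)
```

Reporting both numbers is useful because a model can classify well at a fixed threshold while still producing poorly calibrated probabilities, and the probabilities are what a weighted score depends on.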
Accomplishments that we're proud of
Using tools that were new to us, such as Jupyter Notebook and Python, we were able to parse our data very well. We expected to run into a lot more technical issues and spend a lot of time debugging, but our code ran relatively smoothly. We are also proud to have implemented a machine learning model, as that was a goal most of our team members had going into this datajam.
What we learned
We learnt how to use data analysis and visualization libraries like pandas and Matplotlib, as well as the theory behind many machine learning models.
What's next for Data Dingers
- Extend to player-specific models
- Integrate pitch sequencing for dynamic strategy
- Deploy an interactive dashboard for coaches
- Use in tandem with computer vision to give real-time analysis