Oral presentation: https://drive.google.com/file/d/1fwVK94TnMxb-U-Am6koeczjaXh9uCU3s/view?usp=sharing

Final reflection

STRIKEOUT

Members

Nicholas Keirstead (nkeirste) and Zahid Hasan (zhasan1)

Introduction

We have always been interested in incorporating technology into sports. Being able to tell whether a pitch will be in the strike zone or not before it is thrown is useful information for a batter. We wanted to see whether it would even be possible to tell a pitch from a windup, and we figured deep learning would give us the best chance of building an accurate model to predict the pitch from a windup alone.

This project was enabled by this GitHub project by AJ Piergiovanni (https://github.com/piergiaj/mlb-youtube), which organizes footage of 20 full MLB baseball games into events, including balls, strikes, and pitch types. We use this dataset to analyze pitch videos labeled by outcome (ball or strike) and try to predict other pitch outcomes from the video of the pitcher's windup. This problem falls under classification, since the model is trained on labeled examples and predicts discrete outcomes.

Methodology

Preprocessing is a very important part of this project. Since the segmented videos we were able to obtain are too long and capture information we do not want, we need to cut each video. This should be done at the point the ball releases from the pitcher’s hand, so we do not inadvertently get any information on the pitch from the arc it travels.

The easiest way to get these results would be to approximate the number of frames a pitch takes (n), and then take the first n frames of each video. However, this can be inaccurate, as the segmented videos we are using are not perfect and do not always start at the same place, not to mention the fact that different pitchers have different length windups. The optimal solution (and what we will be trying to implement) is to run an off-the-shelf object detector on the videos, and then cut each video when the baseball appears as an object independent of the pitcher. This may not be easy to implement, but should be much more accurate.

Knowing that the optimal solution would be difficult, we quickly wrote a script to automatically take the first n frames of a video, and used that preprocessed data to run through our model just so we would have some results if we ran into too much trouble with the optimal solution.
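The fixed-length cut can be sketched in plain NumPy (the function name and the zero-padding of short clips are our choices here, and decoding frames from the video files is omitted):

```python
import numpy as np

def take_first_n_frames(video, n):
    """Naively truncate a video array (frames, height, width) to its first
    n frames. Clips shorter than n are zero-padded so every example has
    the same length. Padding behavior is an illustrative choice."""
    frames, height, width = video.shape
    clip = np.zeros((n, height, width), dtype=video.dtype)
    clip[:min(frames, n)] = video[:n]
    return clip
```

With a frame rate of about 30 fps, n ≈ 90 corresponds to the ~3-second cut described above.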

As a note, we represented our videos as 3D NumPy arrays to save on memory. We converted each frame of the video to grayscale, as we did not believe color should matter in this problem. This made each video a 3D array (frames × height × width), and batching the videos gave us 4D tensors, which allowed us to reuse portions of the code we wrote for the CNN homework.
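The grayscale conversion can be sketched as follows (the helper name and the standard luma weights are our choices; any reasonable grayscale conversion would do):

```python
import numpy as np

def video_to_grayscale_tensor(rgb_video):
    """Convert an RGB video (frames, height, width, 3) into a grayscale
    3-D array (frames, height, width). Stacking several such videos then
    yields the 4-D (batch, frames, height, width) tensor the CNN consumes."""
    weights = np.array([0.299, 0.587, 0.114])  # standard luma coefficients
    return rgb_video @ weights  # weighted sum over the color channel
```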

Results

Our average training accuracy on the last of 10 epochs was 63.7%, with an average loss of 0.676. On the test set after the last of 10 epochs, accuracy was 66.6% and loss was 4512. This is a modest improvement over random guessing (a 50% chance between ball and strike). However, accuracy and loss varied wildly between batches and epochs, suggesting little benefit from further training. This likely points to larger problems: inconsistency in our input videos and perhaps a lack of sophistication in the model.

Challenges

The hardest part of this project was preprocessing our data. We had some preliminary issues with the GitHub project we used to grab the data, which required rewriting code and substituting Python packages. After resolving those, we were still left with data in an unfinished form that needed further cuts. We tried automatically cutting each video at around 3 seconds, but that resulted in very inconsistent videos. We then did extensive research into using ready-made object detectors to help with the preprocessing.

We decided to use Detectron2 with the COCO dataset to detect when the baseball pops into frame, and then cut the video there. However, we had issues running Detectron2, as it requires PyTorch and, by extension, CUDA, which runs on an NVIDIA GPU. While one of our personal computers did have a GPU, we ran into compatibility issues: it ran CUDA 11, but Detectron2 only seemed to work on CUDA 10 (at least for us). In any case, as we were nearing our deadline, we decided to try a different detector that might work better.

We found that TensorFlow Hub has an object detector trained on the COCO dataset, and we decided to use that. It was much easier to use, but perhaps less accurate. While cutting the videos, we had to convert them to AVI files, as for some reason writing to MP4 would produce corrupted videos. However, this came with a massive increase in file size, as AVI files take up more space than MP4s. Ultimately this is where we ended, as we ran out of time to find a way of transferring that much data to the machine we were running our model on.
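Whichever detector produces the boxes, the cutting logic reduces to finding the first frame in which a ball is detected. A sketch of that step (class id 37 is "sports ball" in the standard COCO label map, to our knowledge; the function name and score threshold are illustrative):

```python
# "sports ball" class id in the standard COCO label map (assumed)
SPORTS_BALL_ID = 37

def find_release_frame(per_frame_detections, min_score=0.3):
    """Given per-frame detector output as lists of (class_id, score) pairs,
    return the index of the first frame containing a confident sports-ball
    detection, or None if the ball never appears. The video would then be
    cut just before that index."""
    for i, detections in enumerate(per_frame_detections):
        for class_id, score in detections:
            if class_id == SPORTS_BALL_ID and score >= min_score:
                return i
    return None
```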

We also had consistent issues with running out of memory while running the model. We tried to solve these with techniques we learned in class, like decreasing our batch size, and by loading the data in by batch instead of all at once. Even so, we still had some memory issues and at times were only able to run our model for one epoch before it would stop.
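The batch-wise loading amounts to a simple generator (the names and the one-.npy-file-per-video layout here are assumptions for illustration):

```python
import numpy as np

def batch_loader(file_paths, labels, batch_size, load_fn=np.load):
    """Yield (videos, labels) one batch at a time, so at most batch_size
    videos are held in memory at once. Each path is assumed to point to a
    single preprocessed video saved as a .npy array; load_fn is swappable
    for testing."""
    for start in range(0, len(file_paths), batch_size):
        paths = file_paths[start:start + batch_size]
        videos = np.stack([load_fn(p) for p in paths])
        yield videos, labels[start:start + batch_size]
```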

Reflection

Despite our many issues with preprocessing and memory, we were able to reach our base and target goals, which were just to get a model running with automatically cut videos. While these were less consistent than we had hoped for, we quickly realized that more accurate preprocessing would take more time than we had. As a result, even though we were not able to run our model with accurately cut videos, we believe we did as well as we could have hoped.

Our model was the easiest aspect of our project, as it was the part that borrowed most heavily from what we had already done. As a result, we believed that if the windup could predict the pitch, our model would be able to detect that. However, the inconsistent data compromised the model and its results. Our results are inconclusive, but we feel that is mostly because of the data and not our model.

We changed our plans quickly once we realized how difficult preprocessing was going to be. We made it a priority to have usable data, so we made automatic cuts first so we had something the model could run on. We also had to change object detectors when we ran into compatibility issues with Detectron2, sacrificing some accuracy. If we could do the project over again, we would probably focus on getting a small amount of accurate data instead of a lot of potentially inaccurate data; that way, we could be more confident in our results. We could also have tried simpler methods instead of deep-learning object detectors; a simple edge detector with bounding boxes for objects might have been more reliable.

If we had more time, we would try to fix the memory problems of the model, either by storing the data in an .npy file and grabbing only what we need, or with some other ideas we came up with a little too late. We would also try to run the model on the more accurately object-detected data to see if that would have made much of a difference.
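The .npy idea relies on NumPy's memory mapping: the dataset stays on disk and only the slice we index is materialized in RAM. A minimal sketch (the function name is ours):

```python
import numpy as np

def load_batch(npy_path, start, size):
    """Memory-map the full dataset file, then materialize only one batch
    slice in RAM. np.load with mmap_mode reads pages from disk lazily, so
    the whole array never has to fit in memory at once."""
    data = np.load(npy_path, mmap_mode="r")
    return np.array(data[start:start + size])
```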

Our biggest takeaway from this project is that sometimes, especially when you are working with a vast amount of data, the preprocessing part of the pipeline can be more difficult and time consuming than building/running the model. Had we devoted more time and resources to the preprocessing, we may have made more progress.

Checkin 1

Title

Using convolutional neural networks to predict baseball pitches from videos of a pitcher's windup

Who

Zahid Hasan (zhasan1) and Nicholas Keirstead (nkeirste).

Introduction

We have always been interested in incorporating technology into sports. Being able to tell whether a pitch will be in the strike zone or not before it is thrown is useful information for a batter. We wanted to see whether it would even be possible to tell a pitch from a windup, and we figured deep learning techniques would be able to help us with that.

This project was enabled by this GitHub project by AJ Piergiovanni (https://github.com/piergiaj/mlb-youtube), which organizes footage of 20 full MLB baseball games into events, including balls, strikes, and pitch types. We use this dataset to analyze pitch videos labeled by outcome (ball or strike) and try to predict other pitch outcomes from the video of the pitcher's windup. This problem falls under classification, since the model is trained on labeled examples and predicts discrete outcomes.

Related Work

Data

Segmented videos of pitches with labeled outcomes were sourced from https://github.com/piergiaj/mlb-youtube (scraped from YouTube). It contains about 3000 examples of windups/pitches. Each is approximately a 5 second clip. Preprocessing will involve cutting all videos to stop right as the pitcher releases the ball, to hide the outcome.

Methodology

Preprocessing is a very important part of this project. Since the segmented videos we were able to obtain are too long and capture information we do not want, we need to cut each video. This should be done at the point the ball releases from the pitcher's hand, so we do not inadvertently get any information on the pitch from the arc it travels.

The easiest way to get these results would be to approximate the number of frames a pitch takes (n), and then take the first n frames of each video. However, this can be inaccurate, as the segmented videos we are using are not perfect and do not always start at the same place, not to mention the fact that different pitchers have different length windups. The optimal solution (and what we will be trying to implement) is to run an off-the-shelf object detector on the videos, and then cut each video when the baseball appears as an object independent of the pitcher. This may not be easy to implement, but should be much more accurate.

Metrics

Accuracy is an appropriate metric for our project, and it should be relatively easy to implement compared to the rest, as this is just a binary classification problem. The output of our neural network per video will either be "ball" or "strike", and we can just compare that to our ground truth for the video.
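The comparison is a one-liner (this assumes the network outputs a per-video probability of "strike", with labels encoded 0 = ball, 1 = strike; the names and threshold are illustrative):

```python
import numpy as np

def accuracy(predicted_probs, labels):
    """Binary accuracy: threshold each predicted probability of 'strike'
    at 0.5 and compare against ground-truth labels (0 = ball, 1 = strike)."""
    predictions = (np.asarray(predicted_probs) >= 0.5).astype(int)
    return float(np.mean(predictions == np.asarray(labels)))
```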

Our base goal is to implement this neural network without "intelligent" preprocessing, where we just choose an approximate number of frames to cut from each segment before running through the CNN.

Our target goal would be to implement "intelligent" preprocessing, where we would use an existing object detector to help more accurately process the videos, which would theoretically increase our accuracy.

Our reach goal is to build on this to create a similar neural net to identify the type of pitch (slider, fastball, etc.) rather than just its location at the end (ball or strike). This would become a multi-class classification problem instead of a binary one, and would potentially be a bit more difficult, but the accuracy metric would be similar to the binary version.

Ethics

Q) Why is Deep Learning a good approach to this problem? Deep learning (CNNs specifically) should be useful because we seek to find patterns in the video of the pitcher's windup alone to predict the outcome of the pitch (ball or strike). Other non-deep methods would be difficult to tune to pitch videos, since windup styles and background scenery can vary greatly.

Q) Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm? The major stakeholders are baseball players (pitchers and hitters), coaches, and baseball statistics analysts. If successful, the model could identify patterns in a pitcher's motion that lead to more accurate pitches, or outcomes like hits. This is useful to pitching coaches hoping to improve a pitcher's motion. Similarly, opposing teams could theoretically use such a model to predict an opposing pitcher's accuracy or even pitches (although this is likely too difficult). It would be interesting to see, regardless, if certain types of motions lead to greater or fewer strikes.

Division of Labor

  • Preprocessing: Zahid
  • Model construction / training: Nick
  • Model testing: Nick
  • (Reach goals / pitch type: Zahid + Nick)

Checkin 2

Introduction

We have always been interested in incorporating technology into sports. Being able to tell whether a pitch will be in the strike zone or not before it is thrown is useful information for a batter. We wanted to see whether it would even be possible to tell a pitch from a windup, and we figured deep learning techniques would be able to help us with that.

This project was enabled by this GitHub project by AJ Piergiovanni (https://github.com/piergiaj/mlb-youtube), which organizes footage of 20 full MLB baseball games into events, including balls, strikes, and pitch types. We use this dataset to analyze pitch videos labeled by outcome (ball or strike) and try to predict other pitch outcomes from the video of the pitcher's windup. This problem falls under classification, since the model is trained on labeled examples and predicts discrete outcomes.

Challenges

The hardest part of this project is preprocessing our data. We had some preliminary issues with the GitHub project we used to grab the data, which required rewriting code and substituting Python packages. After resolving those, we are still left with data in an unfinished form, where we still need to make some cuts. We tried to automatically cut each video at around 3 seconds, but that resulted in very inconsistent videos. We then did some extensive research into using ready-made object detectors to help with the preprocessing. We have decided to use Detectron2, with the COCO dataset, to detect when the baseball pops into frame, and then cut the video there. We have not been able to code this portion yet, and thus have not finished preprocessing our data.

Insights

We are nearly finished with constructing the model, so no loss / accuracy data is available yet. This should be available in the next couple of days.

Plan

We plan to dedicate more time to preprocessing as described above. We also must finish the model and tweak the architecture based on results. After constructing a basic CNN model, we may attempt a CNN-RNN combination model to better make use of time dependence between video frames.
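One possible shape for such a CNN-RNN hybrid in Keras, where a small CNN encodes each grayscale frame and an LSTM aggregates the per-frame features over time (all layer sizes and names here are illustrative, not a final architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_rnn(frames=60, height=64, width=64):
    """Sketch of a CNN-RNN hybrid for ball/strike prediction. A
    TimeDistributed CNN produces a feature vector per frame, an LSTM
    models the time dependence across the windup, and a sigmoid head
    outputs the probability of 'strike'."""
    inputs = layers.Input(shape=(frames, height, width, 1))
    x = layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu"))(inputs)
    x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    x = layers.LSTM(64)(x)  # aggregate per-frame features over time
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```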
