This project combines object detection, multi-object tracking, and pose estimation to analyse volleyball training sessions. It uses a customized version of ByteTrack and RTMPose (through RTMlib) for tracking and pose analysis of players during spiking actions.
You can download pre-trained RTMDet and RTMPose ONNX models from OpenMMLab Deploee.
Still in very early stages!
To do:
- Edit tracked IDs manually (delete unused IDs, "relabel" IDs)
- Interpolation / smoothing / manual editing of keypoints
- Spike detection from pose data + some heuristics (to start with)
- Performance optimisations
Demo video: `demo_det_m_pose_m_2025_04_04.mp4`
- Detection Frequency Experiment: Attempted running detection every N frames (e.g., every 3 frames) to improve performance. However, this significantly degraded tracking accuracy (lost tracks, ID switches) due to the rapid and unpredictable movement of players in volleyball. Reverted to running detection on every frame to maintain tracking robustness.
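For reference, the abandoned scheme looked roughly like this (a self-contained sketch; `detect` and `track` are stand-ins for the RTMDet and ByteTrack calls, not the project's actual API):

```python
# Sketch of the abandoned "detect every N frames" scheme: between detections
# the tracker is updated with stale boxes, which is what caused the lost
# tracks and ID switches with fast-moving players.
DETECT_EVERY = 3

def process(frames, detect, track):
    dets = []
    results = []
    for i, frame in enumerate(frames):
        if i % DETECT_EVERY == 0:       # fresh detections only every Nth frame
            dets = detect(frame)
        results.append(track(dets))     # tracker reuses stale boxes in between
    return results
```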
- Profiling Refactoring: Cleaned up and enhanced the performance profiling in `scripts/MAIN.py`.
- Added distinct timing measurements for all major steps within the main loop (Capture, Detection, Tracking, Pose Estimation, HDF5 Write, Display Prep, CSV Write, Final Draw/Display).
- Updated on-screen display to show all component times and total FPS.
- Updated CSV logging to include all component times.
- Added a new "OVERALL TIMING STATISTICS" section to the terminal output, summarizing min/max/avg/median for each component (excluding the first frame).
- Formatted all terminal statistics output for better readability using tab alignment.
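The per-component timing pattern described above can be sketched like this (a minimal stand-in; the real utilities live in `pipeline/profiling.py` and `scripts/MAIN.py`, and their API may differ):

```python
# Minimal per-stage timing helper: record a duration for each named pipeline
# stage, then summarise min/max/avg/median, skipping the first (warm-up) frame.
import time
from statistics import median

class StageTimer:
    def __init__(self):
        self.records = {}  # stage name -> list of durations in seconds

    def measure(self, name, fn, *args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        self.records.setdefault(name, []).append(time.perf_counter() - t0)
        return result

    def stats(self, skip_first=True):
        out = {}
        for name, ts in self.records.items():
            ts = ts[1:] if skip_first and len(ts) > 1 else ts
            out[name] = {"min": min(ts), "max": max(ts),
                         "avg": sum(ts) / len(ts), "median": median(ts)}
        return out
```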
Implemented batch processing for RTMPose estimation.
Previously, each detected bounding box was processed sequentially (preprocess, inference, postprocess).
Now, all bounding boxes in a frame are preprocessed together, inferred in a single batch call to ONNX Runtime, and postprocessed together.
This significantly reduced the pose estimation time from ~20ms to ~11ms per frame.
Overall pipeline speed increased from ~22 FPS to ~26 FPS, meeting the initial 15-20 FPS target.
Detection (~19ms) is now the primary bottleneck.
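The batched flow can be sketched as follows (a minimal sketch: `run_model` stands in for the single ONNX Runtime `session.run` call on a batched input, and RTMPose's real pre/postprocessing is more involved):

```python
import numpy as np

def estimate_poses_batched(run_model, crops):
    """Run pose inference on all person crops of a frame in one call.

    run_model: callable taking an (N, 3, H, W) float32 batch; in the real
    adapter this would wrap a single ONNX Runtime session.run call.
    crops: list of preprocessed (3, H, W) float32 crops, one per tracked bbox.
    """
    if not crops:
        return []
    batch = np.stack(crops, axis=0).astype(np.float32)  # (N, 3, H, W)
    outputs = run_model(batch)      # one inference call instead of N
    return [outputs[i] for i in range(len(crops))]      # postprocess per box
```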
Running the ONNX Runtime backend (CUDA Execution Provider) on a GTX 1070 Ti
RTMDet-m and RTMPose-m
video is 1080p 50FPS
Total processing speed is approx. 12 FPS (~90ms per frame), which is not usable (videos are 2 hours long)
I know I could batch-process, but I also want to investigate potential real-time applications
In theory TRT provides a great speedup, especially when running FP16 models.
Wasted many hours trying this.
The easiest route would be to drop RTMlib and use mmdeploy-runtime with a TRT engine instead.
However, MMDeploy supports TRT 8.x (CUDA 11.8, cuDNN 9.7)
GPUs older than Turing (i.e. before the RTX 20 series) have no tensor cores, so they do not benefit from FP16 -> gains from TRT FP32 are not so big
GPUs newer than Ada (i.e. after the RTX 40 series) are not supported by TRT 8.x (they need TRT 10.x and CUDA 12.8, which MMDeploy doesn't support)
-> in practice, using the MMDeploy SDK (mmdeploy-runtime) with TRT models is only possible on Turing, Ampere and Ada GPUs (RTX 20, 30, 40 series)
Let's hope the MMDeploy team has time to update and maintain it again someday, as it would be a shame to see the project deprecated.
We are staying on ONNX for now.
Almost all time is spent in detection and pose (45ms det / 45ms pose)
In both RTMDet and RTMPose, preprocessing the frame before inference (normalisation) is a significant time cost
I have rewritten the normalisation step. See [notes here](misc project docs/optimising_preprocessing_normalisation.md)
-> saved approx 8ms off det and 5ms off pose
-> total time for 600 frames went from 47 to 31 seconds (50% speedup)
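The general idea is shown below (a generic sketch, not the exact code from the linked notes): fold the per-channel mean/std into a single float32 scale-and-offset applied in place, avoiding float64 temporaries.

```python
import numpy as np

def normalize_naive(img, mean, std):
    # uint8 minus a float64 array promotes everything to float64,
    # allocating large temporaries on every frame
    return (img - mean) / std

def normalize_fast(img, mean, std):
    # (x - m) / s == x * (1/s) + (-m/s): precompute scale/offset once,
    # then apply in place in single precision
    scale = 1.0 / np.asarray(std, dtype=np.float32)
    offset = -np.asarray(mean, dtype=np.float32) * scale
    out = img.astype(np.float32)
    out *= scale
    out += offset
    return out
```

In the real pipeline `scale` and `offset` would be computed once outside the per-frame loop, since the model's mean/std never change.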
HPE_volleyball/
├── ByteTrack/ # Forked + modified ByteTrack repo (tracking)
├── models/ # model files (.onnx) for RTMPose, RTMDet, RT-DETR
├── data/ # Input videos
├── output/
│ ├── h5/ # HDF5 outputs: IDs, bboxes, keypoints, scores
│ └── video/ # Output videos with overlays
├── pipeline/ # Modular pipeline components
│ ├── detector_base.py # Detector interface and factory
│ ├── detectors/ # Detector implementations
│ │ ├── rtmdet_onnx.py # RTMDet ONNX adapter
│ │ └── rtdetr_onnx.py # RT-DETR ONNX adapter
│ ├── tracker_adapter.py # ByteTrack wrapper
│ ├── pose_adapter.py # RTMPose wrapper
│ └── profiling.py # Profiling utilities
├── scripts/ # Custom scripts (main pipeline, helpers)
├── paths.py # Project-relative path definitions
└── requirements.txt # Python dependencies
To run inference on GPU, make sure the following are properly installed:
- C++ Build Tools for Visual Studio: a C++ compiler is required to build Cython wheels
- CUDA Toolkit (e.g. CUDA 12.x or compatible with your PyTorch version)
- cuDNN (compatible with your CUDA version)
- Check which version of CUDA your GPU driver supports: run `nvidia-smi`; the "CUDA Version" shown in the top right is the most recent version you can use.
- Download and install the CUDA Toolkit with the appropriate version.
- Check the version compatibility tables for compatible cuDNN and ONNX Runtime versions.
- Download the appropriate cuDNN version (use the cuDNN archive if you need an older version).
- For cuDNN, I find the easiest is to copy / paste the files from the cuDNN folders directly into the CUDA folders:
  - {cuDNN install path}/bin/{version} -> copy and paste all DLLs to {CUDA install path}/bin
  - same for /include (.h files)
  - same for /lib/x64 (.lib files)
- Alternatively, you can add the three cuDNN folders to the system PATH.
- This repo installs onnxruntime-gpu version 1.20.1 by default (CUDA 12.x, cuDNN 9.x); if that is not compatible with your CUDA / cuDNN, install a compatible version instead.
Confirmed to work with CUDA 12.4 + cuDNN 9.7 on GTX 1070 Ti
Confirmed to work with CUDA 12.6 + cuDNN 9.8 on RTX 4060
1. Create a conda environment and activate it:

   ```bash
   conda create -n HPE-volleyball python=3.10
   conda activate HPE-volleyball
   ```

2. Clone this repo:

   ```bash
   git clone https://github.com/f-fraysse/HPE_volleyball.git
   cd HPE_volleyball
   ```

3. Set up the environment:

   ```bash
   pip install -r requirements.txt
   ```

4. Install ByteTrack:

   ```bash
   cd ByteTrack
   pip install -e .
   cd ..
   ```

5. Install RTMlib in development mode:

   ```bash
   cd rtmlib
   pip install -e .
   cd ..
   ```

   This installs the included RTMlib in development mode. The necessary modifications for outputting bbox scores and detailed profiling have already been implemented in this local copy.

6. (Optional) Ensure output folders are created:

   ```python
   from scripts.paths import ensure_output_dirs
   ensure_output_dirs()
   ```
Work in progress — main script(s) will be located in scripts/.
- add your input video to `/data`
- add your ONNX models to `/models`:
  - download ONNX models from OpenMMLab Deploee: https://platform.openmmlab.com/deploee
  - RTMDet model for detection
  - RTMPose model for pose estimation
  - M-size models seem to provide a good balance of performance and speed (RTMDet-m, RTMPose-m)
- run `scripts/MAIN.py`; config options are at the start of the script
- outputs:
  - video file with overlaid bboxes, IDs, bbox scores and poses saved in `output/video`
  - HDF5 file with tracked IDs, bboxes and scores, keypoints and scores saved in `output/h5`
The pipeline supports pluggable detectors. To switch between detectors:
- RTMDet (default): set `DETECTOR = 'rtmdet'` in `scripts/MAIN.py`
  - Uses RTMDet ONNX models from OpenMMLab
  - Current baseline: ~19ms detection time
- RT-DETR: set `DETECTOR = 'rtdetr'` in `scripts/MAIN.py`
  - Uses RT-DETR ONNX models (must be exported first)
  - See [RT-DETR ONNX Export Guide](misc project docs/rtdetr_onnx_export_guide.md)
  - Potential for better real-time performance than RTMDet
The modular pipeline ensures consistent output format and profiling regardless of detector choice.
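As an illustration, the factory behind a pluggable-detector design might look like this (class and function names here are illustrative, not the actual API of `pipeline/detector_base.py`):

```python
# Sketch of a detector interface + factory: MAIN.py asks for a detector by
# name and gets back any object that satisfies the common interface, so the
# rest of the pipeline (tracking, pose, profiling) is detector-agnostic.
from abc import ABC, abstractmethod

class DetectorBase(ABC):
    @abstractmethod
    def detect(self, frame):
        """Return a list of (x1, y1, x2, y2, score) boxes for one frame."""

class RTMDetONNX(DetectorBase):
    def detect(self, frame):
        return []  # real adapter runs RTMDet via ONNX Runtime here

class RTDETRONNX(DetectorBase):
    def detect(self, frame):
        return []  # real adapter runs RT-DETR via ONNX Runtime here

_DETECTORS = {"rtmdet": RTMDetONNX, "rtdetr": RTDETRONNX}

def create_detector(name):
    try:
        return _DETECTORS[name]()
    except KeyError:
        raise ValueError(f"Unknown detector '{name}', choose from {list(_DETECTORS)}")
```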
All Python packages are listed in `requirements.txt`.
GPU inference requires a working CUDA installation compatible with your PyTorch / ONNX Runtime versions.
- ByteTrack has been modified (e.g. fixed deprecated NumPy types).
- RTMlib has been slightly modified (see Setup, Step 5) to output bbox scores.
- All paths are defined relative to the project root via `paths.py`.
Francois Fraysse - UniSA - Code was generated with assistance from Claude Sonnet 4.0 through Cline
Thanks and credits to:
- MMPose project - [https://github.com/open-mmlab/mmpose]
- RTMlib - [https://github.com/Tau-J/rtmlib]
- ByteTrack - [https://github.com/ifzhang/ByteTrack]
This project is licensed under the Apache 2.0 License.
It includes: