PyTorch: Speed up DEKR predictor #3121

Merged
MMathisLab merged 6 commits into DeepLabCut:main from arashsm79:arash/speedup_dekr
Nov 4, 2025

@arashsm79 arashsm79 commented Oct 10, 2025

Summary

This PR improves the performance of the DEKR predictor.

  • Use advanced indexing instead of for loops.
  • Remove the optimization TODOs.
  • Optimize both code paths: with and without the use of the heatmap.
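
As a minimal sketch of the first point (not the code from this PR), the snippet below contrasts a nested Python loop with a single advanced-indexing gather. The function names, the `(B, C, H, W)` offset map, and the `(B, K)` center coordinates are assumptions chosen for illustration; the actual tensors in the DEKR predictor may be laid out differently.

```python
import torch


def gather_loop(offsets: torch.Tensor, ys: torch.Tensor, xs: torch.Tensor) -> torch.Tensor:
    """Loop baseline: read one C-dimensional vector per (batch, center) pair.

    offsets: (B, C, H, W) map, ys/xs: (B, K) integer pixel coordinates.
    Returns a (B, K, C) tensor.
    """
    batch_size, num_channels = offsets.shape[:2]
    num_centers = ys.shape[1]
    out = torch.empty(
        batch_size, num_centers, num_channels,
        dtype=offsets.dtype, device=offsets.device,
    )
    for b in range(batch_size):
        for k in range(num_centers):
            y, x = int(ys[b, k]), int(xs[b, k])
            out[b, k] = offsets[b, :, y, x]
    return out


def gather_indexed(offsets: torch.Tensor, ys: torch.Tensor, xs: torch.Tensor) -> torch.Tensor:
    """Same result with one advanced-indexing expression and no Python loop."""
    batch_size = offsets.shape[0]
    # (B, 1) batch indices broadcast against the (B, K) coordinate tensors.
    batch_idx = torch.arange(batch_size, device=offsets.device).unsqueeze(1)
    # Move channels last so the three index tensors address (B, H, W)
    # and the channel axis is kept whole: result is (B, K, C).
    return offsets.permute(0, 2, 3, 1)[batch_idx, ys, xs]


if __name__ == "__main__":
    torch.manual_seed(0)
    offsets = torch.randn(2, 6, 5, 7)   # B=2, C=6, H=5, W=7
    ys = torch.randint(0, 5, (2, 3))    # K=3 centers per image
    xs = torch.randint(0, 7, (2, 3))
    same = torch.equal(gather_loop(offsets, ys, xs), gather_indexed(offsets, ys, xs))
    print("loop and indexed results match:", same)
```

The indexed version issues one gather for the whole batch instead of `B * K` small tensor reads, which is the general reason this kind of rewrite tends to be faster.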

Details

Profiling the DEKR predictor's forward call shows a significant increase in speed. However, the overall inference time of the architecture remains mostly unchanged (only very minor improvements), since it is largely dominated by HRNet and torch native modules. (Thanks to @maximpavliv for running the benchmark.)

[Benchmark plots: FPS vs. batch size at 128x128, 256x256, and 512x512]

The image below shows that most of the inference time is spent in HRNet and PyTorch native inference procedures (the blue rectangle).
I have optimized the rest (the red rectangle) as much as I could.

[Image: inference time breakdown with the blue and red rectangles]
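
The reasoning above, that a large speedup in the predictor yields only a small end-to-end gain, follows Amdahl's law. The sketch below uses made-up numbers purely for illustration; the fraction and the speedup factor are assumptions, not measurements from this PR.

```python
def overall_speedup(fraction_optimized: float, local_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only part of the runtime gets faster.

    fraction_optimized: share of total runtime spent in the optimized part (0 to 1).
    local_speedup: how many times faster that part becomes (> 0).
    """
    if not 0.0 <= fraction_optimized <= 1.0:
        raise ValueError("fraction_optimized must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_optimized) + fraction_optimized / local_speedup)


if __name__ == "__main__":
    # Hypothetical values: the predictor takes 5% of inference time
    # and becomes 10x faster after vectorization.
    print(f"{overall_speedup(0.05, 10.0):.3f}")  # prints 1.047
```

Under these assumed numbers the whole pipeline is only about 5% faster, which is consistent with the qualitative observation that the backbone dominates inference time.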

Improve the performance of DEKR predictor.
Use advanced indexing instead of for loops and remove TODO.
@arashsm79 arashsm79 changed the title from "PyTorch: Speedup DEKR predictor" to "PyTorch: Speed up DEKR predictor" Oct 10, 2025
@arashsm79 arashsm79 marked this pull request as ready for review October 14, 2025 12:05
@maximpavliv maximpavliv self-requested a review October 14, 2025 12:09

@maximpavliv maximpavliv left a comment

The refactoring looks great — the vectorized version is much cleaner.
I’ve run the full integration testing suite on my side, and everything passes without issues. Nice work on this improvement!

@MMathisLab MMathisLab merged commit 54881d1 into DeepLabCut:main Nov 4, 2025
7 of 10 checks passed

@deruyter92 deruyter92 left a comment

I ran the code step by step and confirmed that the evaluation results are close to those of the previous implementation. The vectorization looks great, and I'm also happy that you added the tensor shapes in the comments. All looks good to me!

deruyter92 added a commit to deruyter92/DeepLabCut-live that referenced this pull request Jan 21, 2026
This commit updates the DEKRPredictor to follow the DeepLabCut implementation in version 3.0.0rc13. See DeepLabCut/DeepLabCut#3121
MMathisLab pushed a commit to DeepLabCut/DeepLabCut-live that referenced this pull request Jan 22, 2026
* DEKRPredictor: add non-maximum suppression (NMS)

This commit updates the DEKR predictor to follow the DeepLabCut implementation in version 3.0.0rc7. See
DeepLabCut/DeepLabCut#2907

* DEKRPredictor: speed up with vectorized operations

This commit updates the DEKRPredictor to follow the DeepLabCut implementation in version 3.0.0rc13. See DeepLabCut/DeepLabCut#3121

* PartAffinityFieldPredictor (PAF): Speed up cost computation

This commit updates the PAF predictor to follow the DeepLabCut implementation in version 3.0.0rc13. See
DeepLabCut/DeepLabCut#3117

* HeatmapPredictor (single animal): speed up with vectorized operations

This commit updates the `HeatmapPredictor` in single_predictor.py to follow the implementation in DeepLabCut 3.0.0rc13. See DeepLabCut/DeepLabCut#3110