PyTorch: Speed up PAF cost computation #3117

Merged
AlexEMG merged 6 commits into DeepLabCut:main from arashsm79:arash/speedup_paf
Oct 24, 2025

@arashsm79 arashsm79 commented Oct 9, 2025

Summary

Improve PAF performance by performing affinity computation on the GPU with advanced indexing.

  • Affinities are now calculated using torch operations.
  • The cost per batch dictionary is created more efficiently.

Details

This implementation tries to delegate the parts that can be parallelized to the GPU by using torch operations instead of numpy ones. (thanks to @maximpavliv for running the benchmark)
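To illustrate the idea of computing affinities with advanced indexing, here is a minimal sketch. It is not the code from this PR: the function name `paf_affinities`, the `(L, 2, H, W)` PAF layout, and the nearest-pixel sampling are all assumptions made for the example. It scores every candidate limb in one batched operation by gathering PAF vectors along each segment and projecting them onto the segment direction.

```python
import torch


def paf_affinities(paf, src, dst, limb_inds, n_points=10):
    """Hypothetical helper: mean PAF alignment along candidate segments.

    paf:       (L, 2, H, W) float tensor, x/y field per limb type (assumed layout)
    src, dst:  (N, 2) float tensors of (x, y) segment endpoints
    limb_inds: (N,) long tensor, limb type of each candidate
    Returns an (N,) tensor of affinities in [-1, 1] for unit-norm fields.
    """
    _, _, h, w = paf.shape
    # Sample positions along each segment: (N, P, 2)
    t = torch.linspace(0, 1, n_points, device=paf.device).view(1, -1, 1)
    pts = src[:, None, :] + t * (dst - src)[:, None, :]
    # Nearest-pixel coordinates, clamped to the field bounds
    xs = pts[..., 0].round().long().clamp(0, w - 1)
    ys = pts[..., 1].round().long().clamp(0, h - 1)
    # Advanced indexing: (N, 1) limb indices broadcast against (N, P) coords,
    # so all samples for all candidates are gathered without a Python loop
    limb = limb_inds[:, None]
    vx = paf[:, 0][limb, ys, xs]  # (N, P)
    vy = paf[:, 1][limb, ys, xs]  # (N, P)
    # Unit direction of each segment (guard against zero-length segments)
    d = dst - src
    d = d / d.norm(dim=1, keepdim=True).clamp_min(1e-8)
    # Dot product of field and direction, averaged over the samples
    return (vx * d[:, 0:1] + vy * d[:, 1:2]).mean(dim=1)
```

Because every step is a tensor operation, the same function runs on the GPU if `paf`, `src`, `dst`, and `limb_inds` are all placed on the same CUDA device.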

[Figures: FPS vs. batch size at 128x128, 256x256, and 512x512]

The figure below shows the parts of the execution that can be optimized.
The part outlined by the red rectangle concerning compute_peaks_and_costs is now as optimized as I could make it.
The blue rectangle covers the assembly procedure, which I did not get into in this PR. There is a lot of room for optimization there as well, though it may require substantial refactoring.

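[Figure: execution breakdown, with compute_peaks_and_costs outlined in red and the assembly procedure outlined in blue]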

@arashsm79 arashsm79 changed the title PyTorch: Speedup PAF cost computation PyTorch: Speed up PAF cost computation Oct 10, 2025
@arashsm79 arashsm79 marked this pull request as ready for review October 14, 2025 15:32
@maximpavliv maximpavliv self-requested a review October 14, 2025 15:33

@maximpavliv maximpavliv left a comment

Good job! ✅

  • Code is clearer, docstrings are much more detailed, and variable naming is improved (batch_size, paf_limb_inds).
  • GPU usage is now more efficient, avoiding unnecessary early CPU transfers → speed is improved.
  • The inference results differ slightly from the expected results.
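
The point above about avoiding early CPU transfers can be sketched as follows. This is an illustrative example only, not code from the PR: `top_peaks` and its `(B, C, H, W)` input layout are assumptions. All intermediate work stays on the tensor's device, and the result is copied to the host once at the end.

```python
import torch


def top_peaks(heatmaps, k=3):
    """Hypothetical helper: top-k peaks per heatmap with a single host copy.

    heatmaps: (B, C, H, W) float tensor on any device (assumed layout)
    Returns a CPU tensor of shape (B, C, k, 3) holding (x, y, score).
    """
    b, c, h, w = heatmaps.shape
    flat = heatmaps.reshape(b, c, h * w)
    # topk runs on the device the heatmaps live on
    scores, idx = flat.topk(k, dim=-1)
    # Recover 2D coordinates from the flat index, still on device
    ys = torch.div(idx, w, rounding_mode="floor")
    xs = idx % w
    peaks = torch.stack([xs.float(), ys.float(), scores], dim=-1)
    # One device-to-host transfer for the whole batch
    return peaks.cpu()
```

Calling `.cpu()` inside a per-image or per-bodypart loop would instead force a synchronization on every iteration, which is the pattern this sketch avoids.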

@AlexEMG AlexEMG merged commit d5641bf into DeepLabCut:main Oct 24, 2025
9 of 10 checks passed
deruyter92 added a commit to deruyter92/DeepLabCut-live that referenced this pull request Jan 21, 2026
This commit updates the PAF predictor to follow the DeepLabCut implementation in version 3.0.0rc13. See
DeepLabCut/DeepLabCut#3117
MMathisLab pushed a commit to DeepLabCut/DeepLabCut-live that referenced this pull request Jan 22, 2026
* DEKRPredictor: add non-maximum suppression (NMS)

This commit updates the DEKR predictor to follow the DeepLabCut implementation in version 3.0.0rc7. See
DeepLabCut/DeepLabCut#2907

* DEKRPredictor: speed up with vectorized operations

This commit updates the DEKRPredictor to follow the DeepLabCut implementation in version 3.0.0rc13. See DeepLabCut/DeepLabCut#3121

* PartAffinityFieldPredictor (PAF): Speed up cost computation

This commit updates the PAF predictor to follow the DeepLabCut implementation in version 3.0.0rc13. See
DeepLabCut/DeepLabCut#3117

* HeatmapPredictor (single animal): speed up with vectorized operations

This commit updates the `HeatmapPredictor` in single_predictor.py to follow the implementation in DeepLabCut 3.0.0rc13. See DeepLabCut/DeepLabCut#3110