PyTorch: Speed up PAF cost computation by arashsm79 · Pull Request #3117 · DeepLabCut/DeepLabCut

arashsm79 · 2025-10-09T15:27:10Z

Summary

Improve PAF performance by performing affinity computation on the GPU with advanced indexing.

Affinities are now calculated using torch operations.
The cost per batch dictionary is created more efficiently.

Details

This implementation tries to delegate the parts that can be parallelized to the GPU by using torch operations instead of numpy ones. (thanks to @maximpavliv for running the benchmark)
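As an illustration of the idea (not the code in this PR), a minimal sketch of scoring candidate limbs with torch advanced indexing might look like the following. The function name `paf_affinities`, the `n_points` parameter, and the `(2, H, W)` field layout are assumptions made for this example and are not taken from DeepLabCut.

```python
import torch


def paf_affinities(paf, src, dst, n_points=10):
    """Score candidate limbs by sampling a part affinity field along them.

    Hypothetical helper for illustration only.
    paf: (2, H, W) tensor with the x and y components of one limb's field.
    src, dst: (N, 2) tensors of (x, y) pixel coordinates of segment endpoints.
    Returns an (N,) tensor: mean alignment of the field with each segment.
    """
    src = src.to(paf.dtype)
    dst = dst.to(paf.dtype)

    # Unit direction of every segment; the clamp avoids division by zero.
    vec = dst - src
    length = torch.linalg.norm(vec, dim=1, keepdim=True).clamp_min(1e-8)
    unit = vec / length  # (N, 2)

    # Evenly spaced sample points along all segments at once.
    t = torch.linspace(0, 1, n_points, device=paf.device, dtype=paf.dtype)
    pts = src[:, None, :] + t[None, :, None] * vec[:, None, :]  # (N, P, 2)
    xs = pts[..., 0].round().long().clamp(0, paf.shape[2] - 1)
    ys = pts[..., 1].round().long().clamp(0, paf.shape[1] - 1)

    # Advanced indexing gathers every sample in one operation: (2, N, P).
    sampled = paf[:, ys, xs]

    # Dot product of the sampled field with each segment's direction.
    align = sampled[0] * unit[:, 0:1] + sampled[1] * unit[:, 1:2]  # (N, P)
    return align.mean(dim=1)


device = "cuda" if torch.cuda.is_available() else "cpu"
paf = torch.zeros(2, 8, 8, device=device)
paf[0] = 1.0  # toy field pointing along +x everywhere

src = torch.tensor([[1, 4], [4, 1]], device=device)
dst = torch.tensor([[6, 4], [4, 6]], device=device)

# The horizontal segment aligns with the field; the vertical one does not.
scores = paf_affinities(paf, src, dst)
```

Because every segment is sampled in a single indexing call, no Python loop over candidate pairs is needed and the tensors can stay on the device until the scores are read back.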

The figure below shows the parts of the execution that can be optimized.
The part outlined by the red rectangle concerning compute_peaks_and_costs is now as optimized as I could make it.
The blue rectangle covers the assembly procedure, which this PR does not touch. There is a lot of room for optimization there as well, though it may require substantial refactoring.

maximpavliv

Good job! ✅

Code is clearer, docstrings are much more detailed, and variable naming is improved (batch_size, paf_limb_inds).
GPU usage is now more efficient, avoiding unnecessary early CPU transfers → speed is improved.
The inference results differ slightly from the expected results.
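The review does not state where the difference comes from. A later commit title in this PR mentions integration and precision, so one plausible contributor (an assumption, not a confirmed diagnosis) is float32 arithmetic in torch versus float64 in NumPy. A small sketch of how such a gap could be measured, using toy data rather than real model outputs:

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
values = rng.random((4, 10))  # float64, NumPy's default precision
x = np.linspace(0.0, 1.0, values.shape[1])

# Reference: trapezoidal integration written out in float64 NumPy.
reference = ((values[:, 1:] + values[:, :-1]) / 2 * np.diff(x)).sum(axis=1)

# Same integration in torch after casting to float32.
approx = torch.trapezoid(
    torch.from_numpy(values).float(), torch.from_numpy(x).float(), dim=1
)

# Same integration in torch while keeping float64.
exact = torch.trapezoid(torch.from_numpy(values), torch.from_numpy(x), dim=1)

# Largest absolute deviation from the NumPy reference in each case.
err32 = np.abs(approx.double().numpy() - reference).max()
err64 = np.abs(exact.numpy() - reference).max()
```

Comparing outputs with a tolerance (for example `torch.allclose`) rather than exact equality is the usual way to test a port like this.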

This commit updates the PAF predictor to follow the DeepLabCut implementation in version 3.0.0.rc13. See DeepLabCut/DeepLabCut#3117

* DEKRPredictor: add non-maximum suppression (NMS)
  This commit updates the DEKR predictor to follow the DeepLabCut implementation in version 3.0.0rc7. See DeepLabCut/DeepLabCut#2907
* DEKRPredictor: speed up with vectorized operations
  This commit updates the DEKRPredictor to follow the DeepLabCut implementation in version 3.0.0rc13. See DeepLabCut/DeepLabCut#3121
* PartAffinityFieldPredictor (PAF): Speed up cost computation
  This commit updates the PAF predictor to follow the DeepLabCut implementation in version 3.0.0.rc13. See DeepLabCut/DeepLabCut#3117
* HeatmapPredictor (single animal): speed up with vectorized operations
  This commit updates the `HeatmapPredictor` in single_predictor.py to follow the implementation in DeepLabCut 3.0.0rc13. See DeepLabCut/DeepLabCut#3110

arashsm79 added 2 commits October 9, 2025 15:16

Speedup PAF cost computation

d881260

Improve PAF performance by performing affinity computation on the GPU with advanced indexing.
- Affinities are now calculated using torch operations.
- The cost per batch dictionary is created more efficiently.

Change to per-batch computation to reduce memory

4735171

arashsm79 changed the title ~~PyTorch: Speedup PAF cost computation~~ PyTorch: Speed up PAF cost computation Oct 10, 2025

arashsm79 added 2 commits October 14, 2025 11:32

black formatting

6ed5c55

Use integration and precision closer to numpy version

89c70ae

arashsm79 marked this pull request as ready for review October 14, 2025 15:32

maximpavliv self-requested a review October 14, 2025 15:33

maximpavliv approved these changes Oct 14, 2025

arashsm79 and others added 2 commits October 18, 2025 18:37

ci: trigger checks

bff0dad

Merge branch 'main' into arash/speedup_paf

0d05874

AlexEMG approved these changes Oct 24, 2025

AlexEMG merged commit d5641bf into DeepLabCut:main Oct 24, 2025
9 of 10 checks passed

deruyter92 mentioned this pull request Jan 21, 2026

update pytorch models following DeepLabCut 3.0.0rc13 DeepLabCut/DeepLabCut-live#151

Merged

AlexEMG merged 6 commits into DeepLabCut:main from arashsm79:arash/speedup_paf
