PyTorch: Improve inference batching speed #3099
Merged
AlexEMG merged 1 commit into DeepLabCut:main on Sep 21, 2025
Conversation
Member
This is excellent @arashsm79 -- thanks for the contribution!
maximpavliv
approved these changes
Sep 18, 2025
Contributor
Thanks for fixing the batching mechanism! 🚀
The fix significantly improves performance for larger batch sizes, which is a big win.
Not directly related to this PR, but I realized that the CTDInferenceRunner is lacking the multithreading scheme (preprocessing and batching performed by a producer thread, prediction performed by a consumer thread). Let's address this in a future PR.
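For context, a minimal sketch of such a producer/consumer split. This is illustrative only; the function names (`preprocess`, `predict`), the queue size, and the batching details are placeholders, not the DeepLabCut API.

```python
import queue
import threading

import torch


def producer(frames, batch_size, out_queue, preprocess):
    # Preprocess and group frames into batches on a background thread.
    batch = []
    for frame in frames:
        batch.append(preprocess(frame))
        if len(batch) == batch_size:
            out_queue.put(torch.stack(batch))
            batch = []
    if batch:
        out_queue.put(torch.stack(batch))
    out_queue.put(None)  # sentinel: no more batches


def consumer(model, in_queue, results):
    # Run prediction on batches as soon as they become available.
    while (batch := in_queue.get()) is not None:
        with torch.no_grad():
            results.append(model(batch))


# Usage sketch: a bounded queue lets preprocessing run ahead of the GPU
# without accumulating unbounded memory.
# q = queue.Queue(maxsize=4)
# t = threading.Thread(target=producer, args=(frames, 16, q, preprocess))
# t.start()
# results = []
# consumer(model, q, results)
# t.join()
```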
Member
Looks great, let's fix the merge conflicts and then merge this @arashsm79
AlexEMG
approved these changes
Sep 18, 2025
Member
AlexEMG
left a comment
Fantastic @arashsm79 -- let's fix the conflicts and we're ready for rc13!
Use list accumulation for inference batches to eliminate O(n^2) torch.cat
- Replaced incremental tensor _batch with list _batch_list
- Stack only at batch processing time
- Updated sequential and async inference paths
Force-pushed from 85402e3 to 61c5d68
Summary
This PR replaces incremental tensor concatenation ($O(n^2)$) during inference with list-based accumulation. Final stacking now happens only when forming a batch, avoiding repeated reallocation and copying.
(Depends on #3094)
Main points:
- Replaced the incremental tensor `_batch` with a list `_batch_list`
- Stack into a tensor only at batch processing time
- Updated both the sequential and async inference paths
Details
The previous pattern for batching images during inference grew the batch tensor one image at a time with torch.cat.
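The original snippet is not reproduced here; a minimal self-contained sketch of that pattern, assuming illustrative frame shapes and counts (the real runner accumulates into an attribute named `_batch`), would be:

```python
import torch

# Illustrative frames; the real runner batches preprocessed video frames.
images = [torch.rand(3, 256, 256) for _ in range(64)]

batch = torch.empty(0, 3, 256, 256)
for image in images:
    # Every iteration reallocates the accumulated tensor and copies all
    # previously appended images, so n appends move O(n^2) elements in total.
    batch = torch.cat([batch, image.unsqueeze(0)], dim=0)
```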
This caused $O(n^2)$ total memory movement for $n$ appended images and was a bottleneck for larger batches, producing allocator churn and CPU overhead.
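For contrast, a sketch of the list-based approach this PR introduces (again with illustrative shapes; the PR names the list `_batch_list`):

```python
import torch

images = [torch.rand(3, 256, 256) for _ in range(64)]

batch_list = []
for image in images:
    # Appending to a Python list is O(1); no tensor data is copied here.
    batch_list.append(image)

# Each image is copied exactly once, when the batch is actually formed.
batch = torch.stack(batch_list, dim=0)
```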
Profiling
This was confirmed by benchmarking the inference procedure with the Torch Profiler and Scalene.
Torch Profiler

We can see that, for large batch sizes, the GPU stalls while waiting for the producer CPU thread to finish preprocessing the images.
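For reference, a trace like the one above can be captured with torch.profiler. This is a generic sketch, not the exact benchmark script used for this PR; the model and batch below are placeholders.

```python
import torch
from torch.profiler import ProfilerActivity, profile

model = torch.nn.Conv2d(3, 16, 3).cuda().eval()      # placeholder model
batch = torch.rand(64, 3, 256, 256, device="cuda")   # placeholder batch

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    with torch.no_grad():
        model(batch)

# Sort by CUDA time to see where the GPU spends (or waits for) its time.
print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=10))
# prof.export_chrome_trace("trace.json")  # inspect in chrome://tracing / Perfetto
```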
Scalene

Statistical analysis shows that the concatenation operation is one of the main hot spots of the inference procedure due to reallocation and copying of the immutable tensor:
Results
We can see that these changes fix the inefficiency at large batch sizes (thanks @maximpavliv for running the benchmark):