
Shaokai/faster inference#3012

Merged
MMathisLab merged 6 commits intoDeepLabCut:mainfrom
yeshaokai:shaokai/faster_inference
Jun 23, 2025

Conversation

Collaborator

@yeshaokai yeshaokai commented Jun 20, 2025

  1. Added pre-fetching to the inference runner.
  2. Replaced torch.no_grad() with torch.inference_mode(), which is more efficient for pure inference.
  3. Added automatic mixed precision (AMP) inference.

Async mode defaults to True and num_prefetch_batches defaults to 4.
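The pre-fetching idea can be sketched in a few lines: a background producer thread keeps a bounded queue of ready batches while the consumer (the inference loop) drains it, so data loading overlaps with model execution. This is an illustrative sketch, not DeepLabCut's implementation; the `prefetch` name and batch source are assumptions, and `num_prefetch_batches` mirrors the new default of 4.

```python
# Minimal sketch of batch pre-fetching for an inference loop.
# Not the DeepLabCut runner; names here are illustrative.
import queue
import threading

def prefetch(batches, num_prefetch_batches=4):
    """Yield items from `batches` while a producer thread stays ahead."""
    q = queue.Queue(maxsize=num_prefetch_batches)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            q.put(batch)  # blocks once the queue holds 4 batches
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not done:
        yield item

# Order is preserved; only the loading overlaps with consumption.
print(list(prefetch(range(6))))  # [0, 1, 2, 3, 4, 5]
```

The bounded queue is the key design choice: it caps memory at `num_prefetch_batches` batches while still hiding the loading latency behind compute.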

Empirically, batch size 16 for the detector and batch size 32 for the pose model work well. I therefore ran speed tests with the changes introduced in this PR, using superanimal_video_inference on an 800x600 video:

resnet50_fasterrcnn + hrnet32
12.7 FPS -> 18 FPS

mobilenet_fasterrcnn + resnet50
25.4 FPS -> 31 FPS

ssdlite + rtmpose
25.7 FPS -> 33 FPS

GPU memory usage is also reduced with AMP inference.
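How points 2 and 3 combine can be sketched as below: `torch.inference_mode()` disables autograd tracking entirely (cheaper than `torch.no_grad()`), and `torch.autocast` runs eligible ops in a lower-precision dtype, which is what cuts activation memory on GPU. The model and input here are placeholders, not the DeepLabCut runner.

```python
# Hedged sketch of inference_mode + automatic mixed precision.
# `model` and `inputs` are placeholder assumptions.
import torch

model = torch.nn.Linear(8, 4).eval()
inputs = torch.randn(2, 8)
device_type = "cuda" if torch.cuda.is_available() else "cpu"

# inference_mode() skips autograd bookkeeping; autocast dispatches
# eligible ops to float16/bfloat16, reducing memory and often time.
with torch.inference_mode(), torch.autocast(device_type=device_type):
    outputs = model(inputs)

print(outputs.shape)  # torch.Size([2, 4])
```

Note that both context managers apply per-call, so nothing about the model itself needs to change.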

@MMathisLab MMathisLab merged commit e72b559 into DeepLabCut:main Jun 23, 2025
4 checks passed
Comment on lines -427 to +585
outputs = self.model(inputs.to(self.device), **kwargs)




Contributor


This introduces a bug: `outputs` becomes undefined on the next line.

Member


Okay, let's write a test then @maximpavliv, as the tests should not pass otherwise.

Collaborator Author


I believe the variables are still alive after leaving the `with` scope. Is that what you have in mind?
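The scoping point can be checked with a tiny pure-Python example (using `contextlib.nullcontext` as a stand-in for `torch.inference_mode()`): Python scoping is function-level, so a name bound inside a `with` block stays defined after the block exits.

```python
# Demonstration that a variable assigned inside a `with` block is
# still in scope afterwards; nullcontext stands in for
# torch.inference_mode() here.
from contextlib import nullcontext

def run(x):
    with nullcontext():
        outputs = x * 2
    return outputs  # still defined after the `with` block exits

print(run(21))  # 42
```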

