Feature Request: Replace Albumentations with torchvision.transforms.v2 for PyTorch pose estimation #3240

@juan-cobos

Is your feature request related to a problem? Please describe.

Albumentations is an external dependency whose development has slowed (last major release 1.4.3 in late 2023).

Describe the solution you'd like

Replace the current Albumentations-based augmentation system in the PyTorch pose estimation module (deeplabcut/pose_estimation_pytorch/data/transforms.py) with torchvision.transforms.v2.
Benefits:

  • Dependency reduction: Fewer external dependencies
  • Active development: torchvision.transforms.v2 is actively maintained
  • Better integration: Native support for tv_tensors.KeyPoints, tv_tensors.BoundingBoxes
  • Performance: Optimized for PyTorch with potential GPU acceleration

Describe alternatives you've considered

  1. Keep Albumentations: Continue using it, but this keeps a dependency on an external library whose development has slowed
  2. Kornia: Another option for differentiable augmentations, but less focused on standard image transforms
  3. Custom implementation: Implement all transforms from scratch (too much work)

Additional context

The current implementation uses roughly 15 or more Albumentations transforms:

  • Resize, LongestMaxSize, HorizontalFlip, Affine, PadIfNeeded
  • Equalize, MotionBlur, GaussNoise, CoarseDropout, ElasticTransform
  • Custom transforms: HFlip (keypoint-aware), KeypointAwareCrop, KeepAspectRatioResize, etc.

Relevant files:

  • deeplabcut/pose_estimation_pytorch/data/transforms.py - main transforms implementation
  • deeplabcut/pose_estimation_pytorch/data/preprocessor.py
  • deeplabcut/pose_estimation_pytorch/data/dataset.py
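
For context on why some of the custom transforms listed above cannot simply be swapped for built-ins: a keypoint-aware horizontal flip has to mirror the x coordinates and also exchange left/right keypoint labels. The following is a rough sketch in plain PyTorch, not the existing DeepLabCut implementation. The function name, the symmetric_pairs argument and the x' = width - x convention are all assumptions made for illustration.

```python
import torch


def hflip_keypoints(keypoints, image_width, symmetric_pairs):
    """Mirror keypoints horizontally and swap symmetric (left/right) labels.

    keypoints: tensor of shape (..., num_keypoints, 2) holding (x, y).
    symmetric_pairs: hypothetical list of (left_index, right_index) tuples.
    """
    flipped = keypoints.clone()
    # Assumed convention: continuous coordinates, so x' = width - x.
    flipped[..., 0] = image_width - flipped[..., 0]

    # Build a permutation that exchanges each symmetric pair.
    index = torch.arange(keypoints.shape[-2])
    for left, right in symmetric_pairs:
        index[left], index[right] = right, left
    return flipped[..., index, :]


# Hypothetical keypoints: left_eye, right_eye, nose in a 48-pixel-wide image.
kpts = torch.tensor([[10.0, 5.0], [30.0, 5.0], [20.0, 15.0]])
flipped_kpts = hflip_keypoints(kpts, image_width=48, symmetric_pairs=[(0, 1)])
print(flipped_kpts)
```

Whatever backend is chosen, the label swap is dataset-specific, so it would likely remain a DLC-side transform.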

I'd like to contribute to developing this feature but have some questions about the desired interface:

  1. Custom transforms vs. built-ins: Should we implement custom DLC-specific transforms (e.g., KeypointAwareCrop for cropping around annotated keypoints) using PyTorch, or adapt to use built-in v2 transforms?
  2. API compatibility: Should the config format remain the same (e.g., hflip, affine, crop_sampling), or take advantage of v2's declarative API?
  3. Backward compatibility: Should we maintain Albumentations as a fallback, or fully migrate to v2?

Thank you in advance!
Juan
