Skip to content

Refactor list_videos_in_folder for clarity and correctness#3175

Merged
MMathisLab merged 10 commits intoDeepLabCut:mainfrom
juan-cobos:refactor/list_videos_in_folder
Jan 17, 2026
Merged

Refactor list_videos_in_folder for clarity and correctness#3175
MMathisLab merged 10 commits intoDeepLabCut:mainfrom
juan-cobos:refactor/list_videos_in_folder

Conversation

@juan-cobos
Copy link
Copy Markdown
Contributor

Refactors list_videos_in_folder to improve code clarity, fix bugs, and streamline logic:

Key changes:

  • Moved suffix normalization out of loop: Video suffixes are now computed once before iteration instead of being recalculated for each directory
  • Replaced list with set for O(1) lookups: Changed video_suffixes from list to set for more efficient suffix matching
  • Simplified suffix normalization: Consolidated duplicate normalization logic using lstrip('.') instead of conditional checks
  • Added file validation for non-directory paths: Individual files are now validated against video_suffixes before being added to results
  • Improved error handling: Moved path.exists() check before path type determination to catch errors earlier
  • Fixed shuffle behavior: Shuffle now applies to the complete video list instead of per-directory, ensuring consistent behavior
  • Enhanced type hints: Added Path to data_path type annotation for better type safety

These changes reduce code duplication, improve performance with set-based lookups, and ensure only valid video files are returned.

@MMathisLab MMathisLab requested a review from deruyter92 January 14, 2026 15:07
@MMathisLab MMathisLab added the enhancement New feature or request label Jan 14, 2026
- Case handling: extensions are normalized to lowercase for matching
- Raise FileNotFoundError instead of assertion error
- Updated docstrings
Copy link
Copy Markdown
Collaborator

@deruyter92 deruyter92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this! Looks good.

Since we are adjusting this function now anyway, I pushed some additional refactors.

  • added case normalization (match lowercases)
  • added deduplication of paths
  • changed assertion to FileNotFoundError.

@deruyter92 deruyter92 requested a review from Copilot January 15, 2026 11:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the list_videos_in_folder function to improve performance, fix bugs, and enhance code clarity. The changes optimize suffix matching using sets instead of lists, consolidate normalization logic, and improve error handling.

Changes:

  • Optimized video suffix matching by using sets for O(1) lookups instead of lists
  • Consolidated suffix normalization logic and moved it outside the loop
  • Enhanced error handling with earlier path existence checks and proper exception raising
  • Fixed shuffle behavior to apply to the complete video list rather than per-directory

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

videos.append(path)

# Remove duplicates while preserving order (using resolved paths to handle symlinks)
videos = list({v.resolve(): v for v in videos}.values())
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using dictionary comprehension for deduplication may not preserve the original order reliably in all Python versions, and using v.resolve() as keys could have unexpected behavior if symbolic links point to the same file. Consider using a more explicit approach with a seen set to track resolved paths while preserving insertion order.

Suggested change
videos = list({v.resolve(): v for v in videos}.values())
seen = set()
unique_videos: list[Path] = []
for v in videos:
resolved = v.resolve()
if resolved in seen:
continue
seen.add(resolved)
unique_videos.append(v)
videos = unique_videos

Copilot uses AI. Check for mistakes.
@deruyter92
Copy link
Copy Markdown
Collaborator

@juan-cobos thanks! 3c3d0c4 is exactly the opposite from da1ccd7 right? Is that on purpose? (see copilot suggestion)
Both solutions are fine with me.

The rest of the code looks good to me.

@juan-cobos
Copy link
Copy Markdown
Contributor Author

@deruyter92 I understand removing duplicates, although I’m not sure how likely that case is (it wasn’t checked before). What I don’t quite follow is keeping unresolved paths, since it's is effectively arbitrary (you'd just keep the first one).

@deruyter92
Copy link
Copy Markdown
Collaborator

agreed, not worried about the order and path.resolve raising potential errors is actually preferred behavior.

thanks for changing it back. Looks all good to me!

@deruyter92 deruyter92 requested a review from MMathisLab January 16, 2026 07:46
@MMathisLab MMathisLab merged commit 9d4771c into DeepLabCut:main Jan 17, 2026
10 checks passed
@juan-cobos juan-cobos deleted the refactor/list_videos_in_folder branch January 17, 2026 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants