Refactor list_videos_in_folder for clarity and correctness#3175
Refactor list_videos_in_folder for clarity and correctness#3175MMathisLab merged 10 commits intoDeepLabCut:mainfrom
Conversation
- Case handling: extensions are normalized to lowercase for matching - Raise FileNotFoundError instead of assertion error - Updated docstrings
deruyter92
left a comment
There was a problem hiding this comment.
Thanks for addressing this! Looks good.
Since we are adjusting this function now anyway, I pushed some additional refactors.
- added case normalization (match lowercases)
- added deduplication of paths
- changed assertion to FileNotFoundError.
There was a problem hiding this comment.
Pull request overview
This PR refactors the list_videos_in_folder function to improve performance, fix bugs, and enhance code clarity. The changes optimize suffix matching using sets instead of lists, consolidate normalization logic, and improve error handling.
Changes:
- Optimized video suffix matching by using sets for O(1) lookups instead of lists
- Consolidated suffix normalization logic and moved it outside the loop
- Enhanced error handling with earlier path existence checks and proper exception raising
- Fixed shuffle behavior to apply to the complete video list rather than per-directory
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| videos.append(path) | ||
|
|
||
| # Remove duplicates while preserving order (using resolved paths to handle symlinks) | ||
| videos = list({v.resolve(): v for v in videos}.values()) |
There was a problem hiding this comment.
Using dictionary comprehension for deduplication may not preserve the original order reliably in all Python versions, and using v.resolve() as keys could have unexpected behavior if symbolic links point to the same file. Consider using a more explicit approach with a seen set to track resolved paths while preserving insertion order.
| videos = list({v.resolve(): v for v in videos}.values()) | |
| seen = set() | |
| unique_videos: list[Path] = [] | |
| for v in videos: | |
| resolved = v.resolve() | |
| if resolved in seen: | |
| continue | |
| seen.add(resolved) | |
| unique_videos.append(v) | |
| videos = unique_videos |
|
@juan-cobos thanks! 3c3d0c4 is exactly the opposite from da1ccd7 right? Is that on purpose? (see copilot suggestion) The rest of the code looks good to me. |
|
@deruyter92 I understand removing duplicates, although I’m not sure how likely that case is (it wasn’t checked before). What I don’t quite follow is keeping unresolved paths, since it's is effectively arbitrary (you'd just keep the first one). |
|
agreed, not worried about the order and path.resolve raising potential errors is actually preferred behavior. thanks for changing it back. Looks all good to me! |
Refactors
list_videos_in_folderto improve code clarity, fix bugs, and streamline logic:Key changes:
video_suffixesfrom list to set for more efficient suffix matchinglstrip('.')instead of conditional checksvideo_suffixesbefore being added to resultspath.exists()check before path type determination to catch errors earlierPathtodata_pathtype annotation for better type safetyThese changes reduce code duplication, improve performance with set-based lookups, and ensure only valid video files are returned.