VideoMasking is a 3D Slicer module for extracting and masking frames from video using SAMURAI (Segment Anything Model for Universal and Robust AI tracking). It converts video recordings into masked individual frames that can be used for photogrammetry reconstruction with the ODM module.
- Overview
- What is SAMURAI?
- Prerequisites
- SAMURAI Setup
- Video Preparation
- ROI Selection and Tracking
- Key-Frame Filtering
- Saving Output
- Next Steps
VideoMasking enables you to create masked frames from video recordings of objects. Instead of taking individual photographs, you can record a video of your specimen rotating on a turntable and let VideoMasking:
- Extract all frames from the video automatically
- Track the object across all frames using SAMURAI
- Filter similar frames using a similarity index to reduce redundancy
- Generate masked frames ready for photogrammetry
- Convert video formats (MOV to MP4) if needed
- Automatically extract all frames from video (up to a maximum of 2000 frames)
- Select a Region of Interest (ROI) on the first frame to identify your object.
- Automatically track and mask the object across all frames using SAMURAI.
- Filter similar frames based on visual similarity to reduce redundant frames.
- Output masked frames ready for reconstruction with the ODM module.
SAMURAI (Segment Anything Model for Universal and Robust AI tracking) is a state-of-the-art video object segmentation model. It extends the Segment Anything Model (SAM) with motion-aware memory capabilities, allowing it to:
- Track objects across video frames with zero-shot learning
- Handle occlusions and object deformations
- Maintain consistent segmentation throughout the video
SAMURAI is particularly effective for photogrammetry workflows where you need consistent object masking across many frames captured from different angles.
VideoMasking requires:
- GPU with CUDA support: SAMURAI benefits significantly from GPU acceleration
- PyTorch with CUDA: Installed automatically via Slicer's PyTorchUtils
- Sufficient disk space: For video conversion and frame extraction
Note: If you're using MorphoCloud On Demand, all prerequisites are already configured.
Before using VideoMasking, you need to set up the SAMURAI repository:
- Open the VideoMasking module in 3D Slicer.
- Expand the SAMURAI Setup collapsible section.
- Click Clone SAMURAI to download the SAMURAI repository.
- This clones the SlicerMorph SAMURAI fork into the module's Support directory.
- Wait for the setup to complete. The module will download model checkpoints automatically when needed.
- Select the appropriate checkpoint for your use case.
- Choose your device (CUDA for GPU acceleration, or CPU as fallback).
Before loading your video, choose your preferred image format for extracted frames:
- PNG: Lossless compression, larger files (note that using compressed PNG can make the workflow significantly slower)
- JPG: Smaller file sizes, slight quality loss
This setting applies to extracted frames, masks, and all saved outputs.
VideoMasking has specific frame limits to ensure memory-efficient processing:
- Maximum frames: 2000 frames total (~33 seconds at 60fps)
- Automatic chunking: Videos with more than 600 frames are automatically split into smaller chunks for memory safety
- All frames extracted: The module extracts every frame from your video (no frame interval selection)
If your video exceeds 2000 frames, you'll need to trim it before processing.
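The frame-limit and chunking rules above can be sketched as follows. This is a simplified illustration, not the module's actual code; the constants match the documented limits, but the real chunking logic may differ:

```python
MAX_FRAMES = 2000   # documented hard limit on total frames
CHUNK_SIZE = 600    # videos longer than this are split into chunks

def plan_extraction(frame_count: int) -> list[range]:
    """Return the frame ranges to process, one range per chunk."""
    if frame_count > MAX_FRAMES:
        raise ValueError(
            f"Video has {frame_count} frames; trim it to {MAX_FRAMES} or fewer."
        )
    # Split into chunks of at most CHUNK_SIZE frames for memory safety.
    return [
        range(start, min(start + CHUNK_SIZE, frame_count))
        for start in range(0, frame_count, CHUNK_SIZE)
    ]
```

For example, a 1500-frame video would be processed as three chunks of 600, 600, and 300 frames, while a 500-frame video fits in a single chunk.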
If your video is in MOV format (common from iPhone/camera recordings), conversion is handled automatically:
- Expand the Video Prep collapsible section.
- Select your input video file using the Video File selector.
- The module will automatically convert MOV to MP4 when you click Load Video.
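MOV-to-MP4 conversion of this kind is typically done with ffmpeg. A minimal sketch of building such a command follows; the exact flags the module uses are an assumption here, not taken from its source:

```python
import subprocess

def build_convert_command(mov_path: str, mp4_path: str) -> list[str]:
    """Build an ffmpeg command that converts a MOV file to MP4.

    '-c copy' remuxes the streams without re-encoding, which is fast and
    lossless when the codecs are already MP4-compatible.
    """
    return ["ffmpeg", "-i", mov_path, "-c", "copy", mp4_path]

cmd = build_convert_command("specimen.mov", "specimen.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually invoke ffmpeg
```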
Frame extraction happens automatically when you load a video:
- Select your video file.
- Set the Frames Directory where extracted frames will be saved.
- Click Load Video to begin preparation.
- The module will:
- Convert MOV to MP4 if needed
- Validate the frame count (must be ≤2000 frames)
- For videos >600 frames: Split into chunks and extract only the first frame initially (for ROI setup)
- For videos ≤600 frames: Extract all frames immediately
- Wait for extraction to complete.
Note: There is no frame interval or "every Nth frame" setting; the module automatically extracts every single frame from your video.
After video preparation:
- Expand the ROI & Tracking collapsible section.
- Click Load Frames to load the extracted frames into the viewer.
- The first frame will be displayed in the Red slice viewer.
- Click Select ROI on First Frame.
- Draw a bounding box around your object in the first frame:
- Click and drag to create a rectangle that encompasses the entire object.
- Make sure the box includes the complete object with a small margin. Try to reduce the amount of background in the ROI.
- Review your selection - this ROI will be used to initialize tracking.
- Once satisfied with the ROI, click Finalize ROI & Run Tracking.
- SAMURAI will process all frames:
- The model uses the ROI to identify the object in frame 1.
- It then tracks and segments the object through all subsequent frames.
- For chunked videos, each chunk is processed sequentially to manage memory.
- Progress is shown in the log panel.
- When complete, masks will be generated for all frames.
After tracking is complete, you can reduce the number of frames using similarity-based filtering. This is important because consecutive video frames are often very similar, and having too many similar frames can slow down or degrade photogrammetry reconstruction.
The filtering algorithm:
- Compares each masked frame to the previously kept frame
- Calculates visual dissimilarity based on the masked region only
- Keeps frames that are sufficiently different from the last kept frame
- Always keeps the first frame as a starting point
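In outline, the filtering loop can be sketched like this. It is a simplified pure-Python illustration: the module's actual dissimilarity metric is not documented here, so mean absolute pixel difference over the masked region stands in for it:

```python
def mean_abs_diff(a, b, mask):
    """Mean absolute pixel difference over the masked region, scaled to 0..1."""
    pixels = [(pa, pb) for pa, pb, m in zip(a, b, mask) if m]
    if not pixels:
        return 0.0
    return sum(abs(pa - pb) for pa, pb in pixels) / (255.0 * len(pixels))

def filter_key_frames(frames, masks, similarity_threshold=0.80):
    """Keep a frame only if it differs enough from the last kept frame."""
    kept = [0]  # the first frame is always kept
    for i in range(1, len(frames)):
        last = kept[-1]
        similarity = 1.0 - mean_abs_diff(frames[i], frames[last], masks[i])
        if similarity < similarity_threshold:  # different enough -> keep it
            kept.append(i)
    return kept
```

Here each frame is a flat list of 0-255 gray values and each mask is a matching boolean list. Note how the threshold behaves as described above: a higher threshold removes only near-identical frames, so more frames survive.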
- After tracking completes, locate the Key-Frame Filtering section.
- Adjust the Similarity Threshold slider:
- Higher values (e.g., 0.90): Keep more frames (frames must be very similar to be removed)
- Lower values (e.g., 0.70): Keep fewer frames (more aggressive filtering)
- Default: Start around 0.80-0.85
- Click Filter Key Frames.
- The module will report how many frames were kept (e.g., "Kept 85/300 frames").
Tip: For photogrammetry, you typically want 150-300 final frames with good coverage of all viewing angles. Start with the suggested default threshold and adjust if needed.
- Expand the Save section.
- Select an output folder using Browse.
- Click Save Outputs.
- The module saves:
- original/Set1/: Original (unmasked) frames
- masked/Set1/: Masked frames and binary mask files (with the _mask suffix)
If you ran key-frame filtering, only the filtered frames are saved. Otherwise, all frames are saved.
The module automatically embeds camera metadata (extracted from the video) into saved images, which helps photogrammetry software estimate camera parameters.
Once VideoMasking has generated your masked frames:
- Note the output directory containing your masked frames (the masked/ subfolder).
- Open the ODM module.
- Set the Masked Images Folder to the masked/ folder from VideoMasking output.
- Configure and run the reconstruction task.
- Use a turntable: Place your object on a rotating platform for consistent coverage.
- Steady camera: Use a tripod to minimize camera shake.
- Good lighting: Ensure even, diffuse lighting without harsh shadows.
- Plain background: A solid, contrasting background helps with segmentation.
- Slow rotation: 20-30 seconds for a full rotation at 60fps gives ~1200-1800 frames.
- Keep within limits: Videos must be ≤2000 frames (~33 seconds at 60fps).
- Include the entire object: The initial ROI should fully contain the object.
- Add margin: A small margin around the object helps with tracking.
- Avoid background clutter: If possible, position the object away from similar-colored backgrounds.
- Start with defaults: The 0.80 similarity threshold works well for most videos.
- Check coverage: After filtering, ensure you still have frames from all viewing angles.
- Re-filter if needed: You can adjust the threshold and re-run filtering.
- Video exceeds 2000 frames:
  - Trim your video to ≤2000 frames (~33 seconds at 60fps)
  - Use video editing software or ffmpeg to cut the video
- Tracking fails or loses the object:
  - Try a larger initial ROI
  - Ensure the object doesn't leave the frame during the video
  - Check that the object is clearly visible and contrasts with the background
- Out-of-memory errors:
  - The module automatically chunks long videos to prevent this
  - If errors persist, try a shorter video
- Slow processing:
  - Close other GPU-intensive applications
  - Ensure CUDA is being used (check device selection)
  - Processing time depends on video length and resolution
  - Consider using MorphoCloud for faster GPU access
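To trim an over-long video with ffmpeg, as suggested above, you can cut by frame count. The sketch below builds such a command; the flag choice is an illustration and not part of the module itself:

```python
def build_trim_command(src: str, dst: str, max_frames: int = 2000) -> list[str]:
    """Build an ffmpeg command keeping only the first max_frames video frames."""
    return ["ffmpeg", "-i", src, "-frames:v", str(max_frames), dst]

cmd = build_trim_command("long_video.mp4", "trimmed.mp4")
```

Running the resulting command (for example with Python's subprocess module) re-encodes the clip, which is slower than a stream copy but required when cutting at an exact frame.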