ClusterPhotos is a 3D Slicer module for automatically organizing large collections of photographs into similar groups using AI-powered image clustering. It helps prepare images for the PhotoMasking module by grouping photos with similar content into subfolders.
- Overview
- When to Use ClusterPhotos
- How It Works
- Getting Started
- Parameters
- Workflow
- Understanding the Results
- Next Steps
ClusterPhotos uses a Vision Transformer (ViT) model to analyze and group photographs based on visual similarity. When you have a large collection of photographs taken from various angles, ClusterPhotos can automatically organize them into subfolders containing images with similar content or viewpoints.
- Load and analyze large image collections using deep learning embeddings.
- Automatically cluster images into groups based on visual similarity.
- Visualize clusters using UMAP dimensionality reduction in an interactive plot.
- Copy clustered images into organized subfolders ready for PhotoMasking.
ClusterPhotos is particularly useful when:
- You have hundreds of photographs taken from many different angles.
- Photos were taken in a continuous sequence without organized grouping.
- You want to organize images by viewing angle (e.g., top views, side views, bottom views).
- You need to prepare images for batch masking in PhotoMasking, which works best with images showing similar orientations.
Suppose you photographed a specimen from all angles, producing 300 images. Instead of manually sorting them into groups, ClusterPhotos can:
- Analyze all images using AI
- Identify natural groupings (e.g., "top view", "left side", "bottom view")
- Organize them into subfolders for efficient batch masking
ClusterPhotos uses a multi-step process:
- Feature Extraction: A Vision Transformer model (google/vit-large-patch16-224) creates a high-dimensional embedding for each image.
- Graph Construction: A k-nearest neighbors graph connects similar images.
- Spectral Clustering: Recursive spectral clustering divides images into groups.
- Visualization: UMAP reduces the high-dimensional embeddings to 2D for visualization.
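The graph construction and recursive splitting steps can be sketched with NumPy. This is an illustrative sketch, not the module's actual code. It assumes cosine similarity for the k-NN graph, an unnormalized graph Laplacian, and a median cut on a single eigenvector (the Fiedler vector), whereas the module exposes a setting for using several eigenvectors. The function names are hypothetical, and random vectors stand in for ViT embeddings.

```python
import numpy as np

def knn_affinity(emb, k):
    """Symmetric 0/1 affinity matrix linking each row to its k most similar rows."""
    n = len(emb)
    k = min(k, n - 1)
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T                      # cosine similarity (assumed metric)
    np.fill_diagonal(sim, -np.inf)           # never pick an image as its own neighbor
    nearest = np.argsort(-sim, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(w, w.T)                # make the graph undirected

def bisect(emb, k):
    """Split one group in two by a median cut on the Fiedler vector."""
    w = knn_affinity(emb, k)
    laplacian = np.diag(w.sum(axis=1)) - w
    _, vecs = np.linalg.eigh(laplacian)      # eigenvalues in ascending order
    order = np.argsort(vecs[:, 1])           # second-smallest eigenvector
    half = len(emb) // 2
    return order[:half], order[half:]

def recursive_cluster(emb, k=10, max_cluster_size=40):
    """Keep splitting until every group has at most max_cluster_size members."""
    labels = np.full(len(emb), -1)
    stack = [np.arange(len(emb))]
    next_label = 0
    while stack:
        idx = stack.pop()
        if len(idx) <= max_cluster_size:
            labels[idx] = next_label
            next_label += 1
        else:
            left, right = bisect(emb[idx], k)
            stack.extend([idx[left], idx[right]])
    return labels

# Stand-in data: 100 random 16-dimensional "embeddings".
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))
labels = recursive_cluster(emb, k=10, max_cluster_size=40)
print(np.bincount(labels))                   # number of images per cluster
```

The median cut always produces two non-empty halves, so the recursion terminates. A real implementation may instead split along natural gaps in the data, which gives uneven but more meaningful groups.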
ClusterPhotos automatically installs the Python packages it requires:
- transformers (HuggingFace)
- umap-learn
- scikit-learn
- plotly
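If you want to check in advance which of these packages are already available, a small script along the following lines may help. It is a sketch, not part of ClusterPhotos, and the helper name `missing_packages` is hypothetical. Note that the pip package name and the import name differ for `umap-learn` (`umap`) and `scikit-learn` (`sklearn`).

```python
import importlib.util

# Maps each pip package name to the module name used in `import` statements.
REQUIRED = {
    "transformers": "transformers",
    "umap-learn": "umap",
    "scikit-learn": "sklearn",
    "plotly": "plotly",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

print("Missing:", missing_packages())
```

In Slicer's Python console, a missing package can be installed manually with `slicer.util.pip_install("package-name")`.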
- Open 3D Slicer.
- Navigate to Modules → SlicerMorph → Photogrammetry → ClusterPhotos.
Select the directory containing your images. Supported formats include .jpg, .jpeg, .png, .bmp, and .tif.
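To preview which files in a folder match the supported formats, you could use a sketch like the one below. It assumes that extension matching is case-insensitive and that only the top level of the folder is scanned. Both are assumptions about the module's behavior, and `list_images` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".tif"}

def list_images(folder):
    """Return the supported image files directly inside folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in SUPPORTED)

# Demonstration on a temporary folder with two images and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.JPG", "b.png", "notes.txt"]:
        (Path(tmp) / name).write_bytes(b"")
    names = [p.name for p in list_images(tmp)]

print(names)
```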
The HuggingFace model used for image embeddings. Default is google/vit-large-patch16-224. This is a large Vision Transformer that provides high-quality image representations.
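For readers curious what computing an embedding with this model can look like, here is a minimal sketch using the HuggingFace `transformers` API. It is not the module's code. It assumes that the embedding is the final hidden state of the `[CLS]` token, which is one common choice, and the function names and batch size are hypothetical.

```python
def batched(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_images(paths, model_name="google/vit-large-patch16-224", batch_size=8):
    """Return one embedding row per image path (downloads the model on first use)."""
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoImageProcessor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device).eval()

    chunks = []
    with torch.no_grad():
        for batch in batched(paths, batch_size):
            images = [Image.open(p).convert("RGB") for p in batch]
            inputs = processor(images=images, return_tensors="pt").to(device)
            hidden = model(**inputs).last_hidden_state
            chunks.append(hidden[:, 0].cpu())  # [CLS] token (assumed pooling)
    return torch.cat(chunks).numpy()
```

Calling `embed_images(list_of_paths)` returns an array with one row per image. Processing in batches keeps memory use bounded for large image sets.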
Number of neighbors for the k-NN graph construction. Higher values create denser connections between images.
- Default: 10
- Range: 2-200
- Tip: Increase for larger image sets
Maximum number of eigenvectors used in spectral clustering. Controls the granularity of cluster detection.
- Default: 15
- Range: 2-100
Maximum number of images allowed in each final cluster. Smaller values create more, finer-grained clusters.
- Default: 40
- Range: 2-2000
- Tip: Set this to the number of images you want in each PhotoMasking image set
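Because no cluster may exceed the maximum size, this setting puts a lower bound on the number of clusters. The arithmetic below is a simple illustration. The helper name is hypothetical, and the actual number of clusters can be higher, since recursive splitting rarely fills every cluster to the limit.

```python
import math

def min_cluster_count(n_images, max_cluster_size):
    """Fewest clusters possible when none may exceed max_cluster_size."""
    return math.ceil(n_images / max_cluster_size)

# 300 images with the default limit of 40: 300 / 40 = 7.5, so at least 8 clusters.
print(min_cluster_count(300, 40))
```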
Number of neighbors for UMAP dimensionality reduction. Affects how local vs. global structure is preserved in visualization.
- Default: 10
- Range: 2-200
Minimum distance between points in UMAP visualization. Lower values allow tighter clustering of similar points.
- Default: 0.1
- Range: 0.0-1.0
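The defaults and ranges listed above can be collected in one place, for example when scripting or recording the settings used for a run. The sketch below is an assumption about how such a record might look. The class and field names are hypothetical and do not come from the module. The keyword names returned by `umap_kwargs` are those accepted by `umap.UMAP` in the umap-learn package.

```python
from dataclasses import dataclass

# Allowed (low, high) range for each parameter, as documented above.
RANGES = {
    "k_neighbors": (2, 200),
    "max_eigenvectors": (2, 100),
    "max_cluster_size": (2, 2000),
    "umap_neighbors": (2, 200),
    "umap_min_dist": (0.0, 1.0),
}

@dataclass
class ClusterParams:
    k_neighbors: int = 10
    max_eigenvectors: int = 15
    max_cluster_size: int = 40
    umap_neighbors: int = 10
    umap_min_dist: float = 0.1

    def __post_init__(self):
        # Reject any value outside its documented range.
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")

    def umap_kwargs(self):
        """Keyword arguments for a 2D umap.UMAP(...) projection."""
        return {
            "n_neighbors": self.umap_neighbors,
            "min_dist": self.umap_min_dist,
            "n_components": 2,
        }

params = ClusterParams(max_cluster_size=30)
print(params.umap_kwargs())
```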
- Set the Model name (or use the default).
- Click Load Model.
- Wait for the model to download (first time only) and load.
- Select your Image folder containing the photographs.
- Click Load Images & Embed.
- A progress bar shows the embedding process.
- This may take several minutes for large image sets.
- Adjust clustering parameters if needed.
- Click Cluster & Plot.
- An interactive UMAP visualization appears:
  - Each point represents one image.
  - Colors indicate cluster assignments.
  - Hovering over a point shows the image filename.
- The Cluster Summary table shows:
  - Cluster ID
  - Number of images in each cluster
  - Representative image names
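A summary of this kind can be derived from the list of filenames and their cluster labels. The sketch below is illustrative only. It takes the first few filenames of each cluster as its examples, which is an assumption, since how the module chooses representative images is not described here. The function name is hypothetical.

```python
from collections import defaultdict

def summarize(filenames, labels, n_examples=3):
    """Return (cluster_id, image_count, example_names) rows sorted by cluster ID."""
    groups = defaultdict(list)
    for name, label in zip(filenames, labels):
        groups[label].append(name)
    return [(label, len(names), names[:n_examples])
            for label, names in sorted(groups.items())]

rows = summarize(
    ["IMG_001.jpg", "IMG_003.jpg", "IMG_015.jpg", "IMG_022.jpg"],
    [0, 1, 0, 1],
    n_examples=2,
)
for cluster_id, count, examples in rows:
    print(cluster_id, count, examples)
```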
- Review the visualization to ensure clusters make sense.
- Adjust parameters and re-cluster if needed.
- Click Copy Images to Subfolders.
- Choose an output directory.
- Images will be copied into subfolders named Cluster_0, Cluster_1, etc.
- The output directory structure is ready for PhotoMasking.
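The copy step amounts to creating one `Cluster_N` folder per label and copying each image into the folder for its cluster. A minimal sketch using only the standard library is shown below. The function name is hypothetical, and the demonstration uses empty placeholder files in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

def copy_to_subfolders(image_paths, labels, output_dir):
    """Copy each image into output_dir/Cluster_<label>/, leaving originals in place."""
    output_dir = Path(output_dir)
    for path, label in zip(image_paths, labels):
        target = output_dir / f"Cluster_{label}"
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target / Path(path).name)

# Demonstration with three placeholder files and made-up labels.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "photos"
    source.mkdir()
    paths = []
    for name in ["IMG_001.jpg", "IMG_003.jpg", "IMG_015.jpg"]:
        path = source / name
        path.write_bytes(b"")
        paths.append(path)

    out = Path(tmp) / "output"
    copy_to_subfolders(paths, [0, 1, 0], out)
    created = sorted(p.relative_to(out).as_posix() for p in out.rglob("*.jpg"))

print(created)
```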
The 2D scatter plot shows:
- Nearby points: Images that are visually similar
- Distinct clusters: Groups of images with shared characteristics
- Colors: Different clusters are shown in different colors
Good clustering should:
- Group images with similar viewing angles together
- Separate clearly different viewpoints
- Create reasonably sized groups (not too large, not too small)
If clusters don't look right:
- Too few clusters: Decrease Max Cluster Size
- Too many clusters: Increase Max Cluster Size
- Clusters too mixed: Increase k-neighbors or Max Eigenvectors
- Clusters too fragmented: Decrease k-neighbors
After clustering your images:
- The output folder contains subfolders with grouped images.
- Open the PhotoMasking module.
- Set the Input Folder to your ClusterPhotos output directory.
- Each cluster subfolder becomes an "image set" in PhotoMasking.
- Use batch masking to efficiently mask each set.
- Consistent lighting: Helps the model identify similar viewpoints.
- Clear subject: Images with prominent, centered subjects cluster better.
- Sufficient variety: Clustering works best with at least 50-100 images.
- Start with defaults: They work well for typical photogrammetry datasets.
- Iterate: Run clustering, review, adjust, repeat.
- Balance cluster sizes: Aim for 20-50 images per cluster for efficient masking.
- First run is slower: Model download and initial embedding take time.
- GPU acceleration: Embedding is faster with a CUDA-capable GPU.
- Large datasets: Consider processing in batches if you have thousands of images.
After running ClusterPhotos on a folder with 200 images:
Output_Folder/
├── Cluster_0/    (45 images - top views)
│   ├── IMG_001.jpg
│   ├── IMG_015.jpg
│   └── ...
├── Cluster_1/    (38 images - front views)
│   ├── IMG_003.jpg
│   ├── IMG_022.jpg
│   └── ...
├── Cluster_2/    (42 images - side views)
│   └── ...
├── Cluster_3/    (40 images - back views)
│   └── ...
└── Cluster_4/    (35 images - bottom views)
    └── ...
This structure is directly usable as input for PhotoMasking.
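Before moving on to PhotoMasking, it can be useful to check how many images ended up in each subfolder. The sketch below counts image files per cluster folder. The function name is hypothetical, and the demonstration builds a small stand-in folder rather than reading real output.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif"}

def image_sets(output_dir):
    """Map each subfolder name to the number of image files it contains."""
    return {
        sub.name: sum(1 for p in sub.iterdir()
                      if p.suffix.lower() in IMAGE_SUFFIXES)
        for sub in sorted(Path(output_dir).iterdir())
        if sub.is_dir()
    }

# Stand-in output folder with two clusters.
with tempfile.TemporaryDirectory() as tmp:
    layout = {"Cluster_0": ["IMG_001.jpg", "IMG_015.jpg"], "Cluster_1": ["IMG_003.jpg"]}
    for folder, files in layout.items():
        (Path(tmp) / folder).mkdir()
        for name in files:
            (Path(tmp) / folder / name).write_bytes(b"")
    sets = image_sets(tmp)

print(sets)
```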