Official implementation of "Towards Explainable Video Camouflaged Object Detection: SAM2 with Eventstream-Inspired Data" (AAAI 2026).
EventVCOD introduces a novel framework for Video Camouflaged Object Detection (VCOD) by leveraging event camera-inspired data and the Segment Anything Model 2 (SAM2). Our approach provides explainable detection through event-based motion representations that capture temporal dynamics difficult to perceive in individual RGB frames.
- **SAM2-based Architecture**: Fine-tuned SAM2 with custom prompt generators for VCOD
- **Event-Inspired Data**: Novel eventstream-like representation generated from RGB videos
- **Video Understanding**: Temporal coherence through event polarity (+/-) encoding
- **Explainable Detection**: Motion-based, interpretable features
- **State-of-the-art Performance**: Superior results on the MoCA-Mask-Video and CAD-2016 benchmarks
Our framework consists of three main components:
- Event-Inspired Data Generation: Convert RGB frames to event-like representations with positive/negative polarities
- Prompt Generator: Dense embedding generator for SAM2 conditioning
- SAM2 Backbone: Fine-tuned image encoder with memory attention mechanism
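As a framework-level sketch of the hand-off between the first two components (the function names, threshold value, and pooling scheme below are illustrative assumptions, not the repo's actual API), the data flow could look like:

```python
import numpy as np

def frames_to_events(prev, curr, thresh=0.1):
    """Event-like map: +1 where brightness rose, -1 where it fell (sketch)."""
    diff = curr.astype(np.float32) - prev.astype(np.float32)
    events = np.zeros_like(diff)
    events[diff > thresh] = 1.0
    events[diff < -thresh] = -1.0
    return events

def dense_prompt_embedding(events, patch=16):
    """Toy stand-in for the prompt generator: pool the event map into a
    low-resolution grid that could condition SAM2's mask decoder."""
    h, w = events.shape
    grid = events[: h // patch * patch, : w // patch * patch]
    return grid.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
```

The actual prompt generator is learned; this fixed pooling only illustrates the shape of the conditioning signal fed to the SAM2 backbone.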
- Python 3.8+
- CUDA 12.1+
- PyTorch 2.3.0+
```bash
# Clone the repository
git clone https://github.com/yourusername/EventVCOD.git
cd EventVCOD

# Install dependencies
pip install -r requirements.txt

# Install SAM2
pip install -e .
```

Main dependencies include:

```
torch>=2.3.0
torchvision>=0.18.0
opencv-python>=4.8.0
numpy>=1.24.2
Pillow>=9.4.0
tensorboardX>=2.6.2
timm==0.4.12
```

See requirements.txt for the complete list.
All datasets and pre-processed event-like data are available at:
- Netdisk
Organize your datasets as follows:
```
datasets/
├── MoCA-Video-Train_event/
│   ├── crab/
│   ├── flatfish_0/
│   └── ...
├── MoCA-Video-Test_event/
│   ├── arctic_fox/
│   ├── black_cat_1/
│   └── ...
└── CAD2016_event/
    └── ...
```
To generate event-like representations from your own RGB videos:
```bash
cd data_manipulate
python eventflow_like_gen_claude.py --input_dir /path/to/videos --output_dir /path/to/output
```

All pre-trained models (including SAM2 checkpoints and fine-tuned EventVCOD models) are available at:

- Netdisk

Download the checkpoints and place them in the checkpoints/ directory.
```bash
python train.py --config sam2/configs/sam2.1_training/sam2.1_hiera_b+_VCOD_finetune_tiny_adp0317_video_part2_SAM2_finetune30.yaml
```

```bash
# 4 GPUs example
python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --master_port=29500 \
    train.py \
    --config sam2/configs/sam2.1_training/sam2.1_hiera_b+_VCOD_finetune_tiny_adp0317_video_part2_SAM2_finetune30.yaml \
    --name eventvcod_exp \
    --tag experiment_v1
```

Main training parameters in the config files:

- `resolution`: Input resolution (default: 1024)
- `train_batch_size`: Batch size per GPU
- `num_frames`: Number of frames per video clip
- `num_epochs`: Total training epochs
- `base_lr`: Base learning rate

Modify the configurations in sam2/configs/sam2.1_training/ for different settings.
```bash
python test.py \
    --config sam2/configs/sam2.1/sam2.1_hiera_b+_VCOD_infer_modify.yaml \
    --model /path/to/checkpoint.pth
```

We evaluate using standard VCOD metrics:
- **S-measure (Sα)**: Structure similarity
- **E-measure (Eφ)**: Enhanced alignment measure
- **Weighted F-measure (Fwβ)**: Weighted precision-recall
- **MAE (M)**: Mean Absolute Error
- **mean Dice (mDice)**: Dice coefficient
- **mean IoU (mIoU)**: Intersection over Union
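For intuition, the simplest of these metrics can be sketched in a few lines of NumPy (a minimal illustration for binary masks; the repo's eval/PySODMetrics implements the full metric suite, including the structure- and alignment-based measures):

```python
import numpy as np

def mae(pred, gt):
    """Mean Absolute Error between a predicted map and ground truth."""
    return float(np.mean(np.abs(pred - gt)))

def dice(pred, gt, eps=1e-8):
    """Dice coefficient for binary masks."""
    inter = np.sum(pred * gt)
    return float((2 * inter + eps) / (pred.sum() + gt.sum() + eps))

def iou(pred, gt, eps=1e-8):
    """Intersection over Union for binary masks."""
    inter = np.sum(pred * gt)
    union = pred.sum() + gt.sum() - inter
    return float((inter + eps) / (union + eps))
```

The `eps` term only guards against empty masks; it does not appear in the textbook definitions.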
For comprehensive evaluation using MATLAB:

```matlab
% For the MoCA-Mask-Video dataset
cd eval
run main_MoCA.m

% For the CAD-2016 dataset
run main_CAD.m
```

For Python-based evaluation:

```bash
cd eval/PySODMetrics
python evaluate.py --pred_dir /path/to/predictions --gt_dir /path/to/ground_truth
```

**Results on MoCA-Mask-Video:**

| Method | Sα↑ | Fwβ↑ | Eφ↑ | M↓ | mDice↑ | mIoU↑ |
|---|---|---|---|---|---|---|
| RCRNet | .597 | .174 | .583 | .025 | .194 | .137 |
| PNS-Net | .576 | .134 | .536 | .038 | .189 | .133 |
| MG | .547 | .165 | .537 | .095 | .197 | .137 |
| SLT-Net | .656 | .357 | .785 | .021 | .387 | .310 |
| IMEX | .661 | .371 | .778 | .020 | .409 | .319 |
| TSP-SAM(M+P) | .673 | .400 | .766 | .012 | .421 | .345 |
| TSP-SAM(M+B) | .689 | .444 | .808 | .008 | .458 | .388 |
| ZoomNeXt(T=1) | .690 | .395 | .702 | .017 | .420 | .353 |
| ZoomNeXt(T=5) | .734 | .476 | .736 | .010 | .497 | .422 |
| EMIP | .669 | .374 | – | .017 | .424 | .326 |
| EMIP-L | .675 | .381 | – | .015 | .426 | .333 |
| EventVCOD (Ours) | .753 | .573 | .855 | .009 | .574 | .496 |
**Results on CAD-2016:**

| Method | Sα↑ | Fwβ↑ | Eφ↑ | M↓ | mDice↑ | mIoU↑ |
|---|---|---|---|---|---|---|
| RCRNet | – | – | – | – | – | – |
| PNS-Net | .678 | .369 | .720 | .043 | .409 | .309 |
| MG | .613 | .370 | .537 | .070 | .351 | .260 |
| SLT-Net | .669 | .481 | .845 | .030 | .368 | .268 |
| IMEX | .684 | .452 | .813 | .033 | .469 | .370 |
| TSP-SAM(M+P) | .705 | .565 | .836 | .027 | .591 | .422 |
| TSP-SAM(M+B) | .751 | .628 | .865 | .021 | .603 | .496 |
| ZoomNeXt(T=1) | .721 | .525 | .759 | .024 | .523 | .436 |
| ZoomNeXt(T=5) | .757 | .593 | .865 | .020 | .509 | .510 |
| EMIP | .710 | .504 | – | .029 | .528 | .415 |
| EMIP-L | .719 | .514 | – | .028 | .536 | .425 |
| EventVCOD (Ours) | .802 | .717 | .887 | .023 | .717 | .615 |
Note: "–" indicates metrics not reported in the original papers. Full quantitative results are available in the paper.
```
EventVCOD/
├── sam2/                        # SAM2 core implementation
│   ├── modeling/                # Model architectures
│   ├── configs/                 # Configuration files
│   │   ├── sam2.1_training/     # Training configs
│   │   └── ablation/            # Ablation study configs
│   └── utils/                   # Utility functions
├── training/                    # Training pipeline
│   ├── trainer.py               # Main trainer
│   ├── trainer_supervision.py   # Supervised training
│   └── model/                   # Model definitions
├── datasets/                    # Dataset implementations
│   ├── datasets.py              # Dataset loaders
│   └── transform_custom.py      # Data augmentation
├── data_manipulate/             # Event data generation
│   ├── eventflow_like_gen_claude.py       # Event generation
│   └── eventflow_p_n_visualization.ipynb  # Visualization
├── eval/                        # Evaluation scripts
│   ├── PySODMetrics/            # Python metrics
│   └── *.m                      # MATLAB evaluation
├── train.py                     # Training entry point
├── test.py                      # Testing entry point
└── utils.py                     # General utilities
```
Our event data generation process:
- Frame Difference: Compute temporal derivatives between consecutive frames
- Polarity Assignment: Threshold-based positive/negative event classification
- Accumulation: Aggregate events over time windows
- Normalization: Scale to appropriate intensity ranges
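The four steps above can be sketched in NumPy (a simplified illustration; the function name and default threshold are assumptions, and the repo's generate_events implements the full pipeline):

```python
import numpy as np

def accumulate_events(frames, thresh=0.1):
    """Frame difference -> polarity thresholding -> accumulation -> normalization."""
    pos = np.zeros_like(frames[0], dtype=np.float32)
    neg = np.zeros_like(frames[0], dtype=np.float32)
    for prev, curr in zip(frames[:-1], frames[1:]):
        diff = curr.astype(np.float32) - prev.astype(np.float32)
        pos += diff > thresh    # positive events: brightness increased
        neg += diff < -thresh   # negative events: brightness decreased
    n = max(len(frames) - 1, 1)
    return pos / n, neg / n     # normalize event counts to [0, 1]
```

Keeping the two polarities as separate channels is what preserves the direction of motion-induced intensity change for the downstream model.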
Example usage:
```python
from data_manipulate.eventflow_like_gen_claude import generate_events

events_pos, events_neg = generate_events(
    video_path='path/to/video.mp4',
    threshold=0.2,
    output_dir='path/to/output'
)
```

See data_manipulate/eventflow_p_n_visualization.ipynb for visualization examples.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This work builds upon:
- SAM2 - Meta's Segment Anything Model 2
- Paper PDF
- Supplementary Material
- Datasets & Checkpoints (Netdisk)
Star ⭐ this repository if you find it helpful!
