SynCo-v2: An Empirical Study of Training Self-Supervised Vision Transformers with Synthetic Hard Negatives
This is a PyTorch implementation of the SynCo-v2 paper, currently available at giakoumoglou.com:
@misc{giakoumoglou2025syncov2,
title={{SynCo-v2: An Empirical Study of Training Self-Supervised Vision Transformers with Synthetic Hard Negatives}},
author={Nikolaos Giakoumoglou and Andreas Floros and Kleanthis Marios Papadopoulos and Tania Stathaki},
year={2026}
}
It also contains the implementation of BYOL and MoBY.
Install PyTorch and prepare the ImageNet dataset following the official PyTorch ImageNet training example.
This repo is based on the MoBY and SynCo codebases; the diffs below show the changes relative to MoBY:
diff main_pretrain.py <(curl https://raw.githubusercontent.com/SwinTransformer/TransformerSSL/moby_main.py)
diff main_linear.py <(curl https://raw.githubusercontent.com/SwinTransformer/TransformerSSL/moby_linear.py)
The scripts expect the following dataset structure:
[your imagenet-folder with train and val folders]/
├── train/
│   ├── class1/
│   ├── class2/
│   └── ...
└── val/
    ├── class1/
    ├── class2/
    └── ...
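The loaders discover classes from the subfolder names, as torchvision's ImageFolder does: each subfolder of train/ (or val/) is one class, and indices are assigned in sorted order of the folder names. A minimal stdlib sketch of that discovery step (the demo directory is a throwaway stand-in for your ImageNet folder):

```python
import os
import tempfile

def find_classes(directory):
    """Mimic torchvision ImageFolder: each subfolder is a class;
    indices are assigned in sorted (alphabetical) order."""
    classes = sorted(
        entry.name for entry in os.scandir(directory) if entry.is_dir()
    )
    if not classes:
        raise FileNotFoundError(f"No class folders found in {directory!r}")
    return classes, {name: idx for idx, name in enumerate(classes)}

# Demo with a temporary directory shaped like the tree above.
with tempfile.TemporaryDirectory() as root:
    for split in ("train", "val"):
        for cls in ("class2", "class1"):
            os.makedirs(os.path.join(root, split, cls))
    classes, class_to_idx = find_classes(os.path.join(root, "train"))
    print(classes)        # -> ['class1', 'class2']
    print(class_to_idx)   # -> {'class1': 0, 'class2': 1}
```

Note that class indices depend only on folder names, so train/ and val/ must contain the same set of class folders.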
To set up a compatible environment (CUDA 11.7/11.8) on PBS, follow these steps:
conda create -n syncov2 -c conda-forge cudatoolkit=11.8 python=3.10.11
conda activate syncov2
conda install -c "nvidia/label/cuda-11.8.0" cuda-nvcc
python3 -m pip install nvidia-cudnn-cu11==8.6.0.163
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/:$CUDNN_PATH/lib:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install timm==0.4.9 diffdist Pillow pyyaml yacs termcolor scipy numpy==1.21.5 opencv-python tqdm
To do unsupervised pre-training of a ViT-Base model on ImageNet on an 8-GPU machine, run:
python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=12345 \
main_pretrain.py \
--cfg configs/synco_vit_base.yaml \
--data-path [your imagenet-folder with train and val folders] \
--batch-size 64 \
--output [output folder] \
--tag [tag folder]
To use MoBY or BYOL instead, swap the config file accordingly. For different architectures (Swin-Tiny, Swin-Small, Swin-Base, ViT-Small, ViT-Base), select the corresponding config from ./configs.
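With --nproc_per_node=8 and --batch-size 64 per GPU, the effective global batch size is 8 × 64 = 512. Many SSL codebases apply the linear learning-rate scaling rule to this global batch size; whether SynCo-v2 does, and its base LR, are set in the YAML config, so the values below are purely illustrative:

```python
def effective_batch(nproc_per_node, per_gpu_batch):
    """Global batch size under torch.distributed.launch:
    each of the nproc_per_node processes loads per_gpu_batch samples."""
    return nproc_per_node * per_gpu_batch

def scaled_lr(base_lr, global_batch, reference_batch=256):
    """Linear LR scaling rule; the base LR and reference batch of 256
    are assumptions for illustration, not values from the repo's configs."""
    return base_lr * global_batch / reference_batch

gb = effective_batch(8, 64)
print(gb)                   # -> 512
print(scaled_lr(1e-3, gb))  # -> 0.002
```

If you change --batch-size or the GPU count, the global batch (and any scaled LR) changes accordingly.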
With a pre-trained model, to train a supervised linear classifier on frozen features/weights on an 8-GPU machine, run:
python -m torch.distributed.launch \
--nproc_per_node=8 \
--master_port=12345 \
eval_linear.py \
--cfg configs/synco_vit_base.yaml \
--data-path [your imagenet-folder with train and val folders] \
--output [output folder] \
--tag [tag folder]
Use the same --cfg, --output, and --tag as in pre-training. By default, this performs linear probing with frozen features (LINEAR_EVAL.WEIGHTS freeze). For full fine-tuning, set LINEAR_EVAL.WEIGHTS finetune in the config.
To evaluate on downstream datasets (CIFAR-10, CIFAR-100, STL-10, Oxford Flowers102, Oxford Pets, Food101, Stanford Cars, Caltech101, DTD, FGVC Aircraft, SUN397, VOC2007, Places365), append --opts DATA.DATASET <dataset> to the linear evaluation command. For full fine-tuning, add --opts LINEAR_EVAL.WEIGHTS finetune.
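The --opts flag takes dotted KEY VALUE pairs that override entries of the YAML config, in the style of yacs' merge_from_list. A minimal illustration of how such overrides resolve, using a plain nested dict instead of the real yacs CfgNode (the keys shown match the options mentioned above, but the dict is not the repo's full schema):

```python
def apply_opts(cfg, opts):
    """Apply yacs-style dotted KEY VALUE override pairs to a nested dict."""
    assert len(opts) % 2 == 0, "--opts expects KEY VALUE pairs"
    for key, value in zip(opts[::2], opts[1::2]):
        node = cfg
        *path, leaf = key.split(".")
        for part in path:        # walk down to the parent node
            node = node[part]
        node[leaf] = value       # overwrite the leaf value
    return cfg

cfg = {"DATA": {"DATASET": "imagenet"}, "LINEAR_EVAL": {"WEIGHTS": "freeze"}}
apply_opts(cfg, ["DATA.DATASET", "cifar10", "LINEAR_EVAL.WEIGHTS", "finetune"])
print(cfg["DATA"]["DATASET"], cfg["LINEAR_EVAL"]["WEIGHTS"])  # -> cifar10 finetune
```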
Use the evaluation scripts from the official DINO repository.
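Among the DINO repo's evaluation protocols is k-NN classification on frozen features: each validation embedding is classified by a similarity-weighted vote over its nearest neighbors in a bank of training embeddings. A toy, framework-free sketch of the core idea (cosine-similarity k-NN with a plain majority vote; the 2-D vectors are made-up stand-ins for real features):

```python
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def knn_predict(query, bank_feats, bank_labels, k=3):
    """Label the query by majority vote among its k most similar bank features."""
    nearest = sorted(
        range(len(bank_feats)),
        key=lambda i: cosine(query, bank_feats[i]),
        reverse=True,
    )[:k]
    votes = Counter(bank_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny synthetic feature bank: two clusters along orthogonal directions.
bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
print(knn_predict([0.95, 0.05], bank, labels, k=3))  # -> 0
```

The real protocol additionally weights votes by similarity and sweeps k; see the DINO repository for the full recipe.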
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
This codebase is built upon MoBY, SynCo, DINO, and Swin Transformer. We thank the authors for their excellent work and for making their code publicly available.