VirTex

This repo contains a replication of the CVPR 2021 paper VirTex: Learning Visual Representations from Textual Annotations (https://arxiv.org/abs/2006.06666). To start, install all the packages specified in requirements.txt. You will also need the COCO Caption dataset (https://cocodataset.org/#download) and the VOC2007 dataset (http://host.robots.ox.ac.uk/pascal/VOC/).

Step 0: Generate the Vocabulary for the COCO Caption Dataset

cd data/
python generate_vocabulary.py --data {path_to_coco_caption_annotations}
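
For orientation, here is a minimal sketch of what a vocabulary-generation step can look like. It is not the code in generate_vocabulary.py: the function names, the word-level tokenization, and the special tokens are all assumptions made for illustration. The only thing taken as given is the standard COCO caption annotation layout, where each entry under "annotations" has a "caption" string.

```python
import json
import re
from collections import Counter


def load_captions(annotation_file):
    """Read caption strings from a COCO caption annotation JSON file."""
    with open(annotation_file, encoding="utf-8") as f:
        return [ann["caption"] for ann in json.load(f)["annotations"]]


def build_vocabulary(captions, min_count=1,
                     specials=("<pad>", "<unk>", "<s>", "</s>")):
    """Map each token to an integer id (hypothetical word-level scheme)."""
    counts = Counter()
    for caption in captions:
        # Lowercase and keep alphanumeric runs as tokens.
        counts.update(re.findall(r"[a-z0-9]+", caption.lower()))
    # Most frequent words first; ties are broken alphabetically.
    words = sorted((w for w, c in counts.items() if c >= min_count),
                   key=lambda w: (-counts[w], w))
    return {tok: idx for idx, tok in enumerate(list(specials) + words)}
```

With a real annotation file, this would be called as build_vocabulary(load_captions(path)). Raising min_count drops rare words, which then map to "<unk>" at training time.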

Step 1: Train a visual representation using the bidirectional captioning model

python train.py \
    --save_dir {directory_to_save_the_model} \
    --wandb_name {name_for_wandb_tracking} \
    --data_dir {where_to_find_the_COCO_Caption_Dataset} \
    --max_steps 500000 \
    --batch_size 64 \
    --visual_lr 0.2 \
    --textual_lr 0.001 \
    --warmup_steps 10000 \
    --gradient_clip_val 10.0 \
    --checkpoint_freq 5000 \
    --val_check_interval 1000 \
    --seed 1
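
To make the interaction between --warmup_steps, --max_steps, and the two learning rates concrete, here is a small sketch of a linear-warmup schedule followed by cosine decay to zero. The cosine decay is an assumption based on the schedule described in the VirTex paper; this README does not say which schedule train.py uses, and lr_at_step is a hypothetical helper, not a function from this repo.

```python
import math


def lr_at_step(step, base_lr, warmup_steps=10000, max_steps=500000):
    """Learning rate at a given step: linear warmup, then cosine decay.

    Assumed schedule for illustration; train.py may differ.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup period.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to 1.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine decay from base_lr down to 0.
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Under this assumption, the visual backbone (base_lr=0.2) and the textual head (base_lr=0.001) would follow the same curve shape, scaled by their own peak values.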

Step 2: Linear Probing on VOC2007

python downstream_VOC2007.py --model_path {checkpoint_file} --data_dir {where_to_find_VOC2007}
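
The linear-probing result is reported as mean average precision (mAP) over the VOC2007 classes. The sketch below shows one common way to compute it from per-class scores and binary labels. It uses non-interpolated average precision, which is an assumption: downstream_VOC2007.py may use a different variant (the official VOC2007 protocol uses 11-point interpolation), and both function names are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: classifier scores, one per image.
    labels: 1 if the class is present in the image, else 0.
    """
    # Rank images from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            # Precision at the rank of each positive image.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(scores_per_class, labels_per_class):
    """Average the per-class AP values."""
    aps = [average_precision(s, l)
           for s, l in zip(scores_per_class, labels_per_class)]
    return sum(aps) / len(aps)
```

Multiplying the returned value by 100 gives a number on the same scale as the mAP figures quoted in the notes.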

Notes

This repo was tested on 4 NVIDIA A6000 GPUs.
On VOC2007, our replication achieves an mAP of 88.0 (the result reported in Table 1 of the paper is 88.7).
