Improving generalization capability of object detectors via efficient global attention

Introduction

This repository contains the code for the paper "Improving Generalization Capability of Object Detectors via Efficient Global Attention".

Real-world conditions are highly variable, and object detectors must cope with environmental changes such as lighting, weather, and viewpoint differences. These models therefore need strong generalization capability. According to Gestalt theory, human-level generalization emerges from global perception. In this paper, we aim to enhance the generalization capability of YOLO series detectors by introducing global self-attention. Specifically, to address the quadratic complexity of the vanilla self-attention module, we propose a new efficient attention module. In addition, to strengthen the model's feature extraction, we introduce a new local feature extraction module. Building on these two components, we construct a new backbone network that can be integrated directly into YOLO series detectors. Thanks to the global perception of the efficient global attention module and the strong learning ability of the local feature extraction module, our method achieves a +1.8 AP improvement on YOLOv8 and a +1.0 AP improvement on YOLOv10 on the COCO dataset. Ablation studies on the Pascal VOC dataset further validate the effectiveness and efficiency of each design component. Finally, experiments on the Diverse-Weather dataset confirm the robust generalization capability of our method, which significantly outperforms state-of-the-art methods.
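To make the quadratic-complexity concern concrete, the sketch below counts how many pairwise attention scores vanilla self-attention would compute over a detector feature map. The 640x640 input size and the strides 8, 16, and 32 are assumptions (common YOLO settings), not values taken from the paper, and the helper name `attention_cost` is hypothetical. The code only illustrates the cost of the vanilla module; it does not describe how the proposed efficient module works.

```python
from typing import Tuple


def attention_cost(image_size: int, stride: int) -> Tuple[int, int]:
    """Return (tokens, pairwise_scores) for a square feature map.

    Each spatial position is one token, and vanilla self-attention
    scores every token against every other token.
    """
    side = image_size // stride
    tokens = side * side
    return tokens, tokens * tokens


# Assumed input size and strides (typical YOLO values, not from the paper).
for stride in (8, 16, 32):
    tokens, scores = attention_cost(640, stride)
    print(f"stride {stride}: {tokens} tokens, {scores} attention scores per head")
```

Halving the stride quadruples the number of tokens and multiplies the number of attention scores by 16, which is why global attention on high-resolution feature maps becomes expensive.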

Usage

Our detection code is built on top of MMYOLO v0.6.0. Please prepare the environment by following the MMYOLO installation documentation.

Note: MMYOLO v0.6.0 supports up to PyTorch 1.13 and CUDA 11.7. Make sure your environment meets these requirements.

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
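After installation, a quick check along the following lines can confirm that the environment stays within the limits stated in the note above. This is a minimal sketch: the helper names `major_minor` and `is_supported` are hypothetical, and the upper bounds are simply copied from the note rather than queried from MMYOLO.

```python
import re

# Upper bounds copied from the note above (assumed, not queried from MMYOLO).
MAX_TORCH = (1, 13)
MAX_CUDA = (11, 7)


def major_minor(version: str):
    """Parse a version string such as '1.13.1+cu117' into (1, 13)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(torch_version: str, cuda_version: str) -> bool:
    """True if both versions are at or below the stated limits."""
    return (major_minor(torch_version) <= MAX_TORCH
            and major_minor(cuda_version) <= MAX_CUDA)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        cuda = torch.version.cuda  # None on CPU-only builds
        if cuda is None:
            print("CPU-only PyTorch build detected")
        else:
            print("supported:", is_supported(torch.__version__, cuda))
```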

Training

Multi-GPU training

bash dist_train.sh configs/eattn_net/eattn_net_s_syncbn_fast_8xb16-500e_coco.py 2

Single-GPU training

python train.py configs/eattn_net/eattn_net_s_syncbn_fast_8xb16-500e_coco.py
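The config file name appears to follow the usual MMYOLO naming convention, in which `8xb16` means 8 GPUs with 16 images per GPU and `500e` means 500 epochs. Under that assumption, the sketch below extracts the schedule from the name; `parse_schedule` is a hypothetical helper, not part of this repository.

```python
import re


def parse_schedule(config_name: str) -> dict:
    """Extract GPU count, per-GPU batch size and epochs from a config name.

    Assumes the '<gpus>xb<batch_per_gpu>-<epochs>e' naming convention.
    """
    match = re.search(r"(\d+)xb(\d+)-(\d+)e", config_name)
    if match is None:
        raise ValueError(f"no schedule found in {config_name!r}")
    gpus, per_gpu, epochs = (int(g) for g in match.groups())
    return {
        "gpus": gpus,
        "batch_per_gpu": per_gpu,
        "total_batch": gpus * per_gpu,
        "epochs": epochs,
    }


info = parse_schedule("eattn_net_s_syncbn_fast_8xb16-500e_coco.py")
print(info)

# The multi-GPU command above launches 2 processes. If the per-GPU batch
# size in the config is left unchanged, the effective batch size would be:
print(2 * info["batch_per_gpu"])
```

If the effective batch size differs from the one the config was tuned for, the learning rate may need adjusting; the repository does not state whether this is handled automatically.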

Testing

python test.py configs/eattn_net/eattn_net_s_syncbn_fast_8xb16-500e_coco.py work_dirs/eattn_net/eattn_net_s_syncbn_fast_8xb16-500e_coco/best_coco_bbox_mAP_epoch_xxx.pth
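The `xxx` in the checkpoint name stands for the epoch at which the best checkpoint was saved. As a convenience, a small helper like the one below could locate that file in the work directory. It is a sketch only: `find_best_checkpoint` is a hypothetical name, and the file pattern is taken from the command above.

```python
import re
from pathlib import Path
from typing import Optional

# File name pattern taken from the test command above.
PATTERN = re.compile(r"best_coco_bbox_mAP_epoch_(\d+)\.pth$")


def find_best_checkpoint(work_dir: str) -> Optional[Path]:
    """Return the best-mAP checkpoint with the highest epoch, or None."""
    candidates = []
    for path in Path(work_dir).glob("best_coco_bbox_mAP_epoch_*.pth"):
        match = PATTERN.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # If several files match, prefer the one from the latest epoch.
    return max(candidates, key=lambda item: item[0])[1]
```

For example, `find_best_checkpoint("work_dirs/eattn_net/eattn_net_s_syncbn_fast_8xb16-500e_coco")` returns the path to pass as the second argument of `test.py`, or `None` if no matching checkpoint exists yet.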

Citation

If you find this project useful in your research, please consider citing:

@article{chen2025improving,
  title={Improving generalization capability of object detectors via efficient global attention},
  author={Chen, Jie and Er, Meng Joo},
  journal={Neurocomputing},
  pages={130287},
  year={2025},
  publisher={Elsevier}
}

