A hybrid architecture based on structured state space sequence model and convolutional neural network for real-time object detection

Introduction

Real-time performance is essential for practical deployment of object detection on edge devices, where high processing speed and low latency are paramount. This paper introduces a novel approach aimed at boosting real-time object detection while strictly adhering to computational constraints. A structured state space sequence model, Mamba, is strategically embedded in the early stages of the backbone network to capture long-range dependencies, thereby enhancing the model's representation capability. Given the limitations of Mamba in directional perception, a lightweight spatial attention mechanism is introduced to integrate global context into each spatial location. Additionally, a computationally efficient module inspired by the Ghost module is developed to reduce resource demands. This dual-strategy approach optimizes both performance and efficiency in real-time object detection. Extensive experiments demonstrate the superiority of this proposed approach; on the Microsoft Common Objects in Context (COCO) dataset, it achieves a +1.6 AP (Average Precision) improvement over state-of-the-art methods, reaching 41.1 AP with minimal added model complexity on the nano scale. The effectiveness and efficiency of each component are further substantiated through ablation studies on the Pascal Visual Object Classes (Pascal VOC dataset). To verify the universality of the proposed method, this study selects underwater detection, characterized by an extremely complex background environment, as the other validation scenario. Through the application of this proposed approach to underwater object detection, a state-of-the-art result of 69.5 AP was obtained on the Detecting Underwater Objects (DUO) dataset, exceeding that of You Only Look Once Detector version 11 (YOLO11) by +0.3 AP.

Usage

Our detection code is developed on top of MMYOLO v0.60. Please prepare the environment according to the installation documentation.

Note: MMYOLO v0.60 supports up to PyTorch 1.13 and CUDA 11.7. Ensure that your environment meets these requirements for compatibility.

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

Training

Multi-gpu training

bash dist_train.sh configs/hybrid/hybrid_s_syncbn_fast_8xb16-500e_coco.py 2

Single-gpu Training

python train.py configs/hybrid/hybrid_s_syncbn_fast_8xb16-500e_coco.py

Testing

python test.py configs/hybrid/hybrid_s_syncbn_fast_8xb16-500e_coco.py work_dirs/hybrid_s_syncbn_fast_8xb16-500e_coco/best_coco_bbox_mAP_epoch_xxx.pth

Citation

If you find this project useful in your research, please consider cite:

@article{CHEN2025111067,
title = {A hybrid architecture based on structured state space sequence model and convolutional neural network for real-time object detection},
journal = {Engineering Applications of Artificial Intelligence},
volume = {156},
pages = {111067},
year = {2025},
issn = {0952-1976},
doi = {https://doi.org/10.1016/j.engappai.2025.111067},
url = {https://www.sciencedirect.com/science/article/pii/S0952197625010681},
author = {Jie Chen and Meng Joo Er},
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
configs		configs
models		models
sources		sources
.gitignore		.gitignore
README.md		README.md
demo.py		demo.py
dist_train.sh		dist_train.sh
get_flops.py		get_flops.py
test.py		test.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A hybrid architecture based on structured state space sequence model and convolutional neural network for real-time object detection

Introduction

Usage

Training

Testing

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

A hybrid architecture based on structured state space sequence model and convolutional neural network for real-time object detection

Introduction

Usage

Training

Testing

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages