
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation


📄 This is the official implementation of the paper:
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation

Sebastian Janampa and Marios Pattichis

The University of New Mexico
Department of Electrical and Computer Engineering


DETRPose is the first real-time end-to-end transformer model for multi-person pose estimation, achieving outstanding results on the COCO and CrowdPose datasets. In this work, we propose a new denoising technique suitable for pose estimation that uses the Object Keypoint Similarity (OKS) metric to generate positive and negative queries. Additionally, we develop a new classification head and a new classification loss that are variations of the LQE head and the varifocal loss used in D-FINE.
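For reference, the OKS used to generate the positive and negative denoising queries is the standard COCO keypoint similarity:

$$\mathrm{OKS} = \frac{\sum_i \exp\left(-d_i^2 / 2 s^2 k_i^2\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)}$$

where $d_i$ is the Euclidean distance between the $i$-th predicted and ground-truth keypoints, $s$ is the object scale, $k_i$ is a per-keypoint constant controlling the falloff, and $v_i$ is the ground-truth visibility flag of keypoint $i$.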

Video: we run DETRPose on video to show its efficiency and low latency (`output.mp4`).

🚀 Updates

Model Zoo

COCO val2017

| Model | AP | AP50 | AP75 | AR | AR50 | #Params | Latency | GFLOPs | Config | Checkpoint |
|-------|----|------|------|----|------|---------|---------|--------|--------|------------|
| DETRPose-N | 57.2 | 81.7 | 61.4 | 64.4 | 87.9 | 4.1 M | 2.80 ms | 9.3 | py | 57.2 |
| DETRPose-S | 67.0 | 87.6 | 72.8 | 73.5 | 92.4 | 11.5 M | 4.99 ms | 33.1 | py | 67.0 |
| DETRPose-M | 69.4 | 89.2 | 75.4 | 75.5 | 93.7 | 20.8 M | 7.01 ms | 67.3 | py | 69.4 |
| DETRPose-L | 72.5 | 90.6 | 79.0 | 78.7 | 95.0 | 32.8 M | 9.50 ms | 107.1 | py | 72.5 |
| DETRPose-X | 73.3 | 90.5 | 79.4 | 79.4 | 94.9 | 73.3 M | 13.31 ms | 239.5 | py | 73.3 |

COCO test-dev2017

| Model | AP | AP50 | AP75 | AR | AR50 | #Params | Latency | GFLOPs | Config | Checkpoint |
|-------|----|------|------|----|------|---------|---------|--------|--------|------------|
| DETRPose-N | 56.7 | 83.1 | 61.1 | 64.4 | 89.3 | 4.1 M | 2.80 ms | 9.3 | py | 56.7 |
| DETRPose-S | 66.0 | 88.3 | 72.0 | 73.2 | 93.3 | 11.5 M | 4.99 ms | 33.1 | py | 66.0 |
| DETRPose-M | 68.4 | 90.1 | 74.8 | 75.1 | 94.4 | 20.8 M | 7.01 ms | 67.3 | py | 68.4 |
| DETRPose-L | 71.2 | 91.2 | 78.1 | 78.1 | 95.7 | 32.8 M | 9.50 ms | 107.1 | py | 71.2 |
| DETRPose-X | 72.2 | 91.4 | 79.3 | 78.8 | 95.7 | 73.3 M | 13.31 ms | 239.5 | py | 72.2 |

CrowdPose test

| Model | AP | AP50 | AP75 | APE | APM | APH | #Params | Latency | GFLOPs | Config | Checkpoint |
|-------|----|------|------|-----|-----|-----|---------|---------|--------|--------|------------|
| DETRPose-N | 56.0 | 80.7 | 59.6 | 65.0 | 56.6 | 46.6 | 4.1 M | 2.72 ms | 8.8 | py | 57.2 |
| DETRPose-S | 67.4 | 88.6 | 72.9 | 74.7 | 68.1 | 59.3 | 11.5 M | 4.80 ms | 31.3 | py | 67.0 |
| DETRPose-M | 72.0 | 91.0 | 77.8 | 78.6 | 72.6 | 64.5 | 20.7 M | 6.86 ms | 64.9 | py | 69.4 |
| DETRPose-L | 73.3 | 91.6 | 79.4 | 79.5 | 74.0 | 66.1 | 32.7 M | 9.03 ms | 103.5 | py | 72.5 |
| DETRPose-X | 75.1 | 92.1 | 81.3 | 81.3 | 75.7 | 68.1 | 73.3 M | 13.01 ms | 232.3 | py | 73.3 |

Notes:

- Latency is measured on a single Tesla V100 GPU with batch size 1, FP16, and TensorRT 8.6.3.

Quick start


Setup

```shell
conda create -n detrpose python=3.11.9
conda activate detrpose
pip install -r requirements.txt
```

Data Preparation

Create a folder named `data` to store the datasets:

```
configs
src
tools
data
├── COCO2017
│   ├── train2017
│   ├── val2017
│   ├── test2017
│   └── annotations
└── crowdpose
    ├── images
    └── annotations
```
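A minimal way to create this layout from the repository root (directory names exactly as shown above):

```shell
# Create the dataset directory layout expected by the configs
mkdir -p data/COCO2017/train2017 data/COCO2017/val2017 data/COCO2017/test2017 data/COCO2017/annotations
mkdir -p data/crowdpose/images data/crowdpose/annotations
```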

COCO2017 dataset: download COCO2017 from the official [website](https://cocodataset.org/#download).

CrowdPose dataset: download CrowdPose from the official [GitHub repository](https://github.com/jeffffffli/CrowdPose), or use the following commands:

```shell
pip install gdown  # to download files from Google Drive
mkdir crowdpose
cd crowdpose
gdown 1VprytECcLtU4tKP32SYi_7oDRbw7yUTL  # images
gdown 1b3APtKpc43dx_5FxizbS-EWGvd-zl7Lb  # crowdpose_train.json
gdown 18-IwNa6TOGQPE0RqGNjNY1cJOfNC7MXj  # crowdpose_val.json
gdown 13xScmTWqO6Y6m_CjiQ-23ptgX9sC-J9I  # crowdpose_trainval.json
gdown 1FUzRj-dPbL1OyBwcIX2BgFPEaY5Yrz7S  # crowdpose_test.json
unzip images.zip
```

Usage

COCO2017 dataset

1. Set the model size:

```shell
export model=l  # n s m l x
```

2. Training:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --pretrain dfine_${model}_obj365
```

If you choose `model=n`, run:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_n.py --device cuda --amp --pretrain dfine_n_obj365
```
3. Testing (COCO2017 val):

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume <PTH_FILE_PATH> --eval
```

4. Testing (COCO2017 test-dev):

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume <PTH_FILE_PATH> --test
```

After running the command, you'll get a file named `results.json`. Compress it and submit it to the COCO competition website.
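For example, the archive can be created with Python's built-in `zipfile` module (any zip tool works; this assumes `results.json` is in the current directory):

```shell
# Compress results.json into results.zip for submission
python3 -m zipfile -c results.zip results.json
```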

5. Replicate results (optional):

```shell
# First, download the official weights
wget https://github.com/SebastianJanampa/DETRPose/releases/download/model_weights/detrpose_hgnetv2_${model}.pth

# Second, run the evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume detrpose_hgnetv2_${model}.pth --eval
```
CrowdPose dataset

1. Set the model size:

```shell
export model=l  # n s m l x
```

2. Training:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --pretrain dfine_${model}_obj365
```

If you choose `model=n`, run:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_n_crowdpose.py --device cuda --amp --pretrain dfine_n_obj365
```

3. Testing:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --resume <PTH_FILE_PATH> --eval
```

4. Replicate results (optional):

```shell
# First, download the official weights
wget https://github.com/SebastianJanampa/DETRPose/releases/download/model_weights/detrpose_hgnetv2_${model}_crowdpose.pth

# Second, run the evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --resume detrpose_hgnetv2_${model}_crowdpose.pth --eval
```

Lambda instances

All latency experiments were run on Lambda.ai instances. We provide two README files:

1. How to run a TensorRT container on a Lambda.ai instance
2. How to install TensorRT from a `.deb` package on a Lambda.ai instance

Tools

Deployment

1. Setup:

```shell
pip install -r tools/inference/requirements.txt
export model=l  # n s m l x
```

2. Export ONNX. For the COCO model:

```shell
python tools/deployment/export_onnx.py --check -c configs/detrpose/detrpose_hgnetv2_${model}.py -r detrpose_hgnetv2_${model}.pth
```

For the CrowdPose model:

```shell
python tools/deployment/export_onnx.py --check -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r detrpose_hgnetv2_${model}_crowdpose.pth
```

3. Export TensorRT. For a specific file:

```shell
trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
```

or, for all files inside a folder:

```shell
python tools/deployment/export_tensorrt.py
```
Inference (Visualization)

1. Setup:

```shell
export model=l  # n s m l x
```

2. Inference (onnxruntime / tensorrt / torch). Inference on images and videos is supported.

For a single file:

```shell
# For the COCO model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}.onnx --input examples/example1.jpg --annotator COCO
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}.engine --input examples/example1.jpg --annotator COCO
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}.py -r <checkpoint.pth> --input examples/example1.jpg --device cuda:0

# For the CrowdPose model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}_crowdpose.onnx --input examples/example1.jpg --annotator CrowdPose
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}_crowdpose.engine --input examples/example1.jpg --annotator CrowdPose
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r <checkpoint.pth> --input examples/example1.jpg --device cuda:0
```

For a folder:

```shell
# For the COCO model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}.onnx --input examples --annotator COCO
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}.engine --input examples --annotator COCO
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}.py -r <checkpoint.pth> --input examples --device cuda:0

# For the CrowdPose model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}_crowdpose.onnx --input examples --annotator CrowdPose
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}_crowdpose.engine --input examples --annotator CrowdPose
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r <checkpoint.pth> --input examples --device cuda:0
```
Benchmark

1. Setup:

```shell
pip install -r tools/benchmark/requirements.txt
export model=l  # n s m l
```

2. Model FLOPs, MACs, and Params:

```shell
# For the COCO model
python tools/benchmark/get_info.py --config configs/detrpose/detrpose_hgnetv2_${model}.py

# For the CrowdPose model
python tools/benchmark/get_info.py --config configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py
```

3. TensorRT latency:

```shell
python tools/benchmark/trt_benchmark.py --infer_dir ./data/COCO2017/val2017 --engine_dir trt_engines
```

4. PyTorch latency:

```shell
# For the COCO model
python tools/benchmark/torch_benchmark.py -c ./configs/detrpose/detrpose_hgnetv2_${model}.py --resume detrpose_hgnetv2_${model}.pth --infer_dir ./data/COCO2017/val2017

# For the CrowdPose model
python tools/benchmark/torch_benchmark.py -c ./configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --resume detrpose_hgnetv2_${model}_crowdpose.pth --infer_dir ./data/COCO2017/val2017
```

Citation

If you use DETRPose or its methods in your work, please cite the following BibTeX entries:

```bibtex
@misc{janampa2025detrpose,
  title={DETRPose: Real-time end-to-end transformer model for multi-person pose estimation},
  author={Sebastian Janampa and Marios Pattichis},
  year={2025},
  eprint={2506.13027},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.13027}
}
```

Acknowledgement

This work was supported in part by Lambda.ai.

Our work is built upon DEIM, D-FINE, Detectron2, and GroupPose.

✨ Feel free to contribute and reach out if you have any questions! ✨
