📄 This is the official implementation of the paper:

**DETRPose: Real-time end-to-end transformer model for multi-person pose estimation**

Sebastian Janampa and Marios Pattichis

The University of New Mexico
Department of Electrical and Computer Engineering
DETRPose is the first real-time end-to-end transformer model for multi-person pose estimation, achieving outstanding results on the COCO and CrowdPose datasets. In this work, we propose a new denoising technique suitable for pose estimation that uses the Object Keypoint Similarity (OKS) metric to generate positive and negative queries. Additionally, we develop a new classification head and a new classification loss that are variations of the LQE head and the varifocal loss used in D-FINE.
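The denoising scheme above is built on the standard COCO Object Keypoint Similarity. As a reference, here is a minimal sketch of the OKS formula; the function name and signature are illustrative, not taken from this repository:

```python
import numpy as np

def oks(pred, gt, visibility, area, k):
    """Object Keypoint Similarity between a predicted and a ground-truth pose.

    pred, gt:   (K, 2) keypoint coordinates
    visibility: (K,) ground-truth visibility flags (> 0 means labeled)
    area:       object scale s^2 (segment area in COCO)
    k:          (K,) per-keypoint falloff constants
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)   # squared distances per keypoint
    labeled = visibility > 0
    if not labeled.any():
        return 0.0
    e = d2[labeled] / (2.0 * area * k[labeled] ** 2)
    return float(np.exp(-e).mean())          # in [0, 1]; 1 means a perfect match
```

A query whose OKS to a ground-truth pose is high can serve as a positive denoising query, and a heavily perturbed one (low OKS) as a negative query.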
## Video
We run DETRPose on video to showcase its efficiency and low latency.
## News
- [2025.06.02] Release DETRPose code and weights.
- [2025.06.04] Release Google Colab Notebook.
- [2025.06.04] Release HuggingFace 🤗 Space.
- [2025.06.17] Release paper on [arXiv](https://arxiv.org/abs/2506.13027).
### COCO val2017
| Model | AP | AP50 | AP75 | AR | AR50 | #Params | Latency | GFLOPs | config | checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|
| DETRPose-N | 57.2 | 81.7 | 61.4 | 64.4 | 87.9 | 4.1 M | 2.80 ms | 9.3 | py | 57.2 |
| DETRPose-S | 67.0 | 87.6 | 72.8 | 73.5 | 92.4 | 11.5 M | 4.99 ms | 33.1 | py | 67.0 |
| DETRPose-M | 69.4 | 89.2 | 75.4 | 75.5 | 93.7 | 20.8 M | 7.01 ms | 67.3 | py | 69.4 |
| DETRPose-L | 72.5 | 90.6 | 79.0 | 78.7 | 95.0 | 32.8 M | 9.50 ms | 107.1 | py | 72.5 |
| DETRPose-X | 73.3 | 90.5 | 79.4 | 79.4 | 94.9 | 73.3 M | 13.31 ms | 239.5 | py | 73.3 |
### COCO test-dev2017
| Model | AP | AP50 | AP75 | AR | AR50 | #Params | Latency | GFLOPs | config | checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|
| DETRPose-N | 56.7 | 83.1 | 61.1 | 64.4 | 89.3 | 4.1 M | 2.80 ms | 9.3 | py | 56.7 |
| DETRPose-S | 66.0 | 88.3 | 72.0 | 73.2 | 93.3 | 11.5 M | 4.99 ms | 33.1 | py | 66.0 |
| DETRPose-M | 68.4 | 90.1 | 74.8 | 75.1 | 94.4 | 20.8 M | 7.01 ms | 67.3 | py | 68.4 |
| DETRPose-L | 71.2 | 91.2 | 78.1 | 78.1 | 95.7 | 32.8 M | 9.50 ms | 107.1 | py | 71.2 |
| DETRPose-X | 72.2 | 91.4 | 79.3 | 78.8 | 95.7 | 73.3 M | 13.31 ms | 239.5 | py | 72.2 |
### CrowdPose test
| Model | AP | AP50 | AP75 | APE | APM | APH | #Params | Latency | GFLOPs | config | checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DETRPose-N | 56.0 | 80.7 | 59.6 | 65.0 | 56.6 | 46.6 | 4.1 M | 2.72 ms | 8.8 | py | 56.0 |
| DETRPose-S | 67.4 | 88.6 | 72.9 | 74.7 | 68.1 | 59.3 | 11.5 M | 4.80 ms | 31.3 | py | 67.4 |
| DETRPose-M | 72.0 | 91.0 | 77.8 | 78.6 | 72.6 | 64.5 | 20.7 M | 6.86 ms | 64.9 | py | 72.0 |
| DETRPose-L | 73.3 | 91.6 | 79.4 | 79.5 | 74.0 | 66.1 | 32.7 M | 9.03 ms | 103.5 | py | 73.3 |
| DETRPose-X | 75.1 | 92.1 | 81.3 | 81.3 | 75.7 | 68.1 | 73.3 M | 13.01 ms | 232.3 | py | 75.1 |
**Notes:**
- Latency is evaluated on a single Tesla V100 GPU with `batch_size = 1`, `fp16`, and `TensorRT==8.6.3`.
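For readers who want to reproduce latency numbers on their own hardware, the sketch below shows the usual measurement pattern (warmup iterations, fp16, batch size 1, explicit synchronization). It is an illustrative harness, not the repository's benchmark script:

```python
import time
import torch

def measure_latency(model, input_size=(1, 3, 640, 640), warmup=50, iters=200,
                    device="cuda", dtype=torch.float16):
    """Average per-forward latency in milliseconds (illustrative only)."""
    model = model.to(device=device, dtype=dtype).eval()
    x = torch.randn(*input_size, device=device, dtype=dtype)
    # CUDA launches are asynchronous, so synchronize around the timed loop.
    sync = torch.cuda.synchronize if str(device).startswith("cuda") else (lambda: None)
    with torch.no_grad():
        for _ in range(warmup):      # warm up kernels and allocator
            model(x)
        sync()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        sync()                       # wait for all queued GPU work
    return (time.perf_counter() - start) / iters * 1000.0
```

The input size of 640x640 is an assumption for illustration; use the model's actual inference resolution.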
## Setup

```shell
conda create -n detrpose python=3.11.9
conda activate detrpose
pip install -r requirements.txt
```

## Data Preparation
Create a folder named `data` to store the datasets:

```
configs
src
tools
data
├── COCO2017
│   ├── train2017
│   ├── val2017
│   ├── test2017
│   └── annotations
└── crowdpose
    ├── images
    └── annotations
```
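Before training, you can sanity-check the layout with a short script. This is a sketch; `missing_dirs` is a hypothetical helper, not part of the repo:

```python
from pathlib import Path

# Directory layout taken from the tree above.
EXPECTED_DIRS = [
    "data/COCO2017/train2017",
    "data/COCO2017/val2017",
    "data/COCO2017/test2017",
    "data/COCO2017/annotations",
    "data/crowdpose/images",
    "data/crowdpose/annotations",
]

def missing_dirs(root="."):
    """Return the expected dataset directories that do not exist under `root`."""
    return [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]
```

Run it from the repository root; an empty list means the datasets are where the training scripts expect them.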
### COCO2017 dataset
Download COCO2017 from their [website](https://cocodataset.org/#download).

### CrowdPose dataset
Download CrowdPose from their [github](https://github.com/jeffffffli/CrowdPose), or use the following commands:

```shell
pip install gdown  # to download files from Google Drive
mkdir crowdpose
cd crowdpose
gdown 1VprytECcLtU4tKP32SYi_7oDRbw7yUTL  # images
gdown 1b3APtKpc43dx_5FxizbS-EWGvd-zl7Lb  # crowdpose_train.json
gdown 18-IwNa6TOGQPE0RqGNjNY1cJOfNC7MXj  # crowdpose_val.json
gdown 13xScmTWqO6Y6m_CjiQ-23ptgX9sC-J9I  # crowdpose_trainval.json
gdown 1FUzRj-dPbL1OyBwcIX2BgFPEaY5Yrz7S  # crowdpose_test.json
unzip images.zip
```

## Usage
### COCO2017 dataset
- Set model

```shell
export model=l  # n s m l x
```

- Training

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --pretrain dfine_${model}_obj365
```

If you choose `model=n`, do

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_n.py --device cuda --amp --pretrain dfine_n_obj365
```

- Testing (COCO2017 val)

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume <PTH_FILE_PATH> --eval
```

- Testing (COCO2017 test-dev)

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume <PTH_FILE_PATH> --test
```

After running the command, you will get a file named `results.json`. Compress it and submit it to the COCO competition website.

- Replicate results (optional)

```shell
# First, download the official weights
wget https://github.com/SebastianJanampa/DETRPose/releases/download/model_weights/detrpose_hgnetv2_${model}.pth

# Second, run evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume detrpose_hgnetv2_${model}.pth --eval
```

### CrowdPose dataset
- Set model

```shell
export model=l  # n s m l x
```

- Training

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --pretrain dfine_${model}_obj365
```

If you choose `model=n`, do

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_n_crowdpose.py --device cuda --amp --pretrain dfine_n_obj365
```

- Testing

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --resume <PTH_FILE_PATH> --eval
```

- Replicate results (optional)

```shell
# First, download the official weights
wget https://github.com/SebastianJanampa/DETRPose/releases/download/model_weights/detrpose_hgnetv2_${model}_crowdpose.pth

# Second, run evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --resume detrpose_hgnetv2_${model}_crowdpose.pth --eval
```

## TensorRT
All latency experiments were run on Lambda.ai instances. We provide two README files:
- one explaining how to run a TensorRT container in a Lambda.ai instance
- one explaining how to install a TensorRT `.deb` file in a Lambda.ai instance
## Deployment
- Setup

```shell
pip install -r tools/inference/requirements.txt
export model=l  # n s m l x
```

- Export ONNX

For the COCO model:

```shell
python tools/deployment/export_onnx.py --check -c configs/detrpose/detrpose_hgnetv2_${model}.py -r detrpose_hgnetv2_${model}.pth
```

For the CrowdPose model:

```shell
python tools/deployment/export_onnx.py --check -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r detrpose_hgnetv2_${model}_crowdpose.pth
```

- Export TensorRT

For a specific file:

```shell
trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
```

or, for all files inside a folder:

```shell
python tools/deployment/export_tensorrt.py
```

## Inference (Visualization)
- Setup

```shell
export model=l  # n s m l x
```

- Inference (onnxruntime / tensorrt / torch)

Inference on images and videos is supported.

For a single file:

```shell
# For COCO model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}.onnx --input examples/example1.jpg --annotator COCO
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}.engine --input examples/example1.jpg --annotator COCO
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}.py -r <checkpoint.pth> --input examples/example1.jpg --device cuda:0

# For CrowdPose model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}_crowdpose.onnx --input examples/example1.jpg --annotator CrowdPose
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}_crowdpose.engine --input examples/example1.jpg --annotator CrowdPose
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r <checkpoint.pth> --input examples/example1.jpg --device cuda:0
```

For a folder:

```shell
# For COCO model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}.onnx --input examples --annotator COCO
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}.engine --input examples --annotator COCO
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}.py -r <checkpoint.pth> --input examples --device cuda:0

# For CrowdPose model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}_crowdpose.onnx --input examples --annotator CrowdPose
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}_crowdpose.engine --input examples --annotator CrowdPose
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r <checkpoint.pth> --input examples --device cuda:0
```
## Benchmark
- Setup

```shell
pip install -r tools/benchmark/requirements.txt
export model=l  # n s m l x
```

- Model FLOPs, MACs, and Params

```shell
# For COCO model
python tools/benchmark/get_info.py --config configs/detrpose/detrpose_hgnetv2_${model}.py

# For CrowdPose model
python tools/benchmark/get_info.py --config configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py
```

- TensorRT Latency

```shell
python tools/benchmark/trt_benchmark.py --infer_dir ./data/COCO2017/val2017 --engine_dir trt_engines
```

- PyTorch Latency

```shell
# For COCO model
python tools/benchmark/torch_benchmark.py -c ./configs/detrpose/detrpose_hgnetv2_${model}.py --resume detrpose_hgnetv2_${model}.pth --infer_dir ./data/COCO2017/val2017

# For CrowdPose model
python tools/benchmark/torch_benchmark.py -c ./configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --resume detrpose_hgnetv2_${model}_crowdpose.pth --infer_dir ./data/COCO2017/val2017
```

## Citation
If you use DETRPose or its methods in your work, please cite the following BibTeX entry:
```bibtex
@misc{janampa2025detrpose,
  title={DETRPose: Real-time end-to-end transformer model for multi-person pose estimation},
  author={Sebastian Janampa and Marios Pattichis},
  year={2025},
  eprint={2506.13027},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.13027},
}
```

## Acknowledgements
This work was supported in part by Lambda.ai.
Our work is built upon DEIM, D-FINE, Detectron2, and GroupPose.
✨ Feel free to contribute and reach out if you have any questions! ✨
