📄 This is the official implementation of the paper:

**DETRPose: Real-time end-to-end transformer model for multi-person pose estimation**

Sebastian Janampa and Marios Pattichis

The University of New Mexico
Department of Electrical and Computer Engineering
DETRPose is the first real-time end-to-end transformer model for multi-person pose estimation, achieving outstanding results on the COCO and CrowdPose datasets. In this work, we propose a new denoising technique suitable for pose estimation that uses the Object Keypoint Similarity (OKS) metric to generate positive and negative queries. Additionally, we develop a new classification head and a new classification loss that are variations of the LQE head and the varifocal loss used in D-FINE.
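The denoising scheme above is built on the standard COCO Object Keypoint Similarity. As a reference, here is a minimal sketch of the OKS formula; the function name and signature are illustrative, not taken from this repository:

```python
import numpy as np

def oks(pred, gt, visibility, area, k):
    """Object Keypoint Similarity between a predicted and a ground-truth pose.

    pred, gt:   (K, 2) keypoint coordinates
    visibility: (K,) ground-truth visibility flags (> 0 means labeled)
    area:       object scale s^2 (segment area in COCO)
    k:          (K,) per-keypoint falloff constants
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)   # squared distances per keypoint
    labeled = visibility > 0
    if not labeled.any():
        return 0.0
    e = d2[labeled] / (2.0 * area * k[labeled] ** 2)
    return float(np.exp(-e).mean())          # in [0, 1]; 1 means a perfect match
```

A query whose OKS to a ground-truth pose is high can serve as a positive denoising query, and a heavily perturbed one (low OKS) as a negative query.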
## Video
We run DETRPose on video to showcase its efficiency and low latency.
## News
- [2025.06.02] Release DETRPose code and weights.
- [2025.06.04] Release Google Colab Notebook.
- [2025.06.04] Release HuggingFace 🤗 Space.
- [2025.06.17] Release paper on [arXiv](https://arxiv.org/abs/2506.13027).
### COCO val2017
| Model | AP | AP50 | AP75 | AR | AR50 | #Params | Latency | GFLOPs | config | checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|
| DETRPose-N | 57.2 | 81.7 | 61.4 | 64.4 | 87.9 | 4.1 M | 2.80 ms | 9.3 | py | 57.2 |
| DETRPose-S | 67.0 | 87.6 | 72.8 | 73.5 | 92.4 | 11.5 M | 4.99 ms | 33.1 | py | 67.0 |
| DETRPose-M | 69.4 | 89.2 | 75.4 | 75.5 | 93.7 | 20.8 M | 7.01 ms | 67.3 | py | 69.4 |
| DETRPose-L | 72.5 | 90.6 | 79.0 | 78.7 | 95.0 | 32.8 M | 9.50 ms | 107.1 | py | 72.5 |
| DETRPose-X | 73.3 | 90.5 | 79.4 | 79.4 | 94.9 | 73.3 M | 13.31 ms | 239.5 | py | 73.3 |
### COCO test-dev2017
| Model | AP | AP50 | AP75 | AR | AR50 | #Params | Latency | GFLOPs | config | checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|
| DETRPose-N | 56.7 | 83.1 | 61.1 | 64.4 | 89.3 | 4.1 M | 2.80 ms | 9.3 | py | 56.7 |
| DETRPose-S | 66.0 | 88.3 | 72.0 | 73.2 | 93.3 | 11.5 M | 4.99 ms | 33.1 | py | 66.0 |
| DETRPose-M | 68.4 | 90.1 | 74.8 | 75.1 | 94.4 | 20.8 M | 7.01 ms | 67.3 | py | 68.4 |
| DETRPose-L | 71.2 | 91.2 | 78.1 | 78.1 | 95.7 | 32.8 M | 9.50 ms | 107.1 | py | 71.2 |
| DETRPose-X | 72.2 | 91.4 | 79.3 | 78.8 | 95.7 | 73.3 M | 13.31 ms | 239.5 | py | 72.2 |
### CrowdPose test
| Model | AP | AP50 | AP75 | APE | APM | APH | #Params | Latency | GFLOPs | config | checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DETRPose-N | 56.0 | 80.7 | 59.6 | 65.0 | 56.6 | 46.6 | 4.1 M | 2.72 ms | 8.8 | py | 56.0 |
| DETRPose-S | 67.4 | 88.6 | 72.9 | 74.7 | 68.1 | 59.3 | 11.5 M | 4.80 ms | 31.3 | py | 67.4 |
| DETRPose-M | 72.0 | 91.0 | 77.8 | 78.6 | 72.6 | 64.5 | 20.7 M | 6.86 ms | 64.9 | py | 72.0 |
| DETRPose-L | 73.3 | 91.6 | 79.4 | 79.5 | 74.0 | 66.1 | 32.7 M | 9.03 ms | 103.5 | py | 73.3 |
| DETRPose-X | 75.1 | 92.1 | 81.3 | 81.3 | 75.7 | 68.1 | 73.3 M | 13.01 ms | 232.3 | py | 75.1 |
**Notes:**
- Latency is evaluated on a single Tesla V100 GPU with `batch_size = 1`, `fp16`, and `TensorRT==8.6.3`.
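For readers who want to reproduce latency numbers on their own hardware, the sketch below shows the usual measurement pattern (warmup iterations, fp16, batch size 1, explicit synchronization). It is an illustrative harness, not the repository's benchmark script:

```python
import time
import torch

def measure_latency(model, input_size=(1, 3, 640, 640), warmup=50, iters=200,
                    device="cuda", dtype=torch.float16):
    """Average per-forward latency in milliseconds (illustrative only)."""
    model = model.to(device=device, dtype=dtype).eval()
    x = torch.randn(*input_size, device=device, dtype=dtype)
    # CUDA launches are asynchronous, so synchronize around the timed loop.
    sync = torch.cuda.synchronize if str(device).startswith("cuda") else (lambda: None)
    with torch.no_grad():
        for _ in range(warmup):      # warm up kernels and allocator
            model(x)
        sync()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        sync()                       # wait for all queued GPU work
    return (time.perf_counter() - start) / iters * 1000.0
```

The input size of 640x640 is an assumption for illustration; use the model's actual inference resolution.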
## Setup

```shell
conda create -n detrpose python=3.11.9
conda activate detrpose
pip install -r requirements.txt
```

## Data Preparation
Create a folder named `data` to store the datasets:

```
configs
src
tools
data
├── COCO2017
│   ├── train2017
│   ├── val2017
│   ├── test2017
│   └── annotations
└── crowdpose
    ├── images
    └── annotations
```
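Before training, you can sanity-check the layout with a short script. This is a sketch; `missing_dirs` is a hypothetical helper, not part of the repo:

```python
from pathlib import Path

# Directory layout taken from the tree above.
EXPECTED_DIRS = [
    "data/COCO2017/train2017",
    "data/COCO2017/val2017",
    "data/COCO2017/test2017",
    "data/COCO2017/annotations",
    "data/crowdpose/images",
    "data/crowdpose/annotations",
]

def missing_dirs(root="."):
    """Return the expected dataset directories that do not exist under `root`."""
    return [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]
```

Run it from the repository root; an empty list means the datasets are where the training scripts expect them.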
### COCO2017 dataset
Download COCO2017 from their [website](https://cocodataset.org/#download).

### CrowdPose dataset
Download CrowdPose from their [github](https://github.com/jeffffffli/CrowdPose), or use the following commands:

```shell
pip install gdown  # to download files from Google Drive
mkdir crowdpose
cd crowdpose
gdown 1VprytECcLtU4tKP32SYi_7oDRbw7yUTL  # images
gdown 1b3APtKpc43dx_5FxizbS-EWGvd-zl7Lb  # crowdpose_train.json
gdown 18-IwNa6TOGQPE0RqGNjNY1cJOfNC7MXj  # crowdpose_val.json
gdown 13xScmTWqO6Y6m_CjiQ-23ptgX9sC-J9I  # crowdpose_trainval.json
gdown 1FUzRj-dPbL1OyBwcIX2BgFPEaY5Yrz7S  # crowdpose_test.json
unzip images.zip
```

## Usage
### COCO2017 dataset
- Set model

```shell
export model=l  # n s m l x
```

- Training

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --pretrain dfine_${model}_obj365
```

If you choose `model=n`, do

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_n.py --device cuda --amp --pretrain dfine_n_obj365
```

- Testing (COCO2017 val)

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume <PTH_FILE_PATH> --eval
```

- Testing (COCO2017 test-dev)

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume <PTH_FILE_PATH> --test
```

After running the command, you will get a file named `results.json`. Compress it and submit it to the COCO competition website.

- Replicate results (optional)

```shell
# First, download the official weights
wget https://github.com/SebastianJanampa/DETRPose/releases/download/model_weights/detrpose_hgnetv2_${model}.pth

# Second, run evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}.py --device cuda --amp --resume detrpose_hgnetv2_${model}.pth --eval
```

### CrowdPose dataset
- Set model

```shell
export model=l  # n s m l x
```

- Training

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --pretrain dfine_${model}_obj365
```

If you choose `model=n`, do

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_n_crowdpose.py --device cuda --amp --pretrain dfine_n_obj365
```

- Testing

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --resume <PTH_FILE_PATH> --eval
```

- Replicate results (optional)

```shell
# First, download the official weights
wget https://github.com/SebastianJanampa/DETRPose/releases/download/model_weights/detrpose_hgnetv2_${model}_crowdpose.pth

# Second, run evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py --config_file configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --device cuda --amp --resume detrpose_hgnetv2_${model}_crowdpose.pth --eval
```

## TensorRT
All latency experiments were run on Lambda.ai instances. We provide two README files:
- one explaining how to run a TensorRT container in a Lambda.ai instance
- one explaining how to install a TensorRT `.deb` file in a Lambda.ai instance
## Deployment
- Setup

```shell
pip install -r tools/inference/requirements.txt
export model=l  # n s m l x
```

- Export ONNX

For the COCO model:

```shell
python tools/deployment/export_onnx.py --check -c configs/detrpose/detrpose_hgnetv2_${model}.py -r detrpose_hgnetv2_${model}.pth
```

For the CrowdPose model:

```shell
python tools/deployment/export_onnx.py --check -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r detrpose_hgnetv2_${model}_crowdpose.pth
```

- Export TensorRT

For a specific file:

```shell
trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
```

or, for all files inside a folder:

```shell
python tools/deployment/export_tensorrt.py
```

## Inference (Visualization)
- Setup

```shell
export model=l  # n s m l x
```

- Inference (onnxruntime / tensorrt / torch)

Inference on images and videos is supported.

For a single file:

```shell
# For COCO model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}.onnx --input examples/example1.jpg --annotator COCO
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}.engine --input examples/example1.jpg --annotator COCO
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}.py -r <checkpoint.pth> --input examples/example1.jpg --device cuda:0

# For CrowdPose model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}_crowdpose.onnx --input examples/example1.jpg --annotator CrowdPose
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}_crowdpose.engine --input examples/example1.jpg --annotator CrowdPose
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r <checkpoint.pth> --input examples/example1.jpg --device cuda:0
```

For a folder:

```shell
# For COCO model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}.onnx --input examples --annotator COCO
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}.engine --input examples --annotator COCO
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}.py -r <checkpoint.pth> --input examples --device cuda:0

# For CrowdPose model
python tools/inference/onnx_inf.py --onnx detrpose_hgnetv2_${model}_crowdpose.onnx --input examples --annotator CrowdPose
python tools/inference/trt_inf.py --trt detrpose_hgnetv2_${model}_crowdpose.engine --input examples --annotator CrowdPose
python tools/inference/torch_inf.py -c configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py -r <checkpoint.pth> --input examples --device cuda:0
```
## Benchmark
- Setup

```shell
pip install -r tools/benchmark/requirements.txt
export model=l  # n s m l x
```

- Model FLOPs, MACs, and Params

```shell
# For COCO model
python tools/benchmark/get_info.py --config configs/detrpose/detrpose_hgnetv2_${model}.py

# For CrowdPose model
python tools/benchmark/get_info.py --config configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py
```

- TensorRT Latency

```shell
python tools/benchmark/trt_benchmark.py --infer_dir ./data/COCO2017/val2017 --engine_dir trt_engines
```

- PyTorch Latency

```shell
# For COCO model
python tools/benchmark/torch_benchmark.py -c ./configs/detrpose/detrpose_hgnetv2_${model}.py --resume detrpose_hgnetv2_${model}.pth --infer_dir ./data/COCO2017/val2017

# For CrowdPose model
python tools/benchmark/torch_benchmark.py -c ./configs/detrpose/detrpose_hgnetv2_${model}_crowdpose.py --resume detrpose_hgnetv2_${model}_crowdpose.pth --infer_dir ./data/COCO2017/val2017
```

## Citation
If you use DETRPose or its methods in your work, please cite the following BibTeX entry:
```bibtex
@misc{janampa2025detrpose,
  title={DETRPose: Real-time end-to-end transformer model for multi-person pose estimation},
  author={Sebastian Janampa and Marios Pattichis},
  year={2025},
  eprint={2506.13027},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.13027},
}
```

## Acknowledgements
This work was supported in part by Lambda.ai.
Our work is built upon DEIM, D-FINE, Detectron2, and GroupPose.
✨ Feel free to contribute and reach out if you have any questions! ✨
