[NeurIPS 2025] DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
🚀 This repository is the official implementation of DIPO, a framework that generates articulated objects conditioned on dual-state image pairs (a resting state and an articulated state).
[NeurIPS 2025] DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
Ruiqi Wu, Xinjie Wang, Liu Liu, Chunle Guo*, Jiaxiong Qiu, Chongyi Li, Lichao Huang, Zhizhong Su, Ming-Ming Cheng
( * indicates corresponding author)
[Arxiv Paper] [中文版] [Website Page] [PM-X (dataset)] [Gradio Demo]
We recommend using miniconda to manage the environment. The environment was tested on Ubuntu 20.04.4 LTS.
# Create a conda environment
conda create -n dipo python=3.10
conda activate dipo
# Install Pytorch
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
# Install other packages
pip install -r requirement.txt
# Install Pytorch3D (for evaluation)
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
Enter your GPT-4o API key in scripts/graph_pred/api.py:
client = AzureOpenAI(
    azure_endpoint="your_endpoint",
    api_key="your_key",
    api_version="your_version",
)
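Hardcoding credentials works for a quick test, but it is easy to commit them by accident. A safer pattern is to read them from environment variables; the sketch below is illustrative and not part of the repo, and the variable names (AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_API_VERSION) are assumptions.

```python
import os

# Read Azure OpenAI credentials from the environment instead of hardcoding
# them in api.py. Variable names are illustrative, not used by the repo.
def load_azure_credentials():
    return {
        "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
        "api_key": os.environ.get("AZURE_OPENAI_API_KEY", ""),
        "api_version": os.environ.get("AZURE_OPENAI_API_VERSION", ""),
    }

# Then, in scripts/graph_pred/api.py, you could construct the client with:
# client = AzureOpenAI(**load_azure_credentials())
```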
Our PM-X dataset is constructed by an agent system named LEGO-Art, which builds complex articulated objects from primitives provided by the PartNet-Mobility dataset. You can download the novel dataset at link.

You can download the original data and our preprocessed data from here for training and evaluation.
You can download the DIPO checkpoint file for inference and the CAGE pre-trained weights for training from here.
<project directory>
├── ckpts
│ ├── cage_cfg.ckpt
│ ├── dipo.ckpt
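Before running inference, you can sanity-check that the checkpoints match the layout above. The helper below is a sketch and not part of the DIPO codebase; the filenames come from the tree shown above.

```python
from pathlib import Path

# Check that the expected checkpoint files exist under the ckpts/ folder
# shown above; returns the names of any missing files.
# (Illustrative helper, not shipped with the repo.)
def missing_checkpoints(root="ckpts", expected=("cage_cfg.ckpt", "dipo.ckpt")):
    root = Path(root)
    return [name for name in expected if not (root / name).is_file()]
```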
Download the 3D assets for mesh retrieval from here; these are also the original data of a subset of the PartNet-Mobility dataset.
We provide a quick demo that runs inference on a dual-state image pair.
python demo_img.py \
--config configs/config.yaml \
--ckpt_path ckpts/dipo.ckpt \
--img_path_1 path/of/the/resting/state/image \
--img_path_2 path/of/the/articulated/state/image
If the script runs successfully, the output will be saved to ./results. By default, three objects are generated by initializing with different noise.
For other configuration options, see the arguments in the script.
To evaluate our model on the test set (see the data splits in data/data_split.json for PartNet-Mobility and data/data_acd.json for the ACD dataset), run the test script below.
# Evaluate on the test set (given GT graph, no object category label)
python test.py \
--config configs/config.yaml \
--ckpt ckpts/dipo.ckpt \
--label_free \
--which_data pm
Evaluation is supported only on a single GPU; it was tested on an NVIDIA RTX 4090 (24 GB).
We train our model on top of a CAGE model pre-trained under our setting. The checkpoint can be downloaded here and is placed under the pretrained folder by default.
<project directory>
├── pretrained
│ ├── cage_cfg.ckpt
Run the following command to train our model from scratch. The original model was trained on 4 NVIDIA A100 GPUs.
python train.py \
--config configs/config.yaml \
--pretrained_cage ckpts/cage_cfg.ckpt
# Step-1: Roll descriptions & build grid-level data
python scripts/layout_generator/api.py --save_path path/to/gpt/data --obj_num 3
# Step-2: Build data with coordinates
python scripts/layout_generator/layout_generator_in_grid.py --save_path path/to/gpt/data
# Step-3: Retrieval
python scripts/mesh_retrieval/retrieval.py --src_dir path/to/gpt/data --gt_data_root path/to/assets/for/retrieval
# Step-4: Render data with Blender
python scripts/render/render_dir.py --src_dir path/to/gpt/data
# Step-5: Filter data with VLMs
python scripts/layout_generator/api_filter.py --save_path path/to/gpt/data
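The five steps above can be chained with a small driver script. This is a sketch and not a script shipped with the repo; the {data} and {assets} placeholders stand for your GPT-data directory and the retrieval asset directory, and must be replaced with real paths.

```python
import subprocess

# Commands for the five PM-X generation steps listed above.
# {data}/{assets} are placeholders filled in by run_pipeline();
# this driver is illustrative and not part of the DIPO codebase.
STEPS = [
    ["python", "scripts/layout_generator/api.py",
     "--save_path", "{data}", "--obj_num", "3"],
    ["python", "scripts/layout_generator/layout_generator_in_grid.py",
     "--save_path", "{data}"],
    ["python", "scripts/mesh_retrieval/retrieval.py",
     "--src_dir", "{data}", "--gt_data_root", "{assets}"],
    ["python", "scripts/render/render_dir.py", "--src_dir", "{data}"],
    ["python", "scripts/layout_generator/api_filter.py",
     "--save_path", "{data}"],
]

def run_pipeline(data_dir, asset_dir, dry_run=False):
    """Run each step in order; with dry_run=True, only print the commands."""
    commands = [
        [arg.format(data=data_dir, assets=asset_dir) for arg in step]
        for step in STEPS
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Calling run_pipeline("path/to/gpt/data", "path/to/assets/for/retrieval", dry_run=True) prints the five commands without executing anything, which is a cheap way to check the substituted paths.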
@inproceedings{wu2025dipo,
title={DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data},
author={Wu, Ruqi and Wang, Xinjie and Liu, Liu and Guo, Chunle and Qiu, Jiaxiong and Li, Chongyi and Huang, Lichao and Su, Zhizhong and Cheng, Ming-Ming},
booktitle={Advances in Neural Information Processing Systems 39 (NeurIPS 2025)},
year={2025}
}

