(SIGGRAPH ASIA 2025)
Sylvia Yuan, Ruoxi Shi, Xinyue Wei, Xiaoshuai Zhang, Hao Su, Minghua Liu
[Project Website | arXiv Paper]
- Environment Setup — Conda-based setup instructions
- Pretrained Model — Download and use pretrained checkpoints
- Inference Usage — Run inference
- Post-Inference Tasks — Mesh reconstruction, joint estimation, etc.
- Evaluation Metrics — Chamfer, F-score, LPIPS, PSNR, CLIP
- Multi-link Inference — Combining masks for multi-joint articulation
- Data Rendering and Processing (Blender) — Dataset rendering with Blender
- Custom Data Inference — Run LARM on your own RGB + Camera Information inputs
- Training Instructions — Distributed training setup and launch
- Citation — How to cite LARM
To set up the development environment for LARM using Conda, follow these steps:
cd LARM
conda env create -f environment.yaml
conda activate larm
This environment includes all necessary dependencies for training, inference, and visualization.
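After activation, a quick sanity check can confirm that PyTorch sees your GPU (a minimal sketch; the file name is illustrative and it only assumes PyTorch is among the installed dependencies):
# check_env.py -- quick sanity check of the Conda environment (illustrative, not part of the repo).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))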
A pretrained LARM model checkpoint is available for download (Google Drive).
Please download the checkpoint to weight/larm/.
- Pretrained on the Objaverse dataset.
- Trained on the PartNet-Mobility dataset ("StorageFurniture", "Microwave", "Refrigerator", "Safe", "TrashCan", "Table").
- Trained on 512x512 rendered images, with 6 input images (3 at state 0 and 3 at state 1).
Download the pretrained model as instructed in the previous section.
Download and unzip the sample data to data_sample. There are three data samples.
Download sample data here: Sample Data
View sample output here: Sample Output
LARM inference runner with three mutually exclusive modes:
- random : Sample random target extrinsics and render for a provided qpos list (required for mesh reconstruction).
- video : Render a ring of target views and stitch them into an MP4
- view : Render one image per provided target pose, for every qpos value
Note on model versions: when using the rotation-augmented model, pass the --rot-aug flag and use the corresponding rotation-augmented sample data; when using the original model, omit the flag and run on the data_sample folder.
# Random mode
python inference/inference.py --random \
--model_ckpt weight/larm/model_198000.pth \
--datalist_path data_sample/random_metadata/data.txt \
--save_dir output_random \
--resolution 512 --batch_size 4 --num_input_views 6 \
--num_target_views 32 \
--qpos_list "0.00,0.25,0.50,0.75,1.00" \
--qpos_in_a 0.00 --qpos_in_b 1.00
# Video mode
python inference/inference.py --video \
--model_ckpt weight/larm/model_198000.pth \
--datalist_path data_sample/random_metadata/data.txt \
--save_dir output_video \
--resolution 512 --batch_size 4 --num_input_views 6 \
--num_target_views 128 --fps 25
# View mode
python inference/inference.py --view \
--model_ckpt weight/larm/model_198000.pth \
--datalist_path data_sample/view_metadata/data.txt \
--save_dir output_view \
--resolution 512 --batch_size 4 --num_input_views 6 \
--qpos_in_a 0.00 --qpos_in_b 1.00
Each input sample must include:
- Camera intrinsics and pose information (from .json files) for the input images.
- Three input images of the object at state 0.
- Three input images of the object at state 1.
- A datalist text file listing all target object directories to be evaluated.
- If more than 3 views per state are provided, we use K-Means clustering to cluster all camera poses into 3 clusters and select the cluster centers. This ensures diverse viewpoints that are more likely to capture different regions of the object.
- If exactly 3 views per state are given, all three are used directly without K-Means.
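The selection step can be sketched as follows (a minimal illustration using scikit-learn; names and the exact selection rule are assumptions, and the inference code may differ):
# Illustrative sketch of the 3-view selection step per state.
import numpy as np
from sklearn.cluster import KMeans

def select_three_views(cam_to_world: np.ndarray) -> list:
    """cam_to_world: (N, 4, 4) camera-to-world poses for one state, N >= 3."""
    positions = cam_to_world[:, :3, 3]                     # camera centers in world space
    if len(positions) == 3:
        return [0, 1, 2]                                   # exactly 3 views: use them directly
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(positions)
    # pick the real camera closest to each cluster center
    return [int(np.argmin(np.linalg.norm(positions - c, axis=1)))
            for c in km.cluster_centers_]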
For each object and target configuration, the following files will be saved:
output_dir/
└── eval_{obj_id}_joint_{joint_idx}/
├── transforms.json
└── images/
├── {idx}_{qpos}.png
├── {idx}_{qpos}_depth.npy
├── {idx}_{qpos}_mask.png
└── {idx}_{qpos}_partmask.png
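These outputs can be loaded directly for inspection, e.g. (a minimal sketch; the object id, joint index, frame index, and qpos values below are placeholders):
# Inspect one rendered target view saved by inference.py.
import json
import numpy as np
from PIL import Image

out_dir = "output_random/eval_0001_joint_0"                # illustrative directory name
with open(f"{out_dir}/transforms.json") as f:
    transforms = json.load(f)                              # target camera parameters

rgb      = np.array(Image.open(f"{out_dir}/images/000_0.00.png"))
depth    = np.load(f"{out_dir}/images/000_0.00_depth.npy")
mask     = np.array(Image.open(f"{out_dir}/images/000_0.00_mask.png"))
partmask = np.array(Image.open(f"{out_dir}/images/000_0.00_partmask.png"))
print(rgb.shape, depth.shape, mask.shape, partmask.shape)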
After running inference/inference.py, LARM supports several downstream applications. Each task is located in a separate folder with its own README for detailed instructions.
Given the articulation type (prismatic or revolute), estimate the joint parameters, including the axis orientation. Located in the axis_est directory.
Detailed usage and configuration are described in axis_est/README.md.
Located in the mesh directory.
- Reconstruct 3D meshes from model outputs.
- Combine segmented parts into full articulated objects.
- Export or visualize reconstructed meshes.
Please refer to the mesh/README.md file for step-by-step guidance.
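As a rough illustration of the TSDF route (a minimal sketch using Open3D; mesh/tsdf.py and mesh/SAP are the actual implementations, and the transforms.json keys, depth units, and pose conventions assumed below may not match them exactly):
# Illustrative TSDF fusion of the saved depth maps for one qpos.
import json
import numpy as np
import open3d as o3d

out_dir = "output_random/eval_0001_joint_0"        # illustrative
with open(f"{out_dir}/transforms.json") as f:
    meta = json.load(f)

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01, sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

gl_to_cv = np.diag([1.0, -1.0, -1.0, 1.0])         # Blender/OpenGL camera -> OpenCV camera

for frame in meta["frames"]:                        # "frames"/"file_path" keys are assumptions
    name = frame["file_path"]
    color = o3d.io.read_image(f"{out_dir}/images/{name}")
    depth_np = np.load(f"{out_dir}/images/{name.replace('.png', '_depth.npy')}")
    depth = o3d.geometry.Image(depth_np.astype(np.float32))
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1.0, depth_trunc=5.0, convert_rgb_to_intensity=False)
    K = np.array(frame["intrinsics"])               # 3x3 intrinsics, also an assumption
    intr = o3d.camera.PinholeCameraIntrinsic(512, 512, K[0, 0], K[1, 1], K[0, 2], K[1, 2])
    c2w = np.array(frame["transform_matrix"]) @ gl_to_cv
    volume.integrate(rgbd, intr, np.linalg.inv(c2w))  # integrate expects world-to-camera

mesh = volume.extract_triangle_mesh()
o3d.io.write_triangle_mesh(f"{out_dir}/fused_mesh.ply", mesh)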
Located in the metrics directory.
- Evaluate the model using the following metrics:
- Chamfer Distance
- F-Score
- LPIPS
- NVS/Mesh Rendering PSNR
- CLIP Score
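For reference, the Chamfer distance and F-score listed above can be sketched on sampled point clouds as follows (a minimal illustration; metrics/eval.py is the actual implementation, and its distance threshold and normalization may differ):
# Chamfer distance and F-score between two sampled point clouds (minimal illustration).
import numpy as np
from scipy.spatial import cKDTree

def chamfer_and_fscore(pred: np.ndarray, gt: np.ndarray, tau: float = 0.01):
    """pred: (N, 3) points from the predicted mesh, gt: (M, 3) points from the GT mesh."""
    d_pred = cKDTree(gt).query(pred)[0]       # distance from each predicted point to GT
    d_gt = cKDTree(pred).query(gt)[0]         # distance from each GT point to prediction
    chamfer = d_pred.mean() + d_gt.mean()
    precision = (d_pred < tau).mean()
    recall = (d_gt < tau).mean()
    fscore = 2 * precision * recall / max(precision + recall, 1e-8)
    return chamfer, fscore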
# image metric
python metrics/eval.py --image --psnr/clip/lpips \
--datalist_path /path/to/data.txt \
--category_json /path/to/train_test_split.json \
--render_dir /path/to/render_dir \
--load_dir /path/to/output_dir \
--categories ALL
# mesh metrics; to obtain the ground-truth meshes used for metric computation, see data_proc/get_gt_mesh.py
python mesh/urdf_to_mesh.py \
--mode sap/tsdf \
--load_dir /path/to/output_dir \
--output_dir /path/to/mesh_dir \
--qpos_list 0.00,0.25,0.50,0.75,1.00
python metrics/eval.py --mesh --cd/fscore \
--datalist_path /path/to/data.txt \
--category_json /path/to/train_test_split.json \
--load_dir /path/to/output_dir \
--pred_mesh_root_main /path/to/mesh_dir \
--gt_mesh_root /path/to/gtmesh_dir \
--joint_info_json /path/to/joint_info.json \
--categories ALL
# joint metric
python metrics/eval.py --joint \
--datalist_path /path/to/data.txt \
--category_json /path/to/train_test_split.json \
--load_dir /path/to/output_dir \
--joint_info_json /path/to/joint_info.json \
--categories ALL
To run batch evaluation, follow scripts/batch_eval.sh.
LARM supports multi-link inference by running the single-joint inference process on each joint of an object, combining the resulting part masks into a joint sum_partmask, and using that mask in mesh reconstruction exactly as for a single-joint base. The separately reconstructed parts are then combined with the base.
Download here: Multi-link Sample Data
python inference/inference.py --multilink \
--model_ckpt weight/larm/model_198000.pth \
--datalist_path data_sample_multilink/random_metadata/data.txt \
--save_dir output_multilink
python mesh/combine_mask.py --load_dir=output_multilink --qpos_list=0.00,0.25,0.50,0.75,1.00
Note: for custom data, all joints of a multi-link object should be listed in the data.txt file.
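Conceptually, this step takes a per-pixel union of the part masks predicted for each joint of the same target view (a minimal sketch of the idea; mesh/combine_mask.py is the actual implementation):
# Combine per-joint part masks of one target view into a single sum_partmask (illustrative).
import numpy as np
from PIL import Image

def combine_partmasks(partmask_paths, out_path):
    union = None
    for path in partmask_paths:
        mask = np.array(Image.open(path)) > 0          # binarize each joint's part mask
        union = mask if union is None else (union | mask)
    Image.fromarray(union.astype(np.uint8) * 255).save(out_path)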
Then run joint estimation (see the README.md in axis_est/) and mesh reconstruction as usual with mesh/tsdf.py or mesh/SAP (see the README.md in mesh/), followed by:
python utils/combine_urdf.py --mode sap/tsdf --load_dir output_multilink --datalist data_sample_multilink/random_metadata/data.txt
The output folder is structured as follows:
output_multilink/
└── sap_urdf_final/ (only if using sap for reconstruction)
└── eval_{obj_id}_joint_{min_joint_id}/
├── mobility_multilink.urdf
├── eval_{obj_id}_joint_{joint_idx}_part.ply
├── ...
├── eval_{obj_id}_joint_{joint_idx}_part.ply
└── eval_{obj_id}_joint_{joint_idx}_multilink_base.ply
Located in the data_proc directory.
- Perform Blender-based rendering of datasets.
- Preprocess raw data into training-ready formats.
- Control camera angles, materials, and lighting for realistic outputs.
See data_proc/README.md for Blender pipeline setup and asset preparation.
LARM supports inference on custom RGB data for articulated objects. To prepare your dataset for inference, structure it as follows:
Each object or sequence should reside in a single directory containing:
- RGB images named color_{qpos}_in_{idx}.png, where:
  - qpos denotes the joint position (e.g., 0.00 and 1.00)
  - idx is the frame index
  - Example: color_0.00_in_001.png, color_1.00_in_001.png
- Metadata files named meta_0.00.json and meta_1.00.json:
  - One JSON file per qpos value
  - Each file contains camera parameters and transformation data
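For example, with three views per state and joint positions 0.00 and 1.00, an object directory might look like this (frame indices are illustrative):
my_object/
├── color_0.00_in_001.png
├── color_0.00_in_002.png
├── color_0.00_in_003.png
├── color_1.00_in_001.png
├── color_1.00_in_002.png
├── color_1.00_in_003.png
├── meta_0.00.json
└── meta_1.00.json
Each meta_{qpos}.json has the following structure: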
{
  "resolution": <int>,
  "sample_0": {
    "intrinsics": [
      [fx, 0, cx],
      [0, fy, cy],
      [0, 0, 1]
    ],
    "input_frame_idx": {
      "radius": <float>,
      "transform_matrix": [
        [x11, x12, x13, x14],
        [x21, x22, x23, x24],
        [x31, x32, x33, x34],
        [0.0, 0.0, 0.0, 1.0]
      ]
    }
  }
}
- resolution: Image resolution (square images assumed)
- intrinsics: 3x3 camera intrinsics matrix
- transform_matrix: 4x4 camera pose matrix in the Blender coordinate convention
- input_frame_idx: Frame-specific entry containing that frame's pose (transform_matrix)
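If you are not rendering with data_proc, a meta file with this structure can be assembled from your own calibration data, e.g. (a minimal sketch; the values are placeholders and the exact per-frame key format is an assumption):
# Write meta_{qpos}.json for one joint state from known camera parameters (illustrative).
import json
import numpy as np

def write_meta(qpos, resolution, intrinsics, frame_poses, radius):
    """frame_poses maps a frame index string (e.g. "001") to a 4x4 Blender-convention pose."""
    sample = {"intrinsics": np.asarray(intrinsics).tolist()}
    for idx, pose in frame_poses.items():
        sample[idx] = {"radius": radius,
                       "transform_matrix": np.asarray(pose).tolist()}
    with open(f"meta_{qpos:.2f}.json", "w") as f:
        json.dump({"resolution": resolution, "sample_0": sample}, f, indent=2)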
Sample meta files for inference input are provided in the data_sample folder. The following script collects the required information from a render folder produced with data_proc; otherwise, construct the required input JSON yourself before running the inference script.
python inference/make_input_from_render.py --in-dir=/path/to/input_dir --out-json=/path/to/target_json --mode=random/view
Once your data is formatted correctly, create a text file listing the inference data folders and pass its path to the inference script.
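For example (a minimal sketch; folder names are illustrative, and the one-directory-per-line format follows the datalist description above):
# Build the datalist text file: one object directory per line.
from pathlib import Path

root = Path("my_custom_data")                               # illustrative root folder
dirs = sorted(str(p) for p in root.iterdir() if p.is_dir())
Path("my_custom_data/data.txt").write_text("\n".join(dirs) + "\n")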
For extended instructions on rendering from custom 3D asset data, refer to the data_proc folder.
Training the LARM model can be performed in a distributed multi-node, multi-GPU setup using PyTorch's torchrun. The training behavior is controlled via a YAML configuration file.
Follow the instructions in data_proc for Blender rendering.
Sample training data can be downloaded and viewed from:
Download here: Sample Training Data (Google Drive)
The main training config file is:
configs/part.yaml
To run training across multiple nodes with multiple GPUs each:
NCCL_DEBUG=INFO torchrun --nproc_per_node=num_gpus --nnodes=num_nodes \
--master-port ${MASTER_PORT} \
--master-addr ${MASTER_ADDR} \
--node-rank ${JOB_COMPLETION_INDEX} \
trainer_mask.py \
--config=configs/part.yaml

@article{yuan2025larmlargearticulatedobjectreconstruction,
title={LARM: A Large Articulated-Object Reconstruction Model},
author={Yuan, Sylvia and Shi, Ruoxi and Wei, Xinyue and Zhang, Xiaoshuai and Su, Hao and Liu, Minghua},
journal={arXiv preprint arXiv:2511.11563},
year={2025},
}
