This is the PyTorch implementation of the paper SinRef-6D, published in IEEE TRO by J. Liu, W. Sun, K. Zeng, J. Zheng, H. Yang, H. Rahmani, A. Mian, and L. Wang. SinRef-6D is a single-reference-view, CAD-model-free novel object 6D pose estimation method that is simple yet effective and scales well to practical applications.
Given a single RGB-D reference view of an unseen object in a default robot manipulation viewpoint, we aim to predict its 6-DoF absolute pose from any query view.
SinRef-6D deployment in real-world robotic manipulation scenarios. Notably, the reference view is not carefully selected. We select a default robot manipulation viewpoint (free of occlusion and with minimal self-occlusion) using an Intel RealSense L515 RGB-D camera as the reference view.
To the best of our knowledge, we are the first to present a method for novel object 6D absolute pose estimation using only a single reference view in real-world robotic manipulation scenarios. This approach simultaneously eliminates the need for object CAD models, dense reference views, and model retraining, offering enhanced efficiency and scalability while demonstrating strong generalization to potential real-world robotic applications.
More robotic demos can be seen at our Project Page.
This repository contains:
- Training code for the pose estimation model
- BOP evaluation scripts
- Custom-object inference scripts
- CUDA/C++ extensions used by the model
SinRef-6D
├── Pose_Estimation_Model/
│ ├── config/
│ ├── model/
│ ├── provider/
│ ├── utils/
│ ├── train.py
│ ├── test_bop.py
│ └── run_inference_custom.py
├── Data/
├── kernels/
├── dwconv/
└── environment.yaml
Main folders:
- Pose_Estimation_Model/: core model, datasets, training, evaluation, and inference
- Data/: expected dataset layout and example inputs
- kernels/, dwconv/: low-level CUDA extensions used by the VMamba and point processing code
The model pipeline is:
- Crop an observed object instance from RGB-D input.
- Convert depth to an observed point cloud.
- Load rendered templates for the target object.
- Extract RGB-aligned features with VMamba and point features with PointMamba.
- Match observed points to template points.
- Recover the final pose with correspondence-based rigid alignment.
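The last step above, correspondence-based rigid alignment, is the classic Kabsch/Umeyama least-squares fit. A minimal NumPy sketch for illustration (not the repository's exact implementation):

```python
import numpy as np

def rigid_align(src, dst):
    """Solve for R, t minimizing ||(R @ src_i + t) - dst_i|| over matched points.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns a 3x3 rotation R and translation t with dst ~= src @ R.T + t.
    """
    src_c = src - src.mean(axis=0)           # center both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

In practice the matches produced by feature matching are weighted or filtered (e.g. by confidence) before this closed-form solve.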
The main model entry is Pose_Estimation_Model/model/pose_estimation_model.py.
The recommended environment is defined in environment.yaml.
This setup is intended for:
- CUDA 11.8
- Python 3.10
- PyTorch 2.0.0
conda env create -f environment.yaml
conda activate sinref6d

After activating the environment, build the local extensions:
export CUDA_HOME=/usr/local/cuda-11.8
cd Pose_Estimation_Model/model/pointnet2
python setup.py install
cd ../../../

Optional extensions:
- kernels/selective_scan/ is bundled in the repo and provides low-level kernels used by the VMamba stack
- dwconv/ is also bundled and can be installed separately if you use that branch of the code
If knn_cuda is unavailable on your machine, the code now falls back to a pure PyTorch KNN implementation. It is slower, but useful for first-time setup and debugging.
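A pure-PyTorch KNN can be written in a few lines; this is a hypothetical sketch of such a fallback, not necessarily the code used in this repository:

```python
import torch

def knn_torch(ref, query, k):
    """Indices of the k nearest reference points for each query point.

    ref:   (N, 3) reference point cloud
    query: (M, 3) query points
    Returns an (M, k) long tensor of indices into ref.
    """
    # (M, N) pairwise Euclidean distances; fine for moderate N and M.
    dists = torch.cdist(query, ref)
    return dists.topk(k, dim=1, largest=False).indices
```

For large point clouds the O(M x N) distance matrix dominates memory, so the query dimension is typically processed in chunks.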
If causal-conv1d needs to be built locally, you can also install it from the bundled source tree:
cd Pose_Estimation_Model/model/causal-conv1d
python setup.py install
cd ../../../

Download Pretrained Weights:
You can download the pretrained model weights from Google Drive.
The download includes:
- SinRef-6D trained weights: Our trained pose estimation model weights
- VMamba backbone weights: Pretrained VMamba backbone weights used in our pipeline
After downloading, place the weights in the appropriate directories as specified in the configuration files.
Download Datasets:
- Training Datasets (MegaPose-GSO & MegaPose-ShapeNetCore): Available at BOP Challenge 2023 Training Datasets
  - MegaPose-GSO: Objects from Google Scanned Objects
  - MegaPose-ShapeNetCore: Objects from ShapeNetCore
- BOP Test Datasets: Available at BOP Benchmark Datasets
  - Includes: YCB-V, LM-O, T-LESS, ITODD, HB, IC-BIN, TUD-L, etc.
The expected directory layout is:
Data
├── MegaPose-Training-Data
│ ├── MegaPose-GSO
│ └── MegaPose-ShapeNetCore
├── BOP
│ ├── ycbv
│ ├── lmo
│ ├── icbin
│ ├── itodd
│ ├── hb
│ ├── tudl
│ └── tless
└── BOP-Templates
├── ycbv
├── lmo
├── icbin
├── itodd
├── hb
├── tudl
└── tless
By default, the config uses relative paths:
Data/MegaPose-Training-Data
Data/BOP
Data/BOP-Templates
If your datasets are stored outside the repo, the code will also try to resolve the same Data/... structure from a shared parent directory.
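A resolution helper in this style could walk up from the repository root until it finds the expected layout (a hypothetical sketch; the actual lookup logic lives in the config/provider code):

```python
from pathlib import Path

def resolve_data_dir(rel_path, repo_root):
    """Find rel_path (e.g. 'Data/BOP') inside the repo, then in parent dirs."""
    repo_root = Path(repo_root).resolve()
    for base in [repo_root, *repo_root.parents]:
        candidate = base / rel_path
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"{rel_path} not found near {repo_root}")
```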
Download BOP-Templates Dataset:
You can download the pre-rendered BOP-Templates dataset from Google Drive.
Expected template roots:
- Training templates:
  - Data/MegaPose-Training-Data/MegaPose-GSO/templates
  - Data/MegaPose-Training-Data/MegaPose-ShapeNetCore/templates
- BOP test templates:
  - Data/BOP-Templates/<dataset>
The training and BOP loaders expect pre-rendered RGB, mask, and depth (or XYZ) files, together with pose metadata, in the layout already used by this repository.
Use the base config:
python Pose_Estimation_Model/train.py \
--config Pose_Estimation_Model/config/base.yaml \
--model pose_estimation_model \
--gpus 0

Common arguments:
- --gpus: GPU ids, for example 0 or 0,1
- --exp_id: experiment id used in the log directory name
- --checkpoint_iter: resume from a saved iteration
Training outputs are written under:
log/<model>_<config>_id<exp_id>/
python Pose_Estimation_Model/test_bop.py \
--config Pose_Estimation_Model/config/base.yaml \
--dataset ycbv \
--gpus 0

The script expects detection results in a directory containing files such as:
result_ycbv.json
result_lmo.json
result_tless.json
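Each file is a JSON list of per-instance detections in the BOP 2023 default-detection style. A minimal loader sketch, with field names assumed from the CNOS/BOP convention rather than taken from this repo:

```python
import json
from collections import defaultdict

def load_detections(path, min_score=0.0):
    """Group detections by (scene_id, image_id), dropping low-score entries."""
    with open(path) as f:
        dets = json.load(f)
    grouped = defaultdict(list)
    for det in dets:
        if det.get("score", 1.0) >= min_score:
            grouped[(det["scene_id"], det["image_id"])].append(det)
    return grouped
```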
You can override the detection directory explicitly:
python Pose_Estimation_Model/test_bop.py \
--config Pose_Estimation_Model/config/base.yaml \
--dataset ycbv \
--gpus 0 \
--detection_dir /path/to/detection_jsons

Generated BOP csv files are saved under log/....
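The generated csv files follow the standard BOP result format: one row per estimate with columns `scene_id,im_id,obj_id,score,R,t,time`, where `R` is nine space-separated row-major floats and `t` is a translation in millimeters. A minimal row-formatting sketch (an illustration of the format, not the repo's writer):

```python
import numpy as np

def bop_result_row(scene_id, im_id, obj_id, score, R, t, time_s):
    """Format one pose estimate as a BOP-challenge csv row.

    R: (3, 3) rotation matrix; t: (3,) translation in millimeters.
    """
    R_str = " ".join(f"{v:.6f}" for v in np.asarray(R).reshape(-1))
    t_str = " ".join(f"{v:.6f}" for v in np.asarray(t).reshape(-1))
    return f"{scene_id},{im_id},{obj_id},{score:.4f},{R_str},{t_str},{time_s:.3f}"
```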
If you only want to verify that the repository works end-to-end on YCB-V as quickly as possible, use this order:
- Create and activate the environment:
conda env create -f environment.yaml
conda activate sinref6d

If mamba-ssm or causal-conv1d does not install cleanly during environment creation, install them manually before continuing.
- Build the PointNet++ extension:
export CUDA_HOME=/usr/local/cuda-11.8
cd Pose_Estimation_Model/model/pointnet2
python setup.py install
cd ../../../

- Prepare these three directories:
Data/BOP/ycbv
Data/BOP-Templates/ycbv
Data/bop23_default_detections_for_task4/bop23_default_detections_for_task4/cnos-fastsam/result_ycbv.json
- Run YCB-V evaluation:
python Pose_Estimation_Model/test_bop.py \
--config Pose_Estimation_Model/config/base.yaml \
--dataset ycbv \
--gpus 0 \
--iter 2400000

- Check the output csv:
log/pose_estimation_model_base_id0/ycbv_eval_iter2400000/result_ycbv-test.csv
If your detection jsons are stored somewhere else, pass:
--detection_dir /path/to/detection_jsons

Prepare a custom template directory first:
/path/to/custom_case/templates
Then run inference:
python Pose_Estimation_Model/run_inference_custom.py \
--output_dir /path/to/custom_case \
--rgb_path /path/to/rgb.png \
--depth_path /path/to/depth.png \
--cam_path /path/to/camera.json \
--seg_path /path/to/detections.json \
--gpus 0

Optional:
--cad_path /path/to/model.ply

If --cad_path is omitted, the script falls back to template point clouds for radius estimation and visualization.
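One common way to estimate such a fallback radius is the maximum distance of any template point from the centroid (a hypothetical sketch; the script's own estimate may differ):

```python
import numpy as np

def estimate_radius(points):
    """Rough object radius: max distance of any point from the centroid.

    points: (N, 3) array sampled from the object surface (templates or CAD).
    """
    centered = points - points.mean(axis=0)
    return float(np.linalg.norm(centered, axis=1).max())
```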
Outputs will be written to:
/path/to/custom_case/sam6d_results/detection_pem.json
/path/to/custom_case/sam6d_results/vis_pem.png
Additional scripts are included for metric computation:
Pose_Estimation_Model/eval_lm_ADD-0.1d.py
Pose_Estimation_Model/eval_ycbv_ADD(S).py
Pose_Estimation_Model/eval_single_object_pose.py
These are standalone command-line tools; run each with --help to see its arguments.
- Clone the repository.
- Create the conda environment from environment.yaml.
- Build the pointnet2 extension.
- Prepare the Data/ directory structure.
- Download or prepare template files.
- Verify that Pose_Estimation_Model/config/base.yaml points to the correct data locations.
- Run test_bop.py on one dataset first, such as ycbv.
- Run training only after evaluation and data loading work correctly.
This usually means the object model directory or pre-rendered template directory was not found. Check:
Data/BOP/<dataset>/models
Data/BOP-Templates/<dataset>
The code now has a PyTorch fallback. It can run without knn_cuda, but may be slower.
These usually come from incompatible NumPy versions. The provided environment pins NumPy to the 1.24 series to avoid that issue.
Make sure:
- your PyTorch CUDA version matches your installed CUDA toolkit
- nvcc is available
- your environment is activated before building extensions
If you find our work helpful, please consider citing:
@article{2026SinRef-6D,
author={Liu, Jian and Sun, Wei and Zeng, Kai and Zheng, Jin and Yang, Hui and Rahmani, Hossein and Mian, Ajmal and Wang, Lin},
title={Scalable Unseen Object 6-DoF Absolute Pose Estimation with Robotic Integration},
journal={IEEE Transactions on Robotics},
year={2026}
}

Our implementation leverages the code from the repository below. We thank the authors for releasing their code.
This project is licensed under the terms of the MIT license.

