Scalable Unseen Objects 6-DoF Absolute Pose Estimation with Robotic Integration

This is the PyTorch implementation of the paper SinRef-6D, published in IEEE Transactions on Robotics (TRO), by J. Liu, W. Sun, K. Zeng, J. Zheng, H. Yang, H. Rahmani, A. Mian, and L. Wang. SinRef-6D estimates the 6D pose of novel objects from a single reference view, without a CAD model. It is simple yet effective and scales well to practical applications.

Given a single RGB-D reference view of an unseen object in a default robot manipulation viewpoint, we aim to predict its 6-DoF absolute pose from any query view.

Real-World Demo

SinRef-6D deployed in real-world robotic manipulation scenarios. Notably, the reference view is not carefully selected: we capture it with an Intel RealSense L515 RGB-D camera from a default robot manipulation viewpoint (free of occlusion and with minimal self-occlusion).

To the best of our knowledge, we are the first to present a method for novel object 6D absolute pose estimation using only a single reference view in real-world robotic manipulation scenarios. This approach simultaneously eliminates the need for object CAD models, dense reference views, and model retraining, offering enhanced efficiency and scalability while demonstrating strong generalization to potential real-world robotic applications.

More robotic demos can be seen at our Project Page.

SinRef-6D Repository

This repository contains:

Training code for the pose estimation model
BOP evaluation scripts
Custom-object inference scripts
CUDA/C++ extensions used by the model

1. Repository Structure

SinRef-6D
├── Pose_Estimation_Model/
│   ├── config/
│   ├── model/
│   ├── provider/
│   ├── utils/
│   ├── train.py
│   ├── test_bop.py
│   └── run_inference_custom.py
├── Data/
├── kernels/
├── dwconv/
└── environment.yaml

Main folders:

Pose_Estimation_Model/: core model, datasets, training, evaluation, and inference
Data/: expected dataset layout and example inputs
kernels/, dwconv/: low-level CUDA extensions used by the VMamba and point processing code

The model pipeline is:

Crop an observed object instance from RGB-D input.
Convert depth to an observed point cloud.
Load rendered templates for the target object.
Extract RGB-aligned features with VMamba and point features with PointMamba.
Match observed points to template points.
Recover the final pose with correspondence-based rigid alignment.
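
The depth back-projection and rigid-alignment steps above can be illustrated with a minimal NumPy sketch. This is not the repository's code. The function names backproject_depth and kabsch are hypothetical, a pinhole camera with intrinsics fx, fy, cx, cy and metric depth is assumed, and a plain unweighted SVD (Kabsch) solution stands in for whatever alignment the model actually uses.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Convert an (H, W) metric depth map into an (N, 3) point cloud.

    Assumes a pinhole camera; pixels with depth <= 0 are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row grids
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of matched points (row i of src matches row i of dst).
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (no reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In this sketch the correspondences are assumed to be given as row-aligned arrays. In the pipeline they would come from the point matching step.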

The main model entry is Pose_Estimation_Model/model/pose_estimation_model.py.

2. Environment Setup

The recommended environment is defined in environment.yaml.

This setup is intended for:

CUDA 11.8
Python 3.10
PyTorch 2.0.0

Create the conda environment

conda env create -f environment.yaml
conda activate sinref6d

Install CUDA extensions

After activating the environment, build the local extensions:

export CUDA_HOME=/usr/local/cuda-11.8
cd Pose_Estimation_Model/model/pointnet2
python setup.py install
cd ../../../

Optional extensions:

kernels/selective_scan/ is bundled in the repo and provides low-level kernels used by the VMamba stack
dwconv/ is also bundled and can be installed separately if you use that branch of the code

If knn_cuda is unavailable on your machine, the code falls back to a pure PyTorch KNN implementation. The fallback is slower, but useful for first-time setup and debugging.
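
For orientation, the sketch below shows what a brute-force k-nearest-neighbour query computes. It is written in NumPy so that it runs without a GPU, and it is not the repository's fallback, which the text above describes as pure PyTorch. The function name knn_bruteforce and its signature are hypothetical.

```python
import numpy as np


def knn_bruteforce(ref, query, k):
    """Return indices (M, k) and distances (M, k) of the k nearest ref points.

    ref: (N, 3) reference points, query: (M, 3) query points, 1 <= k <= N.
    """
    # Pairwise squared Euclidean distances, shape (M, N).
    diff = query[:, None, :] - ref[None, :, :]
    dist2 = np.einsum("mnd,mnd->mn", diff, diff)
    # Sort each row and keep the k smallest entries.
    idx = np.argsort(dist2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(dist2, idx, axis=1))
    return idx, dist
```

The intermediate array grows as M x N x 3, which is one reason a brute-force fallback is slower and more memory-hungry than a dedicated CUDA kernel.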

If causal-conv1d needs to be built locally, you can also install it from the bundled source tree:

cd Pose_Estimation_Model/model/causal-conv1d
python setup.py install
cd ../../../

3. Pretrained Models

Download Pretrained Weights:

You can download the pretrained model weights from Google Drive.

The download includes:

SinRef-6D trained weights: Our trained pose estimation model weights
VMamba backbone weights: Pretrained VMamba backbone weights used in our pipeline

After downloading, place the weights in the appropriate directories as specified in the configuration files.

4. Data Preparation

Download Datasets:

Training Datasets (MegaPose-GSO & MegaPose-ShapeNetCore): Available at BOP Challenge 2023 Training Datasets
- MegaPose-GSO: Objects from Google Scanned Objects
- MegaPose-ShapeNetCore: Objects from ShapeNetCore
BOP Test Datasets: Available at BOP Benchmark Datasets
- Includes: YCB-V, LM-O, T-LESS, ITODD, HB, IC-BIN, TUD-L, etc.

The expected directory layout is:

Data
├── MegaPose-Training-Data
│   ├── MegaPose-GSO
│   └── MegaPose-ShapeNetCore
├── BOP
│   ├── ycbv
│   ├── lmo
│   ├── icbin
│   ├── itodd
│   ├── hb
│   ├── tudl
│   └── tless
└── BOP-Templates
    ├── ycbv
    ├── lmo
    ├── icbin
    ├── itodd
    ├── hb
    ├── tudl
    └── tless

By default, the config uses relative paths:

Data/MegaPose-Training-Data
Data/BOP
Data/BOP-Templates

If your datasets are stored outside the repo, the code will also try to resolve the same Data/... structure from a shared parent directory.
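
Before training or evaluation, it can help to confirm that the layout above is in place. The following is a minimal sketch that uses only the Python standard library. The helper name missing_data_dirs is hypothetical and not part of this repository, and it checks directory presence only, not file contents.

```python
from pathlib import Path

# Sub-directories named in the layout above, relative to the Data/ root.
EXPECTED_SUBDIRS = [
    "MegaPose-Training-Data/MegaPose-GSO",
    "MegaPose-Training-Data/MegaPose-ShapeNetCore",
    "BOP",
    "BOP-Templates",
]


def missing_data_dirs(data_root, bop_datasets=("ycbv",)):
    """Return the expected directories that do not exist under data_root."""
    root = Path(data_root)
    expected = list(EXPECTED_SUBDIRS)
    for name in bop_datasets:
        expected.append(f"BOP/{name}")
        expected.append(f"BOP-Templates/{name}")
    return [rel for rel in expected if not (root / rel).is_dir()]
```

For example, missing_data_dirs("Data", bop_datasets=("ycbv", "lmo")) returns an empty list when every listed directory exists.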

5. Template Files

Download BOP-Templates Dataset:

You can download the pre-rendered BOP-Templates dataset from Google Drive.

Expected template roots:

training templates:
- Data/MegaPose-Training-Data/MegaPose-GSO/templates
- Data/MegaPose-Training-Data/MegaPose-ShapeNetCore/templates
BOP test templates:
- Data/BOP-Templates/<dataset>

The training and BOP loaders expect pre-rendered RGB, mask, depth or XYZ files together with pose metadata in the layout already used by this repository.

6. Training

Use the base config:

python Pose_Estimation_Model/train.py \
  --config Pose_Estimation_Model/config/base.yaml \
  --model pose_estimation_model \
  --gpus 0

Common arguments:

--gpus: GPU ids, for example 0 or 0,1
--exp_id: experiment id used in the log directory name
--checkpoint_iter: resume from a saved iteration

Training outputs are written under:

log/<model>_<config>_id<exp_id>/
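
To locate the outputs of a given run, the naming pattern above can be reproduced with a small helper. This is a sketch under two assumptions: the config part of the name is the config file name without its extension, and the experiment id defaults to 0. Both are consistent with the path log/pose_estimation_model_base_id0/ used in the YCB-V reproduction section, but the helper expected_log_dir itself is hypothetical.

```python
from pathlib import Path


def expected_log_dir(model, config_path, exp_id=0, log_root="log"):
    """Build the log directory path from the training arguments.

    Assumption: the config part is the config file name without its extension.
    """
    config_name = Path(config_path).stem
    return Path(log_root) / f"{model}_{config_name}_id{exp_id}"
```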

7. BOP Evaluation

Example:

python Pose_Estimation_Model/test_bop.py \
  --config Pose_Estimation_Model/config/base.yaml \
  --dataset ycbv \
  --gpus 0

The script expects detection results in a directory containing files such as:

result_ycbv.json
result_lmo.json
result_tless.json
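
A quick way to confirm that a detection file is present and parses as JSON before launching evaluation is sketched below. The helper load_detections is hypothetical. This README does not specify the JSON schema, so the sketch makes no assumption about the contents beyond valid JSON.

```python
import json
from pathlib import Path


def load_detections(detection_dir, dataset):
    """Load the result file for one dataset from detection_dir."""
    path = Path(detection_dir) / f"result_{dataset}.json"
    if not path.is_file():
        raise FileNotFoundError(f"Missing detection file: {path}")
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)
```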

You can override the detection directory explicitly:

python Pose_Estimation_Model/test_bop.py \
  --config Pose_Estimation_Model/config/base.yaml \
  --dataset ycbv \
  --gpus 0 \
  --detection_dir /path/to/detection_jsons

Generated BOP CSV files are saved under the log/... directory.

Fastest YCB-V Reproduction:

If you only want to verify that the repository works end-to-end on YCB-V as quickly as possible, use this order:

Create and activate the environment:

conda env create -f environment.yaml
conda activate sinref6d

If mamba-ssm or causal-conv1d does not install cleanly during environment creation, install them manually before continuing.

Build the PointNet++ extension:

export CUDA_HOME=/usr/local/cuda-11.8
cd Pose_Estimation_Model/model/pointnet2
python setup.py install
cd ../../../

Prepare these three directories:

Data/BOP/ycbv
Data/BOP-Templates/ycbv
Data/bop23_default_detections_for_task4/bop23_default_detections_for_task4/cnos-fastsam/result_ycbv.json

Run YCB-V evaluation:

python Pose_Estimation_Model/test_bop.py \
  --config Pose_Estimation_Model/config/base.yaml \
  --dataset ycbv \
  --gpus 0 \
  --iter 2400000

Check the output csv:

log/pose_estimation_model_base_id0/ycbv_eval_iter2400000/result_ycbv-test.csv
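
To inspect the result file, a small parser such as the one below may be enough. It is a sketch that assumes the file follows the standard BOP results format: a header row scene_id,im_id,obj_id,score,R,t,time, with R stored as 9 space-separated values (row-major 3x3) and t as 3 space-separated values. The function name read_bop_results is hypothetical. If this repository writes a different layout, the column names need adjusting.

```python
import csv


def read_bop_results(csv_path):
    """Parse a BOP-format results file into a list of dicts."""
    results = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            r = [float(x) for x in row["R"].split()]
            t = [float(x) for x in row["t"].split()]
            if len(r) != 9 or len(t) != 3:
                raise ValueError(f"Malformed pose in row: {row}")
            results.append({
                "scene_id": int(row["scene_id"]),
                "im_id": int(row["im_id"]),
                "obj_id": int(row["obj_id"]),
                "score": float(row["score"]),
                "R": [r[0:3], r[3:6], r[6:9]],  # 3x3 rotation, row-major
                "t": t,                          # translation vector
                "time": float(row["time"]),
            })
    return results
```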

If your detection jsons are stored somewhere else, pass:

--detection_dir /path/to/detection_jsons

8. Custom Object Inference

Prepare a custom template directory first:

/path/to/custom_case/templates

Then run inference:

python Pose_Estimation_Model/run_inference_custom.py \
  --output_dir /path/to/custom_case \
  --rgb_path /path/to/rgb.png \
  --depth_path /path/to/depth.png \
  --cam_path /path/to/camera.json \
  --seg_path /path/to/detections.json \
  --gpus 0

Optional:

--cad_path /path/to/model.ply

If --cad_path is omitted, the script falls back to template point clouds for radius estimation and visualization.

Outputs will be written to:

/path/to/custom_case/sam6d_results/detection_pem.json
/path/to/custom_case/sam6d_results/vis_pem.png

9. Evaluation Utilities

Additional scripts are included for metric computation:

Pose_Estimation_Model/eval_lm_ADD-0.1d.py
Pose_Estimation_Model/eval_ycbv_ADD(S).py
Pose_Estimation_Model/eval_single_object_pose.py

These scripts are command-line tools. Run each one with --help to list its arguments.

10. Reproducibility Checklist and Common Issues

- For a fast first reproduction, follow this exact order:

Clone the repository.
Create the conda environment from environment.yaml.
Build the pointnet2 extension.
Prepare the Data/ directory structure.
Download or prepare template files.
Verify that Pose_Estimation_Model/config/base.yaml points to the correct data locations.
Run test_bop.py on one dataset first, such as ycbv.
Run training only after evaluation and data loading work correctly.

- Empty template list or `torch.stack` on an empty list

This usually means the object model directory or pre-rendered template directory was not found. Check:

Data/BOP/<dataset>/models
Data/BOP-Templates/<dataset>
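
The two directories listed above can be checked with a short diagnostic. This is a sketch with hypothetical helper names (count_files, template_diagnostics). It only counts files and does not validate their format.

```python
from pathlib import Path


def count_files(directory):
    """Number of regular files under directory (recursive); -1 if it is missing."""
    path = Path(directory)
    if not path.is_dir():
        return -1
    return sum(1 for p in path.rglob("*") if p.is_file())


def template_diagnostics(data_root, dataset):
    """Count files in the object model and template directories of one dataset."""
    root = Path(data_root)
    return {
        "models": count_files(root / "BOP" / dataset / "models"),
        "templates": count_files(root / "BOP-Templates" / dataset),
    }
```

A value of -1 means the directory was not found, and 0 means it exists but is empty. Either case is a plausible cause of the empty template list.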

- `knn_cuda` import failure

The code has a pure PyTorch fallback, so it runs without knn_cuda, only more slowly.

- `imgaug` or `h5py` binary compatibility errors

These usually come from incompatible NumPy versions. The provided environment pins NumPy to the 1.24 series to avoid that issue.

- CUDA extension build issues

Make sure:

your PyTorch CUDA version matches your installed CUDA toolkit
nvcc is available
your environment is activated before building extensions
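
The checks above can be gathered into one report before building. The following standard-library sketch is not part of the repository. The function environment_report is hypothetical, and the default module names (torch, knn_cuda, mamba_ssm, causal_conv1d) are assumed import names for the packages mentioned in this README.

```python
import importlib.util
import os
import shutil


def environment_report(modules=("torch", "knn_cuda", "mamba_ssm", "causal_conv1d")):
    """Collect basic facts that matter when building CUDA extensions."""
    cuda_home = os.environ.get("CUDA_HOME")
    return {
        # True if the nvcc compiler can be found on PATH.
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "cuda_home": cuda_home,
        "cuda_home_exists": bool(cuda_home) and os.path.isdir(cuda_home),
        # Set by conda when an environment is activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Whether each module can be located, without importing it.
        "modules": {m: importlib.util.find_spec(m) is not None for m in modules},
    }
```

With PyTorch installed, torch.version.cuda reports the CUDA version PyTorch was built against, which can be compared with the output of nvcc --version.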

Citation

If you find our work helpful, please consider citing:

@article{2026SinRef-6D,
  author={Liu, Jian and Sun, Wei and Zeng, Kai and Zheng, Jin and Yang, Hui and Rahmani, Hossein and Mian, Ajmal and Wang, Lin},
  title={Scalable Unseen Object 6-DoF Absolute Pose Estimation with Robotic Integration},
  journal={IEEE Transactions on Robotics},
  year={2026}
}

Acknowledgements

Our implementation leverages the code from the repository below. We thank all for releasing their code.

Licence

This project is licensed under the terms of the MIT license.
