SGFormer:
Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation

Changsheng Lv¹, Mengshi Qi¹, Xia Li¹, Zhengyuan Yang², Huadong Ma¹

¹ Beijing University of Posts and Telecommunications ² University of Rochester

3D scene graph generation aims to parse a 3D scene into a structured graph of objects and their relationships. While recent methods leverage point clouds as input, they often overlook semantic richness and struggle to model long-range relational dependencies. To bridge this gap, we propose SGFormer — a novel Semantic Graph Transformer that injects enriched textual semantics (e.g., LLM-enhanced object descriptions) into a dual-layer architecture: a Graph Embedding Layer for structural reasoning and a Semantic Injection Layer for knowledge-aware message passing. SGFormer achieves state-of-the-art performance on the 3DSSG-O27R16 benchmark.
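To make the idea of injecting text semantics into node features more concrete, here is a toy sketch in plain Python. It assumes the injection can be pictured as scaled dot-product cross-attention from object nodes to text embeddings, followed by a residual add. The function name `inject_semantics`, the absence of learned projections, and the tiny vectors are all illustrative assumptions; this is not the layer implemented in this repository.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def inject_semantics(node_feats: List[Vector], text_embeds: List[Vector]) -> List[Vector]:
    """Toy cross-attention: every node attends over all text embeddings.

    Assumption: node features and text embeddings share the same dimension,
    and no learned query/key/value projections are applied.
    """
    dim = len(text_embeds[0])
    scale = math.sqrt(dim)
    updated = []
    for query in node_feats:
        # Attention weights of this node over the text embeddings.
        weights = softmax([dot(query, key) / scale for key in text_embeds])
        # Weighted sum of text embeddings (keys double as values here).
        attended = [
            sum(w * emb[i] for w, emb in zip(weights, text_embeds))
            for i in range(dim)
        ]
        # Residual connection keeps the original geometric feature.
        updated.append([q + a for q, a in zip(query, attended)])
    return updated
```

A node whose feature aligns with one text embedding is pulled mostly toward that embedding, which is the intuition behind knowledge-aware message passing.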

Release

2024-02-15 🚀 SGFormer paper accepted by AAAI 2024!
2024-01-10 💾 Code and model release for SGFormer!

3DSSG-O27R16 Dataset

Overview: We adopt the cleaned 3DSSG-O27R16 dataset introduced by SGGpoint (CVPR 2021), which enhances the original 3DSSG with:

Dense 10-dim point clouds (XYZ + RGB + normal + instance ID)
Full-scene graphs (not subgraphs)
27 object classes (O27) and 16 structural relationship types (R16)
Removal of low-quality scans and comparative relations (e.g., more-comfortable-than)
Multi-class edge labeling (instead of multi-label)
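
The data layout described above can be sketched in a few lines of Python. The column order (XYZ, RGB, normal, instance ID) follows the listing above but is an assumption about the on-disk layout, and the names `ScenePoint`, `unpack_point`, `group_by_instance`, and `check_multiclass_edges` are hypothetical helpers, not part of the SGGpoint or SGFormer code.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class ScenePoint(NamedTuple):
    xyz: Tuple[float, float, float]
    rgb: Tuple[float, float, float]
    normal: Tuple[float, float, float]
    instance_id: int


def unpack_point(row: Sequence[float]) -> ScenePoint:
    """Split one 10-dim row into named fields (assumed column order)."""
    if len(row) != 10:
        raise ValueError(f"expected 10 values per point, got {len(row)}")
    x, y, z, r, g, b, nx, ny, nz, inst = row
    return ScenePoint((x, y, z), (r, g, b), (nx, ny, nz), int(inst))


def group_by_instance(rows: Sequence[Sequence[float]]) -> Dict[int, List[ScenePoint]]:
    """Collect the points of each object instance (one graph node per ID)."""
    groups: Dict[int, List[ScenePoint]] = {}
    for row in rows:
        point = unpack_point(row)
        groups.setdefault(point.instance_id, []).append(point)
    return groups


def check_multiclass_edges(edges: Dict[Tuple[int, int], int], num_rel: int = 16) -> None:
    """Multi-class labeling: each ordered (subject, object) pair maps to
    exactly one relationship index in [0, num_rel). A multi-label scheme
    would map each pair to a set of indices instead."""
    for pair, label in edges.items():
        if not 0 <= label < num_rel:
            raise ValueError(f"edge {pair} has out-of-range label {label}")
```

Keying the edge dictionary by the ordered pair makes the single-label constraint structural: a pair cannot carry two relationship classes at once.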

🔍 For dataset download and preprocessing details, please visit the SGGpoint dataset page.

Results

Evaluation Setup: We evaluate SGFormer on the 3DSSG-O27R16 validation set using standard scene-graph metrics: Recall@50 for node classification and Mean Recall@50 for edge (relationship) prediction.
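
As a rough illustration of the two metrics, the sketch below uses one common definition: Recall@K is the fraction of ground-truth items found among the top-K ranked predictions, and Mean Recall@K averages that recall per class so that frequent classes do not dominate. The function names and the `class_of` callback are assumptions for illustration; the evaluation code in this repository may compute the metrics differently.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Sequence


def recall_at_k(ranked: Sequence[Hashable], gt: Iterable[Hashable], k: int = 50) -> float:
    """Fraction of ground-truth items that appear in the top-k predictions."""
    gt_set = set(gt)
    if not gt_set:
        return 0.0
    top_k = set(ranked[:k])
    return len(gt_set & top_k) / len(gt_set)


def mean_recall_at_k(
    ranked: Sequence[Hashable],
    gt: Iterable[Hashable],
    class_of: Callable[[Hashable], Hashable],
    k: int = 50,
) -> float:
    """Average of per-class recall over classes present in the ground truth."""
    top_k = set(ranked[:k])
    hits = defaultdict(int)
    totals = defaultdict(int)
    for item in set(gt):
        cls = class_of(item)
        totals[cls] += 1
        hits[cls] += int(item in top_k)
    if not totals:
        return 0.0
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Triplets are (subject_id, relationship_class, object_id); values are made up.
gt_triplets = [(0, 4, 1), (1, 4, 2), (2, 9, 0)]
ranked_preds = [(0, 4, 1), (2, 4, 0), (1, 4, 2), (2, 9, 0)]

r_at_3 = recall_at_k(ranked_preds, gt_triplets, k=3)
mr_at_3 = mean_recall_at_k(ranked_preds, gt_triplets, class_of=lambda t: t[1], k=3)
```

In this made-up example two of the three ground-truth triplets fall in the top 3, while the only triplet of relationship class 9 is missed, so the mean recall is lower than the plain recall.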

✨ SGFormer outperforms prior methods by a clear margin, especially on relationship prediction, which we attribute to its semantic-aware transformer design.

Run Your Own Evaluation

Dataset

Follow the instructions at SGGpoint Dataset Guide to obtain 3DSSG-O27R16. Place the data under data/3DSSG/.

Installation

conda create --name sgformer python=3.8
conda activate sgformer

git clone https://github.com/yourname/SGFormer.git
cd SGFormer

pip install -r requirements.txt

Training

Replace the absolute paths (config, dataset root, label files) and the GPU index with values for your own machine; the same applies to the inference command.

CUDA_VISIBLE_DEVICES=5 python -m main --mode train \
    --config /home/lcs/tpami2025/config/SGFormer.json \
    --exp exp_76_test \
    --model_name Mmgnet \
    --continue_learning_mode none \
    --root /home/lcs/tpami2025/data/3DSSG_subset \
    --dataset_annotation_type 160O26R \
    --obj_label_path /home/lcs/tpami2025/data/3DSSG_subset/classes.txt \
    --rel_label_path /home/lcs/tpami2025/data/3DSSG_subset/relationships.txt \
    --num_workers 8 --task_type PredCls
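
If you prefer not to hard-code absolute paths, one option is to assemble the same training command from a single project root. The sketch below reuses only the flags shown in the command above; the helper name `build_train_command` and the assumed layout (`config/SGFormer.json` and `data/3DSSG_subset/` under the project root) are hypothetical and should be adapted to your checkout.

```python
import shlex
from pathlib import Path
from typing import List


def build_train_command(
    project_root: str,
    exp: str,
    data_dir: str = "data/3DSSG_subset",  # assumed location relative to the root
) -> List[str]:
    """Return the training command as an argument list with resolved paths."""
    root = Path(project_root)
    data = root / data_dir
    return [
        "python", "-m", "main",
        "--mode", "train",
        "--config", str(root / "config" / "SGFormer.json"),
        "--exp", exp,
        "--model_name", "Mmgnet",
        "--continue_learning_mode", "none",
        "--root", str(data),
        "--dataset_annotation_type", "160O26R",
        "--obj_label_path", str(data / "classes.txt"),
        "--rel_label_path", str(data / "relationships.txt"),
        "--num_workers", "8",
        "--task_type", "PredCls",
    ]


if __name__ == "__main__":
    cmd = build_train_command("/path/to/SGFormer", exp="exp_76_test")
    # Print a shell-safe string; pass `cmd` to subprocess.run to launch it.
    print(shlex.join(cmd))
```

Select the GPU as in the original command by setting `CUDA_VISIBLE_DEVICES` in the environment before launching.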

Inference

CUDA_VISIBLE_DEVICES=3 python inference.py \
    --config /home/lcs/tpami2025/config/SGFormer.json \
    --exp exp_66 \
    --model_name SGFormer \
    --CKPT_PATH /data_3/lcs/tpami2025/workdir \
    --num_workers 8 \
    --root /home/lcs/tpami2025/data/3DSSG_subset \
    --inference_num 67 \
    --obj_label_path /home/lcs/tpami2025/data/3DSSG_subset/classes.txt \
    --rel_label_path /home/lcs/tpami2025/data/3DSSG_subset/relationships.txt \
    --use_VLM_description --use_triplet \
    --dataset_annotation_type 160O26R

Acknowledgement

Our evaluation code is built upon VL-SAT. We thank the VL-SAT team for releasing their excellent codebase for 3D scene graph prediction.

Citation

If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing our work 📝 :)

@inproceedings{lv2024sgformer,
  title={SGFormer: Semantic Graph Transformer for Point Cloud-Based 3D Scene Graph Generation},
  author={Lv, Changsheng and Qi, Mengshi and Li, Xia and Yang, Zhengyuan and Ma, Huadong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={5},
  pages={4035--4043},
  year={2024}
}
