Scene Graph Benchmark in Pytorch


Demo videos: previous work (PE-NET model) vs. our REACT model for real-time SGG (video_github_baseline.mp4, video_github_REACT.mp4).
Figures: REACT++ versus previous work, and the REACT++ family latency vs. F1 trade-off on PSG (latency_vs_f1_psg, react_pp_onnx_tradeoff).

Very Quick Start 🚀

If you don't want to install the full codebase, we provide a minimal running example with ONNX Runtime in demo/standalone_onnx_demo.py. After downloading an ONNX model from MODEL_ZOO.md, you can run it with a single command:

# You need to have CUDA and cudnn installed for GPU inference with the onnxruntime-gpu package.
pip install onnxruntime-gpu opencv-python numpy
python demo/standalone_onnx_demo.py \
    --onnx checkpoints/PSG/react++_yolo12m/model.onnx \
    --rel_conf 0.05 --box_conf 0.4
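The demo script handles image preprocessing internally. For reference, a typical YOLO-style preprocessing step (letterbox resize plus NCHW float32 conversion) can be sketched with NumPy alone. The 640×640 input size, the gray-fill letterboxing, and the 0-1 normalization are assumptions about the exported model, not guarantees; check the model's input metadata with ONNX Runtime before relying on them:

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 640) -> np.ndarray:
    """Letterbox an HWC uint8 image into a (1, 3, size, size) float32 tensor.

    NOTE: the input size and 0-1 normalization are assumptions about the
    ONNX export, not guarantees.
    """
    h, w = image.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index mapping (avoids an OpenCV dependency).
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = image[ys][:, xs]
    # Pad to a square canvas (letterboxing), gray fill as in Ultralytics.
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)
    canvas[:nh, :nw] = resized
    # HWC uint8 -> NCHW float32 in [0, 1].
    return canvas.transpose(2, 0, 1)[None].astype(np.float32) / 255.0
```

The resulting tensor would then be fed to an onnxruntime.InferenceSession via session.run(None, {input_name: preprocess(frame)}), where input_name comes from the session's input metadata.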

Quick Start 🚀

  1. Install:
chmod +x scripts/install_uv.sh
./scripts/install_uv.sh
source .venv/bin/activate
  2. Pick a model from MODEL_ZOO.md and download it with the 🤗 Hugging Face CLI:
# Example: REACT++ PSG YOLOv12m (best accuracy/speed trade-off)
hf download maelic/REACTPlusPlus_PSG yolo12m/react_pp_yolo12m.onnx \
    --repo-type model --local-dir checkpoints/PSG/react++_yolo12m
  3. Run inference with the webcam demo:
python demo/webcam_demo_onnx.py \
    --onnx checkpoints/PSG/react++_yolo12m/yolo12m/react_pp_yolo12m.onnx \
    --rel_conf 0.05 --box_conf 0.4

Check the demo folder for more demos and details.

[NEW] FULL TUTORIAL 🚀

We now provide a full notebook tutorial on how to train/test/deploy your own SGG model! Please check it out:

TUTORIAL.ipynb

Recent Updates

  • 09.03.2026: REACT++ released! A new YOLO12m-based model with improved accuracy and ONNX export for ~2× faster inference (13.4 ms). See MODEL_ZOO.md for weights and results.
  • 09.03.2026: 🤗 In an effort to support the open-source SGG community, we are releasing the PSG, VG150, and IndoorVG datasets on the Hugging Face Hub! Please see DATASET.md for more details.
  • 09.03.2026: The codebase now supports YOLO26, the new YOLO release from Ultralytics.
  • 15.08.2025: I have created a new tool to annotate your own SGG dataset with visual relationships; please check it out: SGG-Annotate. More info in ANNOTATIONS.md.
  • 31.07.2025: REACT has been accepted at the BMVC 2025 conference!
  • 26.05.2025: I have added some explanation for two new metrics: InformativeRecall@K and Recall@K Relative. InformativeRecall@K is defined in Mining Informativeness in Scene Graphs and can help to measure the pertinence and robustness of models for real-world applications. Please check the METRICS.md file for more information.
  • 26.05.2025: The codebase now also supports YOLOv12, see configs/hydra/VG/REACT++.yaml.
  • 04.12.2024: Official release of the REACT model weights for VG150, please see MODEL_ZOO.md
  • 03.12.2024: Official release of the REACT model
  • 23.05.2024: Added support for Hyperparameters Tuning with the RayTune library, please check it out: Hyperparameters Tuning
  • 23.05.2024: Added support for the YOLOV10 backbone and SQUAT relation head!
  • 28.05.2024: Official release of our Real-Time Scene Graph Generation implementation.
  • 23.05.2024: Added support for the YOLO-World backbone for Open-Vocabulary object detection!
  • 10.05.2024: Added support for the PSG Dataset
  • 03.04.2024: Added support for the IETrans method for data augmentation on the Visual Genome dataset, please check it out! IETrans.
  • 03.04.2024: Update the demo, now working with any models, check DEMO.md.
  • 01.04.2024: Added support for Wandb for better visualization during training, tutorial coming soon.

Contents

  1. Quick Start
  2. Full Tutorial Notebook
  3. Installation
  4. Datasets Preparation
  5. Model Zoo & Weights
  6. Supported Models & Backbones
  7. Metrics and Results
  8. Training Instructions
  9. Hyperparameters Tuning
  10. Evaluation Instructions
  11. Citations

Installation

Check INSTALL.md for installation instructions.

Datasets

Check DATASET.md for instructions regarding dataset preprocessing, including how to create your own dataset with SGG-Annotate.

Supported Models

Background

Scene Graph Generation approaches fall into two categories, two-stage and one-stage:

  1. Two-stage approaches, the original formulation of SGG, decouple training into (1) training an object detection backbone and (2) using the backbone's bounding box proposals and image features to train a relation prediction model.
  2. One-stage approaches learn object and relation features jointly in a single training stage.

This codebase focuses only on the first category, two-stage approaches.
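The two-stage decoupling can be sketched in a few lines of illustrative Python. The function names (detect_objects, predict_relations) and the dummy predicate are stand-ins, not actual APIs of this codebase:

```python
from dataclasses import dataclass

@dataclass
class Box:
    label: str
    xyxy: tuple  # (x1, y1, x2, y2) pixel coordinates

def detect_objects(image) -> list:
    """Stage 1: a pre-trained detector proposes labeled boxes."""
    # Stand-in for a YOLO / Faster-RCNN backbone.
    return [Box("person", (10, 10, 50, 100)), Box("bike", (40, 60, 120, 140))]

def predict_relations(image, boxes: list) -> list:
    """Stage 2: a relation head scores predicates between box pairs."""
    triplets = []
    for i, subj in enumerate(boxes):
        for j, obj in enumerate(boxes):
            if i != j:
                # A real head would classify from pooled pair features;
                # here a fixed dummy predicate stands in.
                triplets.append((subj.label, "near", obj.label))
    return triplets

boxes = detect_objects(None)
scene_graph = predict_relations(None, boxes)
```

The key point is that stage 1 is trained (and often frozen) before stage 2 ever sees a relation label.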

Object Detection Backbones

We provide different object detection backbones that can be plugged into any relation prediction head, depending on the use case.

🚀 NEW! No need to train a backbone anymore: we support YOLO-World for fast and easy open-vocabulary inference. Please check it out!

  • YOLO26: new YOLO architecture for SOTA real-time object detection.
  • YOLO12: new YOLO architecture for SOTA real-time object detection.
  • YOLO11: new YOLO version from Ultralytics for SOTA real-time object detection.
  • YOLOv10: new end-to-end YOLO architecture for SOTA real-time object detection.
  • YOLOv8-World: SOTA in real-time open-vocabulary object detection!
  • YOLOv9: SOTA in real-time object detection.
  • YOLOv8: new YOLO version from Ultralytics for SOTA real-time object detection.
  • LEGACY Faster-RCNN: the original backbone used in most SGG approaches, based on a ResNeXt-101 feature extractor and an RPN for regression and classification. See the original paper for reference. Performance is 38.52/26.35/28.14 mAP on the VG train/val/test sets respectively. You can find the original pretrained model by Kaihua here.

Relation Heads

We have compiled the main approaches for relation modeling in this codebase:

Debiasing methods

On top of relation heads, several debiasing methods have been proposed over the years to improve the accuracy of baseline models on tail classes.

Data Augmentation methods

Due to severe biases in datasets, the task of Scene Graph Generation has also been tackled through data-centric approaches.

Model ZOO

We provide some of the pre-trained weights for evaluation or usage in downstream tasks, please see MODEL_ZOO.md.

Metrics and Results (IMPORTANT)

Explanation of metrics in our toolkit and reported results are given in METRICS.md

REACT++ Quick Start

REACT++ is our best model for real-time SGG, combining the YOLO12m detector with an efficient relation head. Pretrained weights and ONNX models are available in MODEL_ZOO.md.

Training REACT++ on PSG

python tools/relation_train_net_hydra.py --config-name PSG/REACT++ --task sgdet --save-best

Evaluating REACT++ (PyTorch)

python tools/relation_eval_hydra.py --run-dir checkpoints/PSG/react++_yolo12m --task sgdet

Exporting to ONNX

python tools/export_onnx.py --run-dir checkpoints/PSG/react++_yolo12m

Evaluating the ONNX model on PSG

This runs the full SGDet evaluation on the PSG test set using ONNX Runtime (GPU by default):

python tools/eval_onnx_psg.py --run-dir checkpoints/PSG/react++_yolo12m --provider CUDAExecutionProvider

Results are saved to checkpoints/PSG/react++_yolo12m/inference_onnx/onnx_eval_summary.json.

YOLOV8/9/10/11/12/World Pre-training

If you want to use YOLOv8/9/10/11/12 or YOLO-World as a backbone instead of Faster-RCNN, you first need to train a model using the official Ultralytics implementation. To help you with that, I have created a dedicated notebook to generate annotations in YOLO format from a .h5 file (SGG format). Once you have a model:

  1. Modify a config file and point pretrained_detector_ckpt to your model weights.
  2. Change yolo.size and yolo.out_channels accordingly if you use another YOLO variant (nano, small, or large, for instance).
  3. Set the meta_architecture variable in the same config file to GeneralizedYOLO.

You can then follow the standard training procedure below.

Faster R-CNN pre-training (legacy)

We do not support Faster-RCNN pre-training anymore.

Perform training on Scene Graph Generation

There are three standard protocols: (1) Predicate Classification (PredCls): taking ground-truth bounding boxes and labels as inputs; (2) Scene Graph Classification (SGCls): using ground-truth bounding boxes without labels; (3) Scene Graph Detection (SGDet): detecting scene graphs from scratch. We use the argument --task to select the protocol.

For Predicate Classification (PredCls), we need to set:

--task predcls

For Scene Graph Classification (SGCls): ⚠️ SGCls mode is currently LEGACY and NOT SUPPORTED anymore for any YOLO-based model, please find the reason why in this issue.

--task sgcls

For Scene Graph Detection (SGDet):

--task sgdet

Predefined Models

We abstract various SGG models to be different relation-head predictors in the file roi_heads/relation_head/roi_relation_predictors.py. To select our predefined models, you can use MODEL.ROI_RELATION_HEAD.PREDICTOR.

For REACT++ Model:

model.roi_relation_head.predictor REACTPlusPlusPredictor

For REACT Model:

model.roi_relation_head.predictor  REACTPredictor

For PE-NET Model:

model.roi_relation_head.predictor PrototypeEmbeddingNetwork

For Neural-MOTIFS Model:

model.roi_relation_head.predictor  MotifPredictor

For Iterative-Message-Passing(IMP) Model (Note that SOLVER.BASE_LR should be changed to 0.001 in SGCls, or the model won't converge):

model.roi_relation_head.predictor  IMPPredictor

For VCTree Model:

model.roi_relation_head.predictor VCTreePredictor

For Transformer Model (note that the Transformer model requires changing SOLVER.BASE_LR to 0.001, SOLVER.SCHEDULE.TYPE to WarmupMultiStepLR, SOLVER.MAX_ITER to 16000, SOLVER.IMS_PER_BATCH to 16, and SOLVER.STEPS to (10000, 16000)), provided by Jiaxin Shi:

model.roi_relation_head.predictor TransformerPredictor

Examples of the Training Command

Recommended approach: Use the Hydra-based training script tools/relation_train_net_hydra.py with configs from configs/hydra/. See the REACT++ Quick Start section for an example.

Hyperparameters Tuning

Required library: pip install ray[data,train,tune] optuna tensorboard

We provide a training loop for hyperparameter tuning in hyper_param_tuning.py. This script uses the Ray Tune library for efficient hyperparameter search. You can define a search_space object with different values related to the optimizer (AdamW and SGD supported for now) or directly customize the model structure with model parameters (for instance, Linear layer or MLP dimensions). The ASHAScheduler is used for early stopping of bad trials. The default value to optimize is the overall loss, but this can be customized to specific loss values or standard metrics such as mean_recall.
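As an illustration, a search_space and scheduler of the kind described above could look like the following sketch. The parameter names here are hypothetical; match them to the keys tools/hyper_param_tuning.py actually reads:

```python
# Configuration sketch for a Ray Tune search; parameter names are
# illustrative, not the exact keys used by hyper_param_tuning.py.
from ray import tune
from ray.tune.schedulers import ASHAScheduler

search_space = {
    "optimizer": tune.choice(["AdamW", "SGD"]),   # the two supported optimizers
    "lr": tune.loguniform(1e-5, 1e-2),            # learning rate, log scale
    "weight_decay": tune.loguniform(1e-6, 1e-3),
    # Example of a model-structure parameter (e.g. an MLP hidden dimension).
    "mlp_dim": tune.choice([256, 512, 1024]),
}

# ASHA terminates under-performing trials early; "loss" can be swapped for
# a metric such as "mean_recall" (with mode="max").
scheduler = ASHAScheduler(metric="loss", mode="min",
                          grace_period=1, reduction_factor=2)
```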

To launch the script, run:

CUDA_VISIBLE_DEVICES=0 python tools/hyper_param_tuning.py --save-best --task sgdet --config-file "./configs/hydra/IndoorVG/REACT++.yaml"

The config and OUTPUT_DIR paths need to be absolute to allow faster loading. Most terminal output is disabled by default during tuning, via the cfg.VERBOSE variable.

To watch the results with tensorboardX:

tensorboard --logdir=./ray_results/train_relation_net_2024-06-23_15-28-01

Evaluation

Recommended Approach (Hydra-based)

For REACT++ and any model trained with the Hydra pipeline, evaluation is done with tools/relation_eval_hydra.py by pointing it at a checkpoint directory:

# SGDet evaluation (PSG)
python tools/relation_eval_hydra.py --run-dir checkpoints/PSG/react++_yolo12m --task sgdet

# SGDet evaluation with a specific checkpoint
python tools/relation_eval_hydra.py --run-dir checkpoints/PSG/react++_yolo12m --task sgdet --checkpoint best_model_epoch_9.pth

See the REACT++ Quick Start section for full training/eval/ONNX export commands.

Citations

If you find this project helpful for your research, please consider citing our project or papers in your publications.

@misc{neau2026reactplusplus,
      title={REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation}, 
      author={Maëlic Neau and Zoe Falomir},
      year={2026},
      eprint={2603.06386},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.06386}, 
}
@misc{neau2024reactrealtimeefficiencyaccuracy,
      title={REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation}, 
      author={Maëlic Neau and Paulo E. Santos and Anne-Gwenn Bosser and Cédric Buche},
      year={2024},
      eprint={2405.16116},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2405.16116}, 
}

About

A New Benchmark for Scene Graph Generation, targeting real-world applications
