Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation [Paper] [Dataset](Password: mz23)
In this work, we study how vision-language models (VLMs) can be used to enhance the safety of autonomous driving systems across perception, situational understanding, and path planning. However, existing research has largely overlooked the evaluation of these models in safety-critical driving scenarios. To bridge this gap, we create a benchmark (SafeDrive228K) and propose a new baseline that augments a VLM with knowledge graph-based retrieval-augmented generation (SafeDriveRAG) for visual question answering (VQA). Specifically, SafeDrive228K is the first large-scale multimodal question-answering benchmark of its kind, comprising 228K examples across 18 sub-tasks. It covers a diverse range of traffic safety queries, from traffic accidents and corner cases to common safety knowledge, enabling a thorough assessment of models' comprehension and reasoning abilities. Furthermore, we propose a plug-and-play multimodal knowledge graph-based retrieval-augmented generation approach that employs a novel multi-scale subgraph retrieval algorithm for efficient information retrieval. By incorporating traffic safety guidelines collected from the Internet, this framework further enhances the model's capacity to handle safety-critical situations. Finally, we conduct comprehensive evaluations on five mainstream VLMs to assess their reliability in safety-sensitive driving tasks. Experimental results show that integrating RAG significantly improves performance: averaged over the five VLMs, it yields a +4.73% gain on Traffic Accident tasks, +8.79% on Corner Case tasks, and +14.57% on Traffic Safety Commonsense, underscoring the potential of our benchmark and methodology for advancing research in traffic safety.
- SafeDrive228K: A Large-Scale Multimodal QA Benchmark for Autonomous Driving Safety
- 9,331 real-world traffic accident videos
- Over 35,000 corner-case and safety-related images
- 228,000 QA pairs spanning traffic accidents, corner cases, and safety commonsense
- Comprehensive Evaluation of VLMs in Traffic Safety
- The first benchmark systematically assessing model reasoning under diverse, safety-critical driving conditions
- Novel Multimodal Knowledge Graph-based RAG Framework
- Unified indexing and retrieval for both textual and visual entities
- Efficient multi-scale subgraph retrieval tailored for real-time requirements
- Plug-and-Play Enhancement for Mainstream Open-Source VLMs
- Substantial Performance Gains in Safety-Critical Tasks
- Notable improvements in commonsense safety (+14.57%), corner cases (+8.79%), and accident scenarios (+4.73%) with RAG enhancement
This benchmark is designed to evaluate model performance across diverse traffic safety scenarios. It consists of three major components: Traffic safety knowledge, Traffic accident, and Corner Case.
```
benchmark
├── Corner_Case
│   ├── annotations.json
│   ├── Coner_case_qa_new.json
│   └── img
├── Traffic_accident
│   ├── annotations
│   ├── img
│   └── Traffic_accident_qa_new.json
└── Traffic_safety_knowledge
    └── qa
```

This folder primarily contains QA JSON files and the corresponding image folder. Each JSON file includes traffic-safety knowledge questions such as road rules, licenses, and vehicle types.
Example format:
```json
{
  "id": 0,
  "question_type": "single-choice",
  "question": "The holder of a Class 6 operator's licence may operate which of the following vehicles?",
  "answer": ["OptionC"],
  "explain": "To drive a motorcycle, you must hold a Class 6 licence.",
  "img_path": [],
  "country": "Canada",
  "vehicle_type": "car",
  "optionA": "An ambulance",
  "optionB": "A bus",
  "optionC": "A motorcycle",
  "optionD": "A tractor-trailer",
  "optionE": "",
  "optionF": "",
  "language": "English"
}
```

These questions cover traffic rules and safety knowledge across multiple countries, ensuring a broad evaluation of model understanding.
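For evaluation, a record in this format can be loaded and scored with a small helper. The following is a minimal sketch assuming the record layout shown above; the function names are hypothetical, not part of the released code:

```python
import json

def load_qa(path):
    """Load a QA JSON file containing a list of question records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def check_single_choice(record, predicted):
    """Check a single-choice prediction against the gold answer.

    Answers are stored as lists like ["OptionC"]; comparing on the
    trailing letter lets both "C" and "OptionC" count as correct.
    """
    gold = {a[-1].upper() for a in record["answer"]}
    return predicted[-1].upper() in gold
```

For the example record above, `check_single_choice(record, "OptionC")` returns `True`.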
This folder contains:
- QA JSON files
- Video folders (referenced via path fields)
- Annotation folders
Each video is associated with a set of 11 questions covering different aspects of traffic accidents. The question types include single-choice, multiple-choice, and open-ended QA, designed to comprehensively assess model reasoning in accident scenarios.
Example format:
```json
{
  "id": 0,
  "path": "1/001537",
  "extracted_json": [
    {
      "question": "What caused the accident in the video?",
      "options": [
        "A) Pedestrian is drunk",
        "B) Pedestrian moves or stays on the motorway",
        "C) Pedestrian does not notice the coming vehicles when crossing the street",
        "D) The vehicle hits the objects falling from the front vehicles"
      ],
      "answer": ["C"],
      "type": "single-choice"
    },
    ...
  ],
  ...
}
```
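Since each record bundles one video (its `path`) with a list of questions under `extracted_json`, evaluation code typically flattens records into (video, question) pairs. A minimal sketch under that assumption, with a hypothetical helper name:

```python
def iter_video_questions(records):
    """Yield (video_path, question_dict) pairs from accident records.

    Each record holds one video (its "path") and a list of question
    dicts under "extracted_json", as in the example format above.
    """
    for rec in records:
        for q in rec.get("extracted_json", []):
            yield rec["path"], q

# Toy record mirroring the example format above.
records = [{
    "id": 0,
    "path": "1/001537",
    "extracted_json": [
        {"question": "What caused the accident in the video?",
         "answer": ["C"], "type": "single-choice"},
    ],
}]
pairs = list(iter_video_questions(records))  # one (path, question) pair
```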
This folder includes:
- QA JSON files
- Annotation files
- Corner case images
Each corner case corresponds to 7 questions, spanning single-choice, multiple-choice, and open-ended formats. These questions focus on identifying rare or complex road entities, targeting the robustness of models in handling rare or intricate traffic contexts.
Example format:
```json
{
  "id": 0,
  "path": "img/images_0001.jpg",
  "extracted_json": [
    {
      "question": "What do you think the object at the bounding box [1265, 560, 40, 157] is?",
      "options": [
        "A) tricycle",
        "B) bollard",
        "C) machinery",
        "D) traffic_box"
      ],
      "answer": ["B"],
      "type": "single-choice"
    },
    ...
  ],
  ...
}
```

This benchmark builds upon several publicly available datasets. We sincerely thank the creators of IDKB, CODA-LM, and CAP-DATA for making their resources available to the community.
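The corner-case questions embed a bounding box directly in the question text. Assuming an `[x, y, w, h]` convention (inferred from the example values, not stated explicitly), the box can be parsed and converted to crop coordinates as follows; this is a sketch with hypothetical helper names:

```python
import re

BBOX_RE = re.compile(r"\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]")

def extract_bbox(question):
    """Pull the first [x, y, w, h] box out of a question string."""
    m = BBOX_RE.search(question)
    return [int(g) for g in m.groups()] if m else None

def bbox_to_crop(bbox):
    """Convert an assumed [x, y, w, h] box to (left, upper, right, lower),
    the coordinate order expected by e.g. PIL's Image.crop."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)
```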
If you use our dataset or benchmark in your research, please cite us as:
```bibtex
@article{ye2025safedriverag,
  title={SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation},
  author={Ye, Hao and Qi, Mengshi and Liu, Zhaohong and Liu, Liang and Ma, Huadong},
  journal={arXiv preprint arXiv:2507.21585},
  year={2025}
}
```
