🦁 VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL

This is the official repository for the paper "VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL".

VisualSphinx is the largest fully-synthetic open-source dataset of vision logic puzzles. It consists of over 660K automatically generated visual logic puzzles. Each puzzle is grounded in an interpretable rule and accompanied by both correct answers and plausible distractors.

🌐 Project Website - Learn more about VisualSphinx
📖 Technical Report - Discover the methodology and technical details behind VisualSphinx
🔧 Github Repo - Access the complete pipeline used to produce VisualSphinx-V1
🤗 HF Datasets - Find all VisualSphinx-V1 datasets

Overview

Installation

Build environment

git clone https://github.com/VisualSphinx/VisualSphinx-Generator.git
cd VisualSphinx-Generator
conda create -n VisualSphinx python=3.12 -y
conda activate VisualSphinx
pip install -r requirements.txt

Generate Data

To reproduce VisualSphinx, go into the pipeline directory. Do not forget to define your API keys in api_config.py.

Features

VisualSphinx is a comprehensive pipeline designed to generate large-scale, diverse, and verifiable synthetic datasets for vision logic puzzles. Key features include:

Diverse Generation: Automatically produces high-quality visual logic puzzles from a variety of sources and rule templates, supporting multiple puzzle styles and formats.
Self-Verification: Each puzzle is accompanied by correct answers and plausible distractors, with automated verification and scoring to ensure quality.
Open & Reproducible: All code, prompts, and data processing steps are open-source and fully documented for reproducibility and community extension.
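As an illustration of the kind of automated check that self-verification implies, here is a minimal Python sketch of a structural filter that rejects a puzzle whose answer set is malformed: the wrong number of distractors, an empty option, or two byte-identical options (which would make a distractor indistinguishable from the correct answer). The function name, the default of three distractors, and the use of raw image bytes are assumptions; this sketch does not reproduce the checks or scoring actually implemented in the pipeline.

```python
import hashlib


def is_well_formed(correct: bytes, distractors: list[bytes], n_distractors: int = 3) -> bool:
    """Hypothetical sanity check on one puzzle's answer options.

    Options are raw image bytes here (an assumption). A puzzle passes only if
    it has the expected number of distractors, no empty option, and no two
    options with identical content.
    """
    if len(distractors) != n_distractors:
        return False
    options = [correct, *distractors]
    if any(len(option) == 0 for option in options):
        return False
    # Identical content => identical digest, so duplicates shrink the set.
    digests = {hashlib.sha256(option).hexdigest() for option in options}
    return len(digests) == len(options)
```

Comparing content digests rather than file names catches the case where a distractor is a renamed copy of the correct image.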

Training

Please refer to verl for RL training using VisualSphinx datasets, which is based on .
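Because every puzzle has a single correct option, a rule-based binary reward is a natural fit for RL training. The sketch below is written against verl's custom reward function hook, which, in the verl versions we are aware of, calls a user function with the arguments `(data_source, solution_str, ground_truth, extra_info)`; check the verl documentation for the version you use. The `\boxed{X}` answer format and single-letter ground truth are assumptions, not the confirmed VisualSphinx prompt format.

```python
import re

# Assumption: the model ends its response with \boxed{X}, where X is an option letter.
_BOXED_LETTER = re.compile(r"\\boxed\{\s*([A-Z])\s*\}")


def compute_score(data_source, solution_str, ground_truth, extra_info=None) -> float:
    """Hypothetical binary reward for multiple-choice puzzles.

    Returns 1.0 if the last boxed letter in the response matches the
    ground-truth letter, and 0.0 otherwise (including when nothing is boxed).
    """
    matches = _BOXED_LETTER.findall(solution_str)
    if not matches:
        return 0.0
    return 1.0 if matches[-1] == str(ground_truth).strip().upper() else 0.0
```

Scoring only the last boxed letter means a model that revises its answer mid-response is judged on its final choice.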

Other Information

License: Please follow the MIT License.

Contact: For questions, suggestions, or feedback, please reach out to Yichen, or raise an issue. We welcome your input and are committed to continuously improving VisualSphinx to better serve the community.

Citation

If you find the model, data, or code useful, please cite:

@misc{feng2025visualsphinx,
      title={VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL}, 
      author={Yichen Feng and Zhangchen Xu and Fengqing Jiang and Yuetai Li and Bhaskar Ramasubramanian and Luyao Niu and Bill Yuchen Lin and Radha Poovendran},
      year={2025},
      eprint={2505.23977},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.23977}, 
}
