This is the official repository for the paper "VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL".
VisualSphinx is the largest fully synthetic open-source dataset of vision logic puzzles. It consists of over 660K automatically generated visual logic puzzles, each grounded in an interpretable rule and accompanied by both the correct answer and plausible distractors.
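To make the record structure concrete, here is a minimal sketch of how a puzzle with one interpretable rule, a correct answer, and plausible distractors could be assembled into a multiple-choice item. The Puzzle class, its field names, and the letter-based answer key are hypothetical illustrations, not the dataset's actual schema. Options are plain strings here for brevity, whereas real puzzles use images; check the HF datasets for the real fields.

```python
import random
from dataclasses import dataclass


@dataclass
class Puzzle:
    """Hypothetical puzzle record; field names are illustrative, not the real schema."""
    rule: str           # interpretable rule the puzzle is grounded in
    options: list[str]  # correct answer plus distractors, in shuffled order
    answer: str         # letter of the correct option, e.g. "B"


def build_puzzle(rule: str, correct: str, distractors: list[str], seed: int = 0) -> Puzzle:
    """Shuffle the correct answer among the distractors and record its letter."""
    options = [correct, *distractors]
    random.Random(seed).shuffle(options)  # seeded so the option layout is reproducible
    letter = "ABCDEFGH"[options.index(correct)]
    return Puzzle(rule=rule, options=options, answer=letter)


if __name__ == "__main__":
    p = build_puzzle(
        rule="Each panel rotates the arrow 90 degrees clockwise.",
        correct="arrow pointing left",
        distractors=["arrow pointing up", "arrow pointing right", "arrow pointing down"],
    )
    print(p.answer, p.options)
```

Seeding the shuffle keeps the position of the correct answer reproducible across runs while still spreading it over all option slots.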
- 🌐 Project Website - Learn more about VisualSphinx
- 📖 Technical Report - Discover the methodology and technical details behind VisualSphinx
- 🔧 Github Repo - Access the complete pipeline used to produce VisualSphinx-V1
- 🤗 HF Datasets - Find all VisualSphinx-V1 datasets
Build environment
git clone https://github.com/VisualSphinx/VisualSphinx-Generator.git
cd VisualSphinx-Generator
conda create -n VisualSphinx python=3.12 -y
conda activate VisualSphinx
pip install -r requirements.txt
Please go into the pipeline folder to reproduce VisualSphinx, and do not forget to define your API keys in api_config.py.
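The exact contents of api_config.py depend on which model providers the pipeline calls, so the sketch below is only one hedged way to fill it in: it reads keys from environment variables instead of hard-coding them, which keeps secrets out of version control. The variable names OPENAI_API_KEY and ANTHROPIC_API_KEY and the helper require_key are assumptions; match them to the names the pipeline scripts actually import.

```python
# api_config.py (hypothetical sketch; align names with what the pipeline imports)
import os


def require_key(name: str) -> str:
    """Return the environment variable `name`, failing loudly if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing API key: set the {name} environment variable.")
    return value


# Read once at import time; empty string if the variable is not set yet.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "")
```

With this layout, exporting the keys in your shell before running the pipeline is enough, and calling require_key at the point of use turns a missing key into a clear error instead of a failed API request.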
VisualSphinx is a comprehensive pipeline designed to generate large-scale, diverse, and verifiable synthetic datasets for vision logic puzzles. Key features include:
- Diverse Generation: Automatically produces high-quality visual logic puzzles from a variety of sources and rule templates, supporting multiple puzzle styles and formats.
- Self-Verification: Each puzzle is accompanied by correct answers and plausible distractors, with automated verification and scoring to ensure quality.
- Open & Reproducible: All code, prompts, and data processing steps are open-source and fully documented for reproducibility and community extension.
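The self-verification idea can be illustrated with a few structural checks on a single multiple-choice item. This is a hypothetical sketch, not the pipeline's actual verification or scoring logic (the Technical Report describes that): it assumes options are hashable identifiers, such as image file hashes, and flags items whose answer key is invalid or whose distractors duplicate the answer or each other.

```python
def validate_item(options: list[str], answer_index: int) -> list[str]:
    """Return the problems found in a multiple-choice item (an empty list means it passes)."""
    problems = []
    if not 0 <= answer_index < len(options):
        problems.append("answer index out of range")
        return problems  # remaining checks need a valid correct option
    if len(options) < 2:
        problems.append("need at least one distractor")
    correct = options[answer_index]
    distractors = options[:answer_index] + options[answer_index + 1:]
    if correct in distractors:
        problems.append("a distractor duplicates the correct answer")
    if len(set(distractors)) != len(distractors):
        problems.append("duplicate distractors")
    return problems
```

For example, validate_item(["a", "b", "c", "d"], 1) returns [], while validate_item(["a", "b", "a"], 0) reports that a distractor duplicates the correct answer. This covers only the structural side of quality control; judging whether distractors are genuinely plausible needs model-based checks.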
Please refer to verl for RL training using VisualSphinx datasets, which is based on .
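Because every puzzle has a single verifiable answer, RL training can rely on a simple rule-based reward. The function below is a hypothetical sketch, not the reward used in the verl folder: it assumes the model is asked to end its response with the chosen option letter wrapped in \boxed{...}, and it returns 1.0 on an exact match and 0.0 otherwise.

```python
import re

# Hypothetical answer format: "\boxed{X}" where X is an option letter A-H.
BOXED = re.compile(r"\\boxed\{\s*([A-H])\s*\}")


def letter_reward(response: str, ground_truth: str) -> float:
    """Return 1.0 if the last boxed letter matches the ground-truth letter, else 0.0."""
    matches = BOXED.findall(response)
    if not matches:
        return 0.0  # no parsable answer earns no reward
    return 1.0 if matches[-1] == ground_truth.strip().upper() else 0.0
```

Scoring only the last boxed letter lets the model reason freely before committing to an answer, and the binary reward avoids giving partial credit for guessing.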
License: Please follow the MIT License.
Contact: For questions, suggestions, or feedback, please reach out to Yichen or open an issue. We welcome your input and are committed to continuously improving VisualSphinx to better serve the community.
If you find the model, data, or code useful, please cite:
@misc{feng2025visualsphinx,
title={VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL},
author={Yichen Feng and Zhangchen Xu and Fengqing Jiang and Yuetai Li and Bhaskar Ramasubramanian and Luyao Niu and Bill Yuchen Lin and Radha Poovendran},
year={2025},
eprint={2505.23977},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.23977},
}

