🦁 VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL

This is the official repository for the paper "VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL".

VisualSphinx is the largest fully synthetic, open-source dataset of vision logic puzzles. It consists of over 660K automatically generated puzzles. Each puzzle is grounded in an interpretable rule and comes with both correct answers and plausible distractors.

Overview

VisualSphinx performance

Installation

Build environment

git clone https://github.com/VisualSphinx/VisualSphinx-Generator.git
cd VisualSphinx-Generator
conda create -n VisualSphinx python=3.12 -y
conda activate VisualSphinx
pip install -r requirements.txt

Generate Data

To reproduce VisualSphinx, go to the pipeline directory. Remember to define your API keys in api_config.py.
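
One way to keep keys out of version control is to read them from environment variables. The snippet below is only a sketch of that pattern, not the contents of api_config.py: the key names `EXAMPLE_LLM_API_KEY` and `EXAMPLE_IMAGE_API_KEY` and the helper `load_api_keys` are hypothetical, so check api_config.py in the repository for the names the pipeline actually expects.

```python
import os

# Hypothetical key names, used for illustration only.
# The real names are whatever api_config.py in the repository defines.
REQUIRED_KEYS = ("EXAMPLE_LLM_API_KEY", "EXAMPLE_IMAGE_API_KEY")


def load_api_keys(names=REQUIRED_KEYS, env=None):
    """Return a dict of API keys read from the environment.

    Raises RuntimeError listing every key that is missing or empty,
    so a misconfigured run fails before any generation starts.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

Failing early on a missing key avoids discovering the problem partway through a long generation run.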

Features

VisualSphinx is a comprehensive pipeline designed to generate large-scale, diverse, and verifiable synthetic datasets for vision logic puzzles. Key features include:

  • Diverse Generation: Automatically produces high-quality visual logic puzzles from a variety of sources and rule templates, supporting multiple puzzle styles and formats.
  • Self-Verification: Each puzzle is accompanied by correct answers and plausible distractors, with automated verification and scoring to ensure quality.
  • Open & Reproducible: All code, prompts, and data processing steps are open-source and fully documented for reproducibility and community extension.
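
To make the "rule, correct answer, distractors" structure concrete, here is a minimal sketch of how one puzzle record could be represented and sanity-checked. The `Puzzle` class, its field names, and both helper functions are assumptions made for illustration. They are not the dataset's actual schema or the pipeline's verification code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Puzzle:
    # Hypothetical record layout, not the released dataset schema.
    rule: str                      # interpretable rule the puzzle is built on
    correct: str                   # identifier of the correct answer image
    distractors: list = field(default_factory=list)  # plausible wrong answers


def is_well_formed(puzzle):
    """Basic structural check: non-empty rule, at least one distractor,
    and no duplicate options (so the correct answer is unambiguous)."""
    options = [puzzle.correct] + list(puzzle.distractors)
    return (
        bool(puzzle.rule.strip())
        and len(puzzle.distractors) > 0
        and len(set(options)) == len(options)
    )


def build_options(puzzle, rng):
    """Shuffle the correct answer in with the distractors.

    Returns the shuffled options and the index of the correct one.
    """
    options = [puzzle.correct] + list(puzzle.distractors)
    rng.shuffle(options)
    return options, options.index(puzzle.correct)
```

For example, `build_options(puzzle, random.Random(0))` returns a reproducible ordering, and the returned index can serve as the ground-truth label when scoring a model's choice.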

Training

Please refer to verl for RL training using VisualSphinx datasets, which is based on .

Other Information

License: This project is released under the MIT License.

Contact: For questions, suggestions, or feedback, please reach out to Yichen, or raise an issue. We welcome your input and are committed to continuously improving VisualSphinx to better serve the community.

Citation

If you find the model, data, or code useful, please cite:

@misc{feng2025visualsphinx,
      title={VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL}, 
      author={Yichen Feng and Zhangchen Xu and Fengqing Jiang and Yuetai Li and Bhaskar Ramasubramanian and Luyao Niu and Bill Yuchen Lin and Radha Poovendran},
      year={2025},
      eprint={2505.23977},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.23977}, 
}
