# SynPlanResearch-R1

The official repository for **SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans**.
This repo provides everything needed to reproduce our results:
- Section 1 — Environment setup (conda + pip)
- Section 2 — Configuration (`.env` for paths and API keys)
- Section 3 — Data download (from HuggingFace or local parquet)
- Section 4 — Evaluation, SFT training, and RL training scripts
We release the following models and datasets:

| Type | Name |
|---|---|
| Model | hzeng/syn-plan-research-4B |
| Model | hzeng/syn-plan-research-4B-sft |
| Model | hzeng/syn-plan-research-8B |
| Model | hzeng/syn-plan-research-8B-sft |
| Dataset | hzeng/syn-plan-research-data-eval |
| Dataset | hzeng/syn-plan-research-data-sft |
| Dataset | hzeng/syn-plan-research-data-rl |
## Environment Setup

Prerequisites: Linux, NVIDIA A100 (or a compatible GPU), Conda.
```bash
conda create -n verl-vllm083 python=3.10 -y
conda activate verl-vllm083

pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124

conda install -y -c nvidia/label/cuda-12.4.1 cuda-toolkit
nvcc --version  # verify: release 12.4

pip install vllm==0.8.3
pip install flash-attn==2.7.3 --no-build-isolation
pip install transformers==4.52.4
pip install accelerate==1.12.0 \
    datasets==4.5.0 \
    ray==2.53.0 \
    peft==0.18.1 \
    wandb==0.24.1 \
    hydra-core==1.3.2 \
    xformers==0.0.29.post2 \
    numpy==1.26.4 \
    pandas==2.3.3 \
    tensordict==0.10.0 \
    uvicorn==0.40.0 \
    fastapi==0.128.0 \
    einops==0.8.2

cd syn_plan_research/verl
pip install --no-deps -e .
```
> ⚠️ Use `--no-deps` to avoid overwriting the pinned dependencies above.
Install the Playwright Chromium browser:

```bash
python -m playwright install chromium
```

## Configuration

All scripts read paths and API keys from a single `.env` file at the repository root.
```bash
cp .env.example .env
```

Then edit `.env`:
```bash
# Where to store data, checkpoints, caches, and eval outputs.
# This can be a separate high-capacity filesystem from your code directory.
DATA_ROOT="/path/to/your/data/storage"

# Serper API key for web search (get one at https://serper.dev/)
SERPER_API_KEY="your_serper_api_key_here"
```

After setting `DATA_ROOT`, the scripts use the following directory layout automatically:
```
${DATA_ROOT}/
├── .cache/huggingface/            # HuggingFace cache (HF_HOME)
├── data/
│   ├── sft/train.parquet          # SFT training data
│   ├── rl/train.parquet           # RL training data
│   └── eval/validation.parquet    # Evaluation data
├── cache/
│   └── serper_search_cache.jsonl  # Web search cache
├── checkpoints/
│   ├── sft/...                    # SFT checkpoints
│   └── rl/...                     # RL checkpoints
└── eval_outputs/                  # Evaluation results
```
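You can pre-create this layout once `.env` is filled in. A minimal sketch, assuming the scripts load `.env` simply by sourcing it; the `/tmp` paths below are throwaway placeholders so the demo runs anywhere:

```shell
# Sketch only: /tmp paths are placeholders; point DATA_ROOT at real storage.
cat > /tmp/demo.env <<'EOF'
DATA_ROOT="/tmp/data_root_demo"
SERPER_API_KEY="dummy_key"
EOF

set -a; . /tmp/demo.env; set +a   # export every variable defined in the file

# Pre-create the directory layout described above.
mkdir -p "${DATA_ROOT}/.cache/huggingface" \
         "${DATA_ROOT}/data/sft" "${DATA_ROOT}/data/rl" "${DATA_ROOT}/data/eval" \
         "${DATA_ROOT}/cache" \
         "${DATA_ROOT}/checkpoints/sft" "${DATA_ROOT}/checkpoints/rl" \
         "${DATA_ROOT}/eval_outputs"
echo "layout created under ${DATA_ROOT}"
```

The scripts may also create these directories on demand; pre-creating them just makes the layout visible up front.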
Note: Each shell script also contains a `conda activate` line. If your conda setup differs, update the `source` and `conda activate` lines in the scripts to match your environment.
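A `sed` one-liner can patch those lines in bulk. The script body below is a made-up stand-in for the repo's actual scripts, and `/opt/conda` is a placeholder for your real install prefix:

```shell
# Demo on a throwaway copy of a script; adapt the sed pattern and the
# replacement path to your real conda install before touching repo scripts.
mkdir -p /tmp/scripts_demo
cat > /tmp/scripts_demo/run.sh <<'EOF'
source ~/miniconda3/etc/profile.d/conda.sh
conda activate verl-vllm083
EOF

sed -i 's|source .*conda.sh|source /opt/conda/etc/profile.d/conda.sh|' /tmp/scripts_demo/run.sh
head -1 /tmp/scripts_demo/run.sh   # the source line now points at /opt/conda
```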
## Data Download

Download the datasets to local parquet files:
```bash
cd syn_plan_research/verl
bash examples/syn_plan_research/download_parquets_to_local.sh
```

This downloads all datasets to `${DATA_ROOT}/data/`.
We provide a pre-built Serper search cache that contains retrieval results from previously used queries. Using this cache can significantly reduce your Serper API usage and cost.
Download `serper_search_cache.jsonl` from Google Drive: link

Then place it at:

```bash
mkdir -p ${DATA_ROOT}/cache
mv serper_search_cache.jsonl ${DATA_ROOT}/cache/
```

## Evaluation

All commands should be run from the `syn_plan_research/verl` directory:
```bash
cd syn_plan_research/verl
bash examples/syn_plan_research/eval_syn_plan_research_all.sh
```

This runs:
- Pass@1 evaluation on all data sources
- Pass@4 evaluation on GAIA
Results are saved to `${DATA_ROOT}/eval_outputs/`.
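As commonly defined, pass@k counts a task as solved if any of k sampled runs succeeds. A toy illustration — the 0/1 grid below is made up, and the repo's eval outputs have their own format:

```shell
# Each line is one task; each comma-separated field is one of 4 runs (1 = correct).
runs="1,0,1,0
0,0,0,0
1,1,1,1"

# A task passes if any run succeeded; pass@4 is the fraction of passing tasks.
echo "$runs" | awk -F, '{ pass = 0
  for (i = 1; i <= NF; i++) if ($i == 1) pass = 1
  total += pass; n++ }
  END { printf "pass@4 = %.2f\n", total / n }'
```

Here tasks 1 and 3 have at least one correct run out of four, so pass@4 = 2/3 ≈ 0.67, whereas pass@1 would score each run independently.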
See `syn_plan_research/verl/examples/syn_plan_research/README.md` for detailed configuration options (model paths, sampling parameters, pass@k settings, etc.).
## Training

To run SFT training:

```bash
bash examples/syn_plan_research/sft_syn_plan_research.sh
```

To run RL training:

```bash
bash examples/syn_plan_research/rl_syn_plan_research.sh
```