verl_example

Usage Documentation under the Verl Framework

multiturn_grpo.yaml and multiturn_megatron_grpo.yaml are Verl configuration files for GRPO training: the former is the standard version and the latter is the Megatron version.
multiturn_llm_reward.py is a custom reward model file under the Verl framework used for calculating reward values.
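
To illustrate the shape such a reward file might take, here is a minimal sketch. It assumes the `compute_score(data_source, solution_str, ground_truth, extra_info)` interface that Verl describes for custom reward functions; the substring-matching rule is a hypothetical stand-in and is not the logic in multiturn_llm_reward.py, which is not shown in this document.

```python
from typing import Any, Optional


def compute_score(
    data_source: str,
    solution_str: str,
    ground_truth: str,
    extra_info: Optional[dict[str, Any]] = None,
) -> float:
    """Return 1.0 if the ground truth appears in the response, else 0.0.

    Hypothetical scoring rule for illustration only; the signature is an
    assumption based on Verl's custom reward function interface.
    """
    # Normalize both sides so case and surrounding whitespace do not matter.
    answer = solution_str.strip().lower()
    target = ground_truth.strip().lower()

    # An empty ground truth cannot be matched meaningfully.
    if not target:
        return 0.0

    return 1.0 if target in answer else 0.0
```

A reward computed by an LLM judge would replace the substring check with a call to the judge model while keeping the same function signature.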

Usage instructions:

Ensure that the Verl framework and its dependencies are installed.
In multiturn_grpo.yaml or multiturn_megatron_grpo.yaml, point the model path to the pretrained model that will be trained with GRPO.
In the same configuration file, point the reward model path to multiturn_llm_reward.py.
Launch Verl with the chosen configuration file to start training and evaluating the model.
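
The steps above can be sketched as a small helper that assembles a launch command. This is a sketch under stated assumptions, not the project's confirmed invocation: it assumes the `verl.trainer.main_ppo` entry point, Hydra's `--config-path` and `--config-name` flags, and the override keys `actor_rollout_ref.model.path` and `custom_reward_function.path`. All paths are hypothetical placeholders, and the helper name `build_train_command` is introduced here for illustration only.

```python
import shlex


def build_train_command(
    config_dir: str,
    config_name: str,
    model_path: str,
    reward_path: str,
) -> list[str]:
    """Assemble a Verl training command as an argument list.

    The entry point and override keys are assumptions; check them against
    the installed Verl version and the YAML files in this directory.
    """
    return [
        "python3", "-m", "verl.trainer.main_ppo",
        f"--config-path={config_dir}",    # directory holding the YAML files
        f"--config-name={config_name}",   # file name without the .yaml suffix
        f"actor_rollout_ref.model.path={model_path}",
        f"custom_reward_function.path={reward_path}",
    ]


# Hypothetical placeholder paths; replace them with real locations.
cmd = build_train_command(
    config_dir="/path/to/verl_example",
    config_name="multiturn_grpo",
    model_path="/path/to/pretrained_model",
    reward_path="/path/to/verl_example/multiturn_llm_reward.py",
)

# Print the command for inspection instead of executing it.
print(shlex.join(cmd))
```

Printing the command rather than running it keeps the sketch safe to execute anywhere; passing `cmd` to `subprocess.run` would start training once the paths and keys have been verified. For the Megatron version, `config_name` would be `multiturn_megatron_grpo`.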
