Support customizing trainer and daemon in VERL#407
Conversation
|
/ci |
|
🚀 CI Watcher for correlation id-3644570756-mj27yj6v triggered by comment 3644570756
✅ All runs completed. |
There was a problem hiding this comment.
Pull request overview
This PR adds support for customizing the trainer and daemon classes in the VERL integration by introducing trainer_cls and daemon_cls parameters. The interface is experimental and planned only for v0.3.
Key changes:
- Added
daemon_clsparameter toAgentLightningTrainer.__init__to allow custom daemon implementations - Added
trainer_clsanddaemon_clsparameters torun_ppoandTaskRunner.runfunctions - Updated
VERLalgorithm class to accept optionaltrainer_clsanddaemon_clsparameters for customization
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| agentlightning/verl/trainer.py | Added daemon_cls parameter to constructor and used it when instantiating the agent mode daemon |
| agentlightning/verl/entrypoint.py | Added trainer_cls and daemon_cls parameters to main, run_ppo, and TaskRunner.run; updated imports and type hints |
| agentlightning/algorithm/verl/interface.py | Added optional trainer_cls and daemon_cls parameters to VERL.__init__ and passed them through to run_ppo |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| trainer_cls=AgentLightningTrainer, | ||
| daemon_cls=AgentModeDaemon, |
There was a problem hiding this comment.
The AgentLightningTrainer and AgentModeDaemon classes are imported inside TYPE_CHECKING but are used at runtime in the main function. This will cause a NameError at runtime because these names are not available outside of type checking. Move these imports outside the TYPE_CHECKING block or use string literals for the default values.
| store: LightningStore | None, | ||
| llm_proxy: LLMProxy | None, | ||
| adapter: TraceAdapter | None, | ||
| daemon_cls: Type[AgentModeDaemon], |
There was a problem hiding this comment.
The Type import is missing from the typing imports. The daemon_cls parameter in line 181 uses Type[AgentModeDaemon] as a type hint, but Type is not imported. Add Type to the imports from typing.
| adapter: TraceAdapter | None, | ||
| daemon_cls: Type[AgentModeDaemon], | ||
| **kwargs, | ||
| ): |
There was a problem hiding this comment.
The docstring for the __init__ method is missing. The new daemon_cls parameter should be documented to help users understand its purpose and usage. Add a docstring that describes all parameters including the newly added daemon_cls.
| ): | |
| ): | |
| """ | |
| Initialize the AgentLightningTrainer. | |
| Args: | |
| store (LightningStore | None): The storage backend for logging and data persistence. | |
| llm_proxy (LLMProxy | None): Proxy for interacting with the language model. | |
| adapter (TraceAdapter | None): Adapter for converting traces to the required format. | |
| daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon responsible for server communication and agent orchestration. | |
| **kwargs: Additional keyword arguments passed to the base RayPPOTrainer. | |
| """ |
| adapter: TraceAdapter[Any] | None, | ||
| trainer_cls: Type[AgentLightningTrainer], | ||
| daemon_cls: Type[AgentModeDaemon], | ||
| ) -> None: |
There was a problem hiding this comment.
The run_ppo function has new parameters trainer_cls and daemon_cls but lacks a docstring to document what these parameters are for. Add a docstring that describes all parameters, especially the new ones, to help users understand their purpose and usage.
| ) -> None: | |
| ) -> None: | |
| """ | |
| Run the PPO (Proximal Policy Optimization) training loop using the provided configuration and components. | |
| Parameters: | |
| config (Any): The configuration object for the training run, typically loaded via Hydra. | |
| train_dataset (Dataset[Any] | None): The training dataset to use, or None if not provided. | |
| val_dataset (Dataset[Any] | None): The validation dataset to use, or None if not provided. | |
| store (LightningStore | None): The LightningStore instance for storing and retrieving data, or None. | |
| llm_proxy (LLMProxy | None): The LLMProxy instance for model inference, or None. | |
| adapter (TraceAdapter[Any] | None): The TraceAdapter for logging or tracing, or None. | |
| trainer_cls (Type[AgentLightningTrainer]): The class to use for creating the PPO trainer. This allows customization of the training logic by providing a different trainer class. | |
| daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon. This allows customization of the agent's runtime behavior by providing a different daemon class. | |
| Returns: | |
| None | |
| This function initializes Ray if necessary, then launches the PPO training process using the provided datasets, | |
| store, LLM proxy, adapter, and customizable trainer and daemon classes. | |
| """ |
| adapter: TraceAdapter[Any] | None, | ||
| daemon_cls: Type[AgentModeDaemon], | ||
| trainer_cls: Type[AgentLightningTrainer], | ||
| ): |
There was a problem hiding this comment.
The TaskRunner.run method has new parameters daemon_cls and trainer_cls but lacks a docstring to document what these parameters are for. Add a docstring that describes all parameters, especially the new ones.
| ): | |
| ): | |
| """ | |
| Run the main training or evaluation task using the provided configuration and components. | |
| Args: | |
| config (Any): The configuration object for the experiment, typically an OmegaConf config. | |
| train_dataset (Dataset[Any] | None): The training dataset to use, or None if not provided. | |
| val_dataset (Dataset[Any] | None): The validation dataset to use, or None if not provided. | |
| store (LightningStore | None): The LightningStore instance for storing experiment data, or None. | |
| llm_proxy (LLMProxy | None): The LLMProxy instance for model inference, or None. | |
| adapter (TraceAdapter[Any] | None): The TraceAdapter for logging or tracing, or None. | |
| daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon. This should be a subclass of AgentModeDaemon and is responsible for managing agent modes during training or evaluation. | |
| trainer_cls (Type[AgentLightningTrainer]): The class to use for creating the trainer. This should be a subclass of AgentLightningTrainer and is responsible for orchestrating the training or evaluation process. | |
| Returns: | |
| None | |
| """ |
agentlightning/verl/entrypoint.py
Outdated
| daemon_cls: Type[AgentModeDaemon], | ||
| trainer_cls: Type[AgentLightningTrainer], |
There was a problem hiding this comment.
The parameter order in TaskRunner.run is inconsistent with the parameter order in run_ppo. In run_ppo, the order is trainer_cls then daemon_cls (lines 56-57), but in TaskRunner.run, the order is reversed: daemon_cls then trainer_cls (lines 99-100). This inconsistency can lead to confusion and errors. Make the parameter order consistent across both functions.
| daemon_cls: Type[AgentModeDaemon], | |
| trainer_cls: Type[AgentLightningTrainer], | |
| trainer_cls: Type[AgentLightningTrainer], | |
| daemon_cls: Type[AgentModeDaemon], |
|
/ci |
|
🚀 CI Watcher for correlation id-3644845003-mj2d0efc triggered by comment 3644845003
✅ All runs completed. |
This PR is very experimental! The interface would probably only stay in v0.3.