Support customizing trainer and daemon in VERL #407

Merged
ultmaster merged 4 commits into main from feature/verl-customize-trainer
Dec 12, 2025

Conversation

@ultmaster
Contributor

This PR is very experimental! The interface will probably exist only in v0.3.

@ultmaster
Contributor Author

/ci

@github-actions

github-actions bot commented Dec 12, 2025

🚀 CI Watcher for correlation id-3644570756-mj27yj6v triggered by comment 3644570756
🏃‍♀️ Tracking 2 workflow run(s):

✅ All runs completed.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for customizing the trainer and daemon classes in the VERL integration by introducing trainer_cls and daemon_cls parameters. The interface is experimental and planned only for v0.3.

Key changes:

  • Added daemon_cls parameter to AgentLightningTrainer.__init__ to allow custom daemon implementations
  • Added trainer_cls and daemon_cls parameters to run_ppo and TaskRunner.run functions
  • Updated VERL algorithm class to accept optional trainer_cls and daemon_cls parameters for customization
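
The pass-through described above can be sketched roughly as follows. This is a minimal illustration of the pattern, not the code in this PR: the classes only borrow their names from the PR, while the real AgentLightningTrainer, AgentModeDaemon, and run_ppo also take config, datasets, store, llm_proxy, and adapter. The `port` argument, the `describe` and `fit` bodies, the default values on `run_ppo`, and `LoggingDaemon` are assumptions made up for the example.

```python
from typing import Any, Type


class AgentModeDaemon:
    """Hypothetical stand-in for the real daemon class."""

    def __init__(self, port: int) -> None:
        self.port = port

    def describe(self) -> str:
        return f"default daemon on port {self.port}"


class AgentLightningTrainer:
    """Hypothetical stand-in for the real trainer class."""

    def __init__(self, daemon_cls: Type[AgentModeDaemon], **kwargs: Any) -> None:
        # Store the class, not an instance, so the daemon is built on demand.
        self.daemon_cls = daemon_cls
        self.kwargs = kwargs

    def fit(self) -> str:
        # The trainer instantiates whichever daemon class it was given.
        daemon = self.daemon_cls(port=self.kwargs.get("port", 8000))
        return daemon.describe()


def run_ppo(
    trainer_cls: Type[AgentLightningTrainer] = AgentLightningTrainer,
    daemon_cls: Type[AgentModeDaemon] = AgentModeDaemon,
    **kwargs: Any,
) -> str:
    """Build the trainer from trainer_cls and hand it daemon_cls (assumed shape)."""
    trainer = trainer_cls(daemon_cls=daemon_cls, **kwargs)
    return trainer.fit()


class LoggingDaemon(AgentModeDaemon):
    """Example customization: override one method of the default daemon."""

    def describe(self) -> str:
        return "custom " + super().describe()


default_result = run_ppo()
custom_result = run_ppo(daemon_cls=LoggingDaemon, port=9000)
```

In this sketch, swapping the daemon needs only a subclass and one keyword argument; the trainer itself is untouched.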

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| agentlightning/verl/trainer.py | Added daemon_cls parameter to constructor and used it when instantiating the agent mode daemon |
| agentlightning/verl/entrypoint.py | Added trainer_cls and daemon_cls parameters to main, run_ppo, and TaskRunner.run; updated imports and type hints |
| agentlightning/algorithm/verl/interface.py | Added optional trainer_cls and daemon_cls parameters to VERL.__init__ and passed them through to run_ppo |

Comment on lines +44 to +45
trainer_cls=AgentLightningTrainer,
daemon_cls=AgentModeDaemon,

Copilot AI Dec 12, 2025

The AgentLightningTrainer and AgentModeDaemon classes are imported inside TYPE_CHECKING but are used at runtime in the main function. This will cause a NameError at runtime because these names are not available outside of type checking. Move these imports outside the TYPE_CHECKING block or use string literals for the default values.
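
The failure mode described in this comment can be reproduced in isolation. The sketch below uses `fractions.Fraction` as a stand-in for the real classes, and the function names are hypothetical. It shows that a default value is evaluated when `def` runs, so a name imported only under TYPE_CHECKING is missing at that point. It also shows one possible fix, a `None` default resolved inside the function, which is not necessarily what this PR ended up doing.

```python
from typing import TYPE_CHECKING, Optional, Type

if TYPE_CHECKING:
    # Seen by static type checkers only; this import never runs.
    from fractions import Fraction


def define_with_eager_default() -> str:
    """Try to use a TYPE_CHECKING-only name as a default value."""
    try:
        # Defaults are evaluated when `def` executes, so the lookup fails here.
        def main(number_cls: "Type[Fraction]" = Fraction) -> "Fraction":
            return number_cls(1, 2)
    except NameError as exc:
        return f"NameError: {exc}"
    return "defined without error"


def main(number_cls: "Optional[Type[Fraction]]" = None) -> "Fraction":
    """One possible fix: resolve the default lazily inside the function."""
    if number_cls is None:
        # A real runtime import, performed only when the default is needed.
        from fractions import Fraction as default_cls

        number_cls = default_cls
    return number_cls(1, 2)


eager_outcome = define_with_eager_default()
lazy_outcome = main()
```

The annotations are quoted so that they are never evaluated at definition time; only the default value triggers the error. Moving the import out of the TYPE_CHECKING block, as the comment suggests, is the simpler fix when no circular import is involved.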

store: LightningStore | None,
llm_proxy: LLMProxy | None,
adapter: TraceAdapter | None,
daemon_cls: Type[AgentModeDaemon],

Copilot AI Dec 12, 2025

The Type import is missing from the typing imports. The daemon_cls parameter in line 181 uses Type[AgentModeDaemon] as a type hint, but Type is not imported. Add Type to the imports from typing.
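
Whether a missing `Type` import fails at import time depends on how the module's annotations are evaluated, but it surfaces whenever the hints are resolved. The sketch below is an illustration under assumptions: `AgentModeDaemon` and `init` are hypothetical stand-ins, and the two dictionaries simulate a module namespace without and with the import. It uses `typing.get_type_hints` to check whether every name in the annotation can be found.

```python
import typing
from typing import Any, Dict


class AgentModeDaemon:
    """Hypothetical stand-in for the real daemon class."""


def init(daemon_cls: "Type[AgentModeDaemon]") -> None:
    """Stand-in for a constructor whose hint mentions `Type`."""


def hint_resolves(namespace: Dict[str, Any]) -> bool:
    """Return True if init's annotations resolve against the given namespace."""
    try:
        typing.get_type_hints(init, globalns=namespace)
    except NameError:
        return False
    return True


# Simulated module globals without `from typing import Type`.
without_type = {"AgentModeDaemon": AgentModeDaemon}
# The same globals once the import has been added.
with_type = {"AgentModeDaemon": AgentModeDaemon, "Type": typing.Type}

missing_import_ok = hint_resolves(without_type)
added_import_ok = hint_resolves(with_type)
```

Here `missing_import_ok` is False and `added_import_ok` is True, which is the difference that adding `Type` to the typing imports makes.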

adapter: TraceAdapter | None,
daemon_cls: Type[AgentModeDaemon],
**kwargs,
):

Copilot AI Dec 12, 2025

The docstring for the __init__ method is missing. The new daemon_cls parameter should be documented to help users understand its purpose and usage. Add a docstring that describes all parameters including the newly added daemon_cls.

Suggested change
-    ):
+    ):
+        """
+        Initialize the AgentLightningTrainer.
+
+        Args:
+            store (LightningStore | None): The storage backend for logging and data persistence.
+            llm_proxy (LLMProxy | None): Proxy for interacting with the language model.
+            adapter (TraceAdapter | None): Adapter for converting traces to the required format.
+            daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon responsible for server communication and agent orchestration.
+            **kwargs: Additional keyword arguments passed to the base RayPPOTrainer.
+        """

adapter: TraceAdapter[Any] | None,
trainer_cls: Type[AgentLightningTrainer],
daemon_cls: Type[AgentModeDaemon],
) -> None:

Copilot AI Dec 12, 2025

The run_ppo function has new parameters trainer_cls and daemon_cls but lacks a docstring to document what these parameters are for. Add a docstring that describes all parameters, especially the new ones, to help users understand their purpose and usage.

Suggested change
-) -> None:
+) -> None:
+    """
+    Run the PPO (Proximal Policy Optimization) training loop using the provided configuration and components.
+
+    Parameters:
+        config (Any): The configuration object for the training run, typically loaded via Hydra.
+        train_dataset (Dataset[Any] | None): The training dataset to use, or None if not provided.
+        val_dataset (Dataset[Any] | None): The validation dataset to use, or None if not provided.
+        store (LightningStore | None): The LightningStore instance for storing and retrieving data, or None.
+        llm_proxy (LLMProxy | None): The LLMProxy instance for model inference, or None.
+        adapter (TraceAdapter[Any] | None): The TraceAdapter for logging or tracing, or None.
+        trainer_cls (Type[AgentLightningTrainer]): The class to use for creating the PPO trainer. This allows customization of the training logic by providing a different trainer class.
+        daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon. This allows customization of the agent's runtime behavior by providing a different daemon class.
+
+    Returns:
+        None
+
+    This function initializes Ray if necessary, then launches the PPO training process using the provided datasets,
+    store, LLM proxy, adapter, and customizable trainer and daemon classes.
+    """

adapter: TraceAdapter[Any] | None,
daemon_cls: Type[AgentModeDaemon],
trainer_cls: Type[AgentLightningTrainer],
):

Copilot AI Dec 12, 2025

The TaskRunner.run method has new parameters daemon_cls and trainer_cls but lacks a docstring to document what these parameters are for. Add a docstring that describes all parameters, especially the new ones.

Suggested change
-    ):
+    ):
+        """
+        Run the main training or evaluation task using the provided configuration and components.
+
+        Args:
+            config (Any): The configuration object for the experiment, typically an OmegaConf config.
+            train_dataset (Dataset[Any] | None): The training dataset to use, or None if not provided.
+            val_dataset (Dataset[Any] | None): The validation dataset to use, or None if not provided.
+            store (LightningStore | None): The LightningStore instance for storing experiment data, or None.
+            llm_proxy (LLMProxy | None): The LLMProxy instance for model inference, or None.
+            adapter (TraceAdapter[Any] | None): The TraceAdapter for logging or tracing, or None.
+            daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon. This should be a subclass of AgentModeDaemon and is responsible for managing agent modes during training or evaluation.
+            trainer_cls (Type[AgentLightningTrainer]): The class to use for creating the trainer. This should be a subclass of AgentLightningTrainer and is responsible for orchestrating the training or evaluation process.
+
+        Returns:
+            None
+        """

Comment on lines +99 to +100
daemon_cls: Type[AgentModeDaemon],
trainer_cls: Type[AgentLightningTrainer],

Copilot AI Dec 12, 2025

The parameter order in TaskRunner.run is inconsistent with the parameter order in run_ppo. In run_ppo, the order is trainer_cls then daemon_cls (lines 56-57), but in TaskRunner.run, the order is reversed: daemon_cls then trainer_cls (lines 99-100). This inconsistency can lead to confusion and errors. Make the parameter order consistent across both functions.

Suggested change
-        daemon_cls: Type[AgentModeDaemon],
-        trainer_cls: Type[AgentLightningTrainer],
+        trainer_cls: Type[AgentLightningTrainer],
+        daemon_cls: Type[AgentModeDaemon],
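
Besides aligning the order, one way to make such a mismatch harmless is to declare both parameters keyword-only. The sketch below is an illustration under assumptions, not the code in this PR: `Trainer`, `Daemon`, and `task_runner_run` are hypothetical stand-ins, and the real functions take many more arguments.

```python
from typing import Tuple, Type


class Trainer:
    """Hypothetical stand-in for AgentLightningTrainer."""


class Daemon:
    """Hypothetical stand-in for AgentModeDaemon."""


def run_ppo(*, trainer_cls: Type[Trainer], daemon_cls: Type[Daemon]) -> Tuple[str, str]:
    # The bare `*` forces callers to name both arguments.
    return (trainer_cls.__name__, daemon_cls.__name__)


def task_runner_run(*, daemon_cls: Type[Daemon], trainer_cls: Type[Trainer]) -> Tuple[str, str]:
    # Declared in the opposite order, yet forwarded correctly by name.
    return run_ppo(trainer_cls=trainer_cls, daemon_cls=daemon_cls)


def positional_call_rejected() -> bool:
    """Check that passing the classes positionally is refused outright."""
    try:
        run_ppo(Daemon, Trainer)  # type: ignore[misc]
    except TypeError:
        return True
    return False


forwarded = task_runner_run(daemon_cls=Daemon, trainer_cls=Trainer)
rejected = positional_call_rejected()
```

With keyword-only parameters, a caller who swaps the two classes positionally gets a TypeError instead of a trainer silently used as a daemon.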

@ultmaster
Contributor Author

/ci

@github-actions

github-actions bot commented Dec 12, 2025

🚀 CI Watcher for correlation id-3644845003-mj2d0efc triggered by comment 3644845003
🏃‍♀️ Tracking 3 workflow run(s):

✅ All runs completed.

@ultmaster ultmaster merged commit 1d199b2 into main Dec 12, 2025
30 checks passed
beanie00 added a commit to beanie00/agent-lightning that referenced this pull request Dec 12, 2025

2 participants