Support customizing trainer and daemon in VERL by ultmaster · Pull Request #407 · microsoft/agent-lightning

ultmaster · 2025-12-12T01:56:43Z

This PR is very experimental! The interface will probably exist only in v0.3.

ultmaster · 2025-12-12T01:59:23Z

/ci

github-actions · 2025-12-12T01:59:36Z

🚀 CI Watcher for correlation id-3644570756-mj27yj6v triggered by comment 3644570756
🏃‍♀️ Tracking 2 workflow run(s):

🟢 Spider - PR #407 - ci-spider - id-3644570756-mj27yj6v — completed/success
🔴 Calc-X - PR #407 - ci-calc-x - id-3644570756-mj27yj6v — completed/failure

✅ All runs completed.

Copilot

Pull request overview

This PR adds support for customizing the trainer and daemon classes in the VERL integration by introducing trainer_cls and daemon_cls parameters. The interface is experimental and planned only for v0.3.

Key changes:

Added daemon_cls parameter to AgentLightningTrainer.__init__ to allow custom daemon implementations
Added trainer_cls and daemon_cls parameters to run_ppo and TaskRunner.run functions
Updated VERL algorithm class to accept optional trainer_cls and daemon_cls parameters for customization
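
The pass-through described in these key changes can be sketched roughly as follows. This is only an illustration of the injection pattern: the classes below are simplified stand-ins written for this example, not the real agentlightning implementations, and `VerboseDaemon`, `start`, `fit`, and the returned strings are hypothetical. The real `run_ppo` also takes the config, datasets, store, LLM proxy, and adapter, which are omitted here.

```python
from typing import Any, Type


class AgentModeDaemon:
    """Stand-in for the real daemon; it only records how it was built."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def start(self) -> str:  # hypothetical method, used only for this demo
        return "default daemon started"


class VerboseDaemon(AgentModeDaemon):
    """Hypothetical custom daemon that overrides one behavior."""

    def start(self) -> str:
        return "verbose daemon started"


class AgentLightningTrainer:
    """Stand-in trainer that receives the daemon class instead of hard-coding it."""

    def __init__(self, daemon_cls: Type[AgentModeDaemon], **kwargs: Any) -> None:
        self.daemon_cls = daemon_cls

    def fit(self) -> str:  # hypothetical method, used only for this demo
        # The injected class is instantiated here, so subclasses take effect.
        daemon = self.daemon_cls()
        return daemon.start()


def run_ppo(
    trainer_cls: Type[AgentLightningTrainer],
    daemon_cls: Type[AgentModeDaemon],
) -> str:
    # Simplified: forward the daemon class to whichever trainer class was chosen.
    trainer = trainer_cls(daemon_cls=daemon_cls)
    return trainer.fit()


default_result = run_ppo(trainer_cls=AgentLightningTrainer, daemon_cls=AgentModeDaemon)
custom_result = run_ppo(trainer_cls=AgentLightningTrainer, daemon_cls=VerboseDaemon)
```

Passing classes rather than instances lets the callee decide when and with which arguments to construct them, which matters when construction happens inside a remote worker.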

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File	Description
agentlightning/verl/trainer.py	Added `daemon_cls` parameter to constructor and used it when instantiating the agent mode daemon
agentlightning/verl/entrypoint.py	Added `trainer_cls` and `daemon_cls` parameters to `main`, `run_ppo`, and `TaskRunner.run`; updated imports and type hints
agentlightning/algorithm/verl/interface.py	Added optional `trainer_cls` and `daemon_cls` parameters to `VERL.__init__` and passed them through to `run_ppo`

Copilot · 2025-12-12T02:00:31Z

agentlightning/verl/entrypoint.py

+        trainer_cls=AgentLightningTrainer,
+        daemon_cls=AgentModeDaemon,


The AgentLightningTrainer and AgentModeDaemon classes are imported inside TYPE_CHECKING but are used at runtime in the main function. This will cause a NameError at runtime because these names are not available outside of type checking. Move these imports outside the TYPE_CHECKING block or use string literals for the default values.
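
The failure mode in this comment can be reproduced in isolation. The sketch below uses `fractions.Fraction` from the standard library as a stand-in for the trainer and daemon classes; `halve` and `default_value` are hypothetical helpers written for this illustration.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Executed only by static type checkers; skipped when the module runs.
    from fractions import Fraction


def halve(value: "Fraction") -> "Fraction":
    # String annotations are not evaluated when the function is defined,
    # so referring to the type-checking-only name here is safe.
    return value / 2


def default_value():
    # Here the name is used as a runtime value, so the lookup fails.
    return Fraction(1, 2)


try:
    default_value()
    outcome = "no error"
except NameError as exc:
    outcome = f"NameError: {exc}"
```

The same distinction applies to default arguments such as `trainer_cls=AgentLightningTrainer`: they are ordinary runtime expressions, so the name has to be imported outside the `TYPE_CHECKING` block.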

Copilot · 2025-12-12T02:00:31Z

agentlightning/verl/trainer.py

+        store: LightningStore | None,
+        llm_proxy: LLMProxy | None,
+        adapter: TraceAdapter | None,
+        daemon_cls: Type[AgentModeDaemon],


The Type import is missing from the typing imports. The daemon_cls parameter in line 181 uses Type[AgentModeDaemon] as a type hint, but Type is not imported. Add Type to the imports from typing.

Copilot · 2025-12-12T02:00:31Z

agentlightning/verl/trainer.py

+        adapter: TraceAdapter | None,
+        daemon_cls: Type[AgentModeDaemon],
+        **kwargs,
    ):


The docstring for the __init__ method is missing. The new daemon_cls parameter should be documented to help users understand its purpose and usage. Add a docstring that describes all parameters including the newly added daemon_cls.

Suggested change

-    ):
+    ):
+        """
+        Initialize the AgentLightningTrainer.
+        Args:
+            store (LightningStore | None): The storage backend for logging and data persistence.
+            llm_proxy (LLMProxy | None): Proxy for interacting with the language model.
+            adapter (TraceAdapter | None): Adapter for converting traces to the required format.
+            daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon responsible for server communication and agent orchestration.
+            **kwargs: Additional keyword arguments passed to the base RayPPOTrainer.
+        """

Copilot · 2025-12-12T02:00:32Z

agentlightning/verl/entrypoint.py

    adapter: TraceAdapter[Any] | None,
+    trainer_cls: Type[AgentLightningTrainer],
+    daemon_cls: Type[AgentModeDaemon],
 ) -> None:


The run_ppo function has new parameters trainer_cls and daemon_cls but lacks a docstring to document what these parameters are for. Add a docstring that describes all parameters, especially the new ones, to help users understand their purpose and usage.

Suggested change

-) -> None:
+) -> None:
+    """
+    Run the PPO (Proximal Policy Optimization) training loop using the provided configuration and components.
+    Parameters:
+        config (Any): The configuration object for the training run, typically loaded via Hydra.
+        train_dataset (Dataset[Any] | None): The training dataset to use, or None if not provided.
+        val_dataset (Dataset[Any] | None): The validation dataset to use, or None if not provided.
+        store (LightningStore | None): The LightningStore instance for storing and retrieving data, or None.
+        llm_proxy (LLMProxy | None): The LLMProxy instance for model inference, or None.
+        adapter (TraceAdapter[Any] | None): The TraceAdapter for logging or tracing, or None.
+        trainer_cls (Type[AgentLightningTrainer]): The class to use for creating the PPO trainer. This allows customization of the training logic by providing a different trainer class.
+        daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon. This allows customization of the agent's runtime behavior by providing a different daemon class.
+    Returns:
+        None
+    This function initializes Ray if necessary, then launches the PPO training process using the provided datasets,
+    store, LLM proxy, adapter, and customizable trainer and daemon classes.
+    """

Copilot · 2025-12-12T02:00:32Z

agentlightning/verl/entrypoint.py

+        adapter: TraceAdapter[Any] | None,
+        daemon_cls: Type[AgentModeDaemon],
+        trainer_cls: Type[AgentLightningTrainer],
    ):


The TaskRunner.run method has new parameters daemon_cls and trainer_cls but lacks a docstring to document what these parameters are for. Add a docstring that describes all parameters, especially the new ones.

Suggested change

-    ):
+    ):
+        """
+        Run the main training or evaluation task using the provided configuration and components.
+        Args:
+            config (Any): The configuration object for the experiment, typically an OmegaConf config.
+            train_dataset (Dataset[Any] | None): The training dataset to use, or None if not provided.
+            val_dataset (Dataset[Any] | None): The validation dataset to use, or None if not provided.
+            store (LightningStore | None): The LightningStore instance for storing experiment data, or None.
+            llm_proxy (LLMProxy | None): The LLMProxy instance for model inference, or None.
+            adapter (TraceAdapter[Any] | None): The TraceAdapter for logging or tracing, or None.
+            daemon_cls (Type[AgentModeDaemon]): The class to use for creating the agent mode daemon. This should be a subclass of AgentModeDaemon and is responsible for managing agent modes during training or evaluation.
+            trainer_cls (Type[AgentLightningTrainer]): The class to use for creating the trainer. This should be a subclass of AgentLightningTrainer and is responsible for orchestrating the training or evaluation process.
+        Returns:
+            None
+        """

Copilot · 2025-12-12T02:00:32Z

agentlightning/verl/entrypoint.py

+        daemon_cls: Type[AgentModeDaemon],
+        trainer_cls: Type[AgentLightningTrainer],


The parameter order in TaskRunner.run is inconsistent with the parameter order in run_ppo. In run_ppo, the order is trainer_cls then daemon_cls (lines 56-57), but in TaskRunner.run, the order is reversed: daemon_cls then trainer_cls (lines 99-100). This inconsistency can lead to confusion and errors. Make the parameter order consistent across both functions.

Suggested change

-        daemon_cls: Type[AgentModeDaemon],
-        trainer_cls: Type[AgentLightningTrainer],
+        trainer_cls: Type[AgentLightningTrainer],
+        daemon_cls: Type[AgentModeDaemon],
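
Why the ordering matters can be shown with a small sketch. The two functions below are hypothetical stand-ins that only echo their arguments; they are not the real `run_ppo` or `TaskRunner.run`, and `Trainer` and `Daemon` are placeholder classes.

```python
from typing import Tuple


def run_ppo(trainer_cls: type, daemon_cls: type) -> Tuple[str, str]:
    # Reports which class ended up in which role.
    return (trainer_cls.__name__, daemon_cls.__name__)


def task_runner_run(daemon_cls: type, trainer_cls: type) -> Tuple[str, str]:
    # Same two parameters, declared in the opposite order.
    return (trainer_cls.__name__, daemon_cls.__name__)


class Trainer:
    """Placeholder trainer class."""


class Daemon:
    """Placeholder daemon class."""


# Positional calls look identical but bind differently in each function.
positional_a = run_ppo(Trainer, Daemon)
positional_b = task_runner_run(Trainer, Daemon)  # roles silently swapped

# Keyword arguments bind by name, so declaration order no longer matters.
keyword_b = task_runner_run(trainer_cls=Trainer, daemon_cls=Daemon)
```

Because both parameters are annotated as classes, a swapped positional call would not fail at the call site; passing them by keyword avoids the problem regardless of declaration order.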

ultmaster · 2025-12-12T04:20:49Z

/ci

github-actions · 2025-12-12T04:21:01Z

🚀 CI Watcher for correlation id-3644845003-mj2d0efc triggered by comment 3644845003
🏃‍♀️ Tracking 3 workflow run(s):

🟢 Spider - PR #407 - ci-spider - id-3644845003-mj2d0efc — completed/success
🟢 Calc-X - PR #407 - ci-calc-x - id-3644845003-mj2d0efc — completed/success
🟢 GPU Test - PR #407 - ci-gpu - id-3644845003-mj2d0efc — completed/success

✅ All runs completed.

ultmaster added 2 commits December 12, 2025 09:53

support

4760e78

missing annotation flag

8aedda6

Copilot AI review requested due to automatic review settings December 12, 2025 01:56

Copilot started reviewing on behalf of ultmaster December 12, 2025 01:57

ultmaster added ci-spider ci-calc-x labels Dec 12, 2025

Copilot AI reviewed Dec 12, 2025

ultmaster added 2 commits December 12, 2025 10:22

resolve comments

9c98c58

update imports

17abbfc

ultmaster added the ci-gpu label Dec 12, 2025

ultmaster merged commit 1d199b2 into main Dec 12, 2025
30 checks passed

beanie00 added a commit to beanie00/agent-lightning that referenced this pull request Dec 12, 2025

link with PR microsoft#407

e119c68

ultmaster merged 4 commits into main from feature/verl-customize-trainer

2 participants

