Fix PPORecurrent training issue: tuned learning rate and added missing max_grad_norm #4

Draft

emiliof114 wants to merge 1 commit into fei-yang-wu:main from emiliof114:fix/lr-tuning

Conversation

@emiliof114

This PR fixes the failing PPORecurrent training test by stabilizing optimization.

Changes:

  • Added max_grad_norm parameter to optimizer config.

  • Tuned the learning rate from 3e-4 to 1e-4 for better convergence.

  • Verified that all tests pass locally (6/6).

Result:
PPORecurrent training now improves average return as expected.
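For context, `max_grad_norm` clips the global L2 norm of the gradients before each optimizer step, which is what prevents the occasional spiky PPO update from destabilizing training. Below is a minimal pure-Python sketch of that clipping mechanism; the function name, the flat gradient list, and the `1e-6` epsilon are illustrative assumptions, not this repository's actual code.

```python
import math

def clip_grad_norm(grads, max_grad_norm):
    """Scale gradients so their global L2 norm does not exceed max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_grad_norm:
        # Rescale all gradients by the same factor; the small epsilon
        # guards against division issues when the norm is tiny.
        scale = max_grad_norm / (total_norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads

# Example: a gradient with global norm 5.0 is scaled down to norm ~0.5
grads = [3.0, 4.0]
clipped = clip_grad_norm(grads, max_grad_norm=0.5)
```

In a real trainer this would be applied to the policy network's parameter gradients between the backward pass and the optimizer step; lowering the learning rate works in the same direction by shrinking every step rather than only the outliers.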

@fei-yang-wu fei-yang-wu marked this pull request as draft March 19, 2026 22:13