Conversation

@xiaoyi1734
Contributor

Description

  • Add KL-divergence and its variants to the standard PPO training process.
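As a rough illustration of what such a change might involve, the sketch below adds a KL penalty to a clipped PPO surrogate loss in plain Python. It is not the code merged in this PR. The function names (`kl_estimates`, `ppo_loss_with_kl`), the `kl_coef` and `clip_eps` parameters, and the choice of the three sample-based estimators (often called k1, k2, k3) as the "variants" are all assumptions made for illustration.

```python
import math
from typing import List


def kl_estimates(logp_new: List[float], logp_old: List[float], kind: str = "k3") -> List[float]:
    """Per-sample estimates of KL(old || new) for actions sampled from the old policy.

    Hypothetical helper: the estimator names k1/k2/k3 are an assumption,
    not something taken from the PR.
    """
    estimates = []
    for ln, lo in zip(logp_new, logp_old):
        log_ratio = ln - lo  # log(pi_new(a|s) / pi_old(a|s))
        if kind == "k1":
            estimates.append(-log_ratio)  # unbiased, can be negative per sample
        elif kind == "k2":
            estimates.append(0.5 * log_ratio ** 2)  # biased, always non-negative
        elif kind == "k3":
            estimates.append(math.exp(log_ratio) - 1.0 - log_ratio)  # unbiased, non-negative
        else:
            raise ValueError(f"unknown KL estimator: {kind}")
    return estimates


def ppo_loss_with_kl(
    logp_new: List[float],
    logp_old: List[float],
    advantages: List[float],
    clip_eps: float = 0.2,
    kl_coef: float = 0.1,
    kl_kind: str = "k3",
) -> float:
    """Clipped PPO policy loss plus a weighted KL penalty (illustrative only)."""
    surrogates = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # PPO takes the pessimistic (smaller) of the two surrogate objectives.
        surrogates.append(min(ratio * adv, clipped * adv))
    policy_loss = -sum(surrogates) / len(surrogates)
    kl = kl_estimates(logp_new, logp_old, kl_kind)
    return policy_loss + kl_coef * sum(kl) / len(kl)
```

When the new and old log-probabilities are identical, every estimator returns zero and the loss reduces to the usual clipped surrogate, so the penalty only takes effect once the policy starts to drift from the one that collected the data.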

Related Issue

TODO

Check List

  • merge the latest version of the source branch/repo and resolve all conflicts
  • pass the style check
  • pass all tests

@PaParaZz1 added the algo label (Add new algorithm or improve old one) on Jul 18, 2025
@PaParaZz1 merged commit 486bb30 into opendilab:main on Jul 29, 2025
8 of 15 checks passed