Fix advantage computation + add advantage to metric logs #99

sdan · 2025-11-16T09:29:11Z

When a trajectory group has size 1, centering the advantage within that group always gives reward - reward = 0, so the trajectory contributes no learning signal.

The fix is to use the batch mean as the baseline instead whenever the group size is 1.
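For readers who want to see the idea concretely, here is a minimal sketch of the baseline fallback. It is not the code from this PR: the function name `compute_advantages` and the list-of-lists input are assumptions made for illustration.

```python
from statistics import fmean


def compute_advantages(reward_groups):
    """Center rewards per group, using the batch mean for singleton groups.

    `reward_groups` is a hypothetical input shape: one list of scalar
    rewards per trajectory group.
    """
    all_rewards = [r for group in reward_groups for r in group]
    if not all_rewards:
        return [[] for _ in reward_groups]

    # Baseline shared by every group that has only one trajectory.
    batch_mean = fmean(all_rewards)

    advantages = []
    for group in reward_groups:
        # A group of size 1 centered on itself would always yield 0,
        # so fall back to the batch mean in that case.
        baseline = fmean(group) if len(group) > 1 else batch_mean
        advantages.append([r - baseline for r in group])
    return advantages


groups = [[1.0], [0.0], [0.5, 1.5]]
print(compute_advantages(groups))  # [[0.25], [-0.75], [-0.5, 0.5]]
```

In this example the batch mean is 0.75, so the two singleton groups get nonzero advantages, while the group of size 2 is still centered on its own mean.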

Also added advantage stats logging (mean/std/min/max)
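A rough sketch of what such logging could compute is shown here. The helper name `advantage_metrics` and the `advantage/...` metric keys are hypothetical, not the names used in this PR, and the population standard deviation is an assumed choice.

```python
from statistics import fmean, pstdev


def advantage_metrics(advantages, prefix="advantage"):
    """Summarize per-trajectory advantages as a flat metrics dict.

    `advantages` is assumed to be a list of per-group lists of floats.
    """
    flat = [a for group in advantages for a in group]
    if not flat:
        # Nothing to summarize for an empty batch.
        return {}
    return {
        f"{prefix}/mean": fmean(flat),
        f"{prefix}/std": pstdev(flat),  # population std over the batch
        f"{prefix}/min": min(flat),
        f"{prefix}/max": max(flat),
    }


print(advantage_metrics([[0.25], [-0.75], [-0.5, 0.5]]))
```

A std that stays at 0 across steps is a quick signal that every advantage is being zeroed out, which is the failure mode described above.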

Tiiiger · 2025-11-18T18:26:34Z

hi @sdan, thank you for your contribution! But I don't think this is a common/standard enough technique to be merged into the recipe. Please fork if needed!

add adv

eeaf453

sdan force-pushed the add-advantage branch from a36fc96 to eeaf453 Compare November 16, 2025 09:31

Tiiiger closed this Nov 18, 2025
