[Bug] Fixes bug in GAE advantage estimation when `gammalmbda` is a Tensor
#2773
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2773
Note: Links to docs will display an error until the docs builds have been completed.
❌ 13 New Failures, 2 Unrelated Failures
As of commit 5411dd9 with merge base 75f113f:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
LGTM thanks
@@ -332,6 +332,8 @@ def vec_generalized_advantage_estimate(
lmbda=lmbda,
)

not_terminated = (~terminated).to(dtype)
I can't comment outside of the diff but ~15 lines below `not_terminated` is redefined. Maybe worth removing it as part of this PR.
Good catch, it's actually the reason why I re-arranged some stuff (and forgot to remove the second occurrence).
Used the occasion for some typos, sorry for the noise.
if isinstance(value, torch.Tensor) and value.numel() > 1:
    # create tensor while ensuring that gradients are passed
    not_done = (~done).to(dtype)
    gammalmbdas = not_done * value
The exponential averaging weights were computed with `done` which I believe is a mistake -- we want to use `terminated` here.
We have a few errors in test_cost.py :)
Looking more closely at the broken test, I now understand how the previous approach was sound. Classic Chesterton's fence I guess 🙃
Gotcha, though it saddens me that people lose time making sense of things like that... That's bad UX/devex. Anything we can do to avoid the confusion? E.g., more comments in the code etc.?
I guess here a comment would have gone a long way -- basically simply reminding that a single sequence can hold distinct trajectories, hence we need to make sure to cut the "eligibility traces" not only on
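For readers following the thread, here is a minimal loop-based sketch of the point discussed above. It is not the library's vectorized implementation: `gae_reference` and its argument names are hypothetical, and it assumes the usual convention that `terminated` marks a true episode end while `done` also covers truncation. Under that assumption, bootstrapping from the next value is masked by `terminated`, whereas the lambda-weighted trace is masked by `done`, so that trajectories packed into one sequence do not leak into each other.

```python
def gae_reference(rewards, values, next_values, done, terminated,
                  gamma=0.99, lmbda=0.95):
    """Loop-based GAE over one time-ordered sequence (hypothetical helper)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrapping from the next value stops only on true termination.
        not_terminated = 0.0 if terminated[t] else 1.0
        # The trace stops whenever a trajectory ends (termination or
        # truncation), so packed trajectories stay independent.
        not_done = 0.0 if done[t] else 1.0
        delta = rewards[t] + gamma * next_values[t] * not_terminated - values[t]
        running = delta + gamma * lmbda * not_done * running
        advantages[t] = running
    return advantages


# Two trajectories packed in one sequence: the first is truncated at t=1,
# the second terminates at t=3.
rewards = [1.0, 1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.5]
next_values = [0.5, 0.5, 0.5, 0.5]
done = [False, True, False, True]
terminated = [False, False, False, True]

packed = gae_reference(rewards, values, next_values, done, terminated)
# Same first trajectory, computed on its own.
alone = gae_reference(rewards[:2], values[:2], next_values[:2],
                      done[:2], terminated[:2])
# Variant that cuts the trace on `terminated` only.
leaky = gae_reference(rewards, values, next_values, terminated, terminated)
```

In this toy setup, `packed[:2]` matches `alone`, while `leaky[:2]` does not, because the second trajectory's advantages flow back across the truncation boundary.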
Description
Describe your changes in detail.
Motivation and Context
The vectorial implementation of GAE was using `done` to compute the exponential averaging weights instead of `terminated`.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!