-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch: don't just set 0 when elements have None entries #1088
Comments
As far as I can tell, we reach this point in two cases:
I think the main question here is if these operations should be allowed in the first place. In the RL context this probably only happens in case the info dict changes its structure between steps (as happens in MoveToRightEnv in the tests). One could argue, that the env is ill-defined in this case and instead of setting arbitrary default values, the env should be fixed? |
Regarding your suggestion to set values to NaN instead, assigning |
@maxhuettenrauch unfortunately, in the RL context this is bound to happen because some things might not be known at |
This issue is strongly related to #1087 |
But when would you add obs directly after reset to the buffer (where a Batch object is created) and only after that retrieve an action, call |
I thought it might happen in the collectors, but maybe I'm mistaken. For sure I've seen this happen with info objects somewhere in collector tests - though there it is a bad implementation of the env. Unfortunately Gymnasium doesn't force any interface on the info dicts which are the main drivers of this problem.We can stop supporting such cases, or ask the use to specify what should happen then explicitly. I am all for restricting the number of supported operations to decrease complexity and probability of errors :) |
Yes, definitely happening in the collector tests, due to said bad env design. I'm gonna check some examples with standard mujoco envs. |
Related: Farama-Foundation/Gymnasium#540 |
This is essentially a bug due to the current implementation of
__setitem__
:Setting

0
when something is off is wrong. This can manifest itself in things like info of reset, where certain values might be unknown (like actions orenv_num
). ThenNone
is erroneously replaced by0
. There are multiple tests covering this erroneous behavior, e.g.,line 112 in
One should instead at least turn this into
NaN
entriesThe text was updated successfully, but these errors were encountered: