Thank you for your great work. It is really cool to open source such an amazing code base!
TL;DR
@yogesh1q2w and I noticed that the last transitions of a trajectory are not properly handled. Indeed, multiple ReplayElements with a terminal flag are stored when only one is given to the accumulator.
This is problematic because the additional terminal states do not correspond to states that can be observed from the environment. Since we use function approximation, updates on these spurious states can also change the predictions for genuine states.
Replacing the following lines
# Check if we have a valid transition, i.e. we either
# 1) have accumulated more transitions than the update horizon
# 2) have a trajectory shorter than the update horizon, but the
#    last element is terminal
if not (
    trajectory_len > self._update_horizon
    or (trajectory_len > 1 and last_transition.is_terminal)
):
    return None
by
# Check if we have a valid transition, i.e. we either
# 1) have accumulated more transitions than the update horizon and the
#    last element is not terminal
# 2) have a trajectory shorter than the update horizon, but the
#    last element is terminal and we have enough frames to stack
if not (
    (trajectory_len > self._update_horizon and not last_transition.is_terminal)
    or (trajectory_len > self._stack_size and last_transition.is_terminal)
):
    return None
solves the issue. Indeed, by running the same code again, we obtain:
theovincent changed the title from "Bug in the Reply Buffer: end of episodes are not correctly handled" to "Bug in the Reply Buffer: end of episodes is not correctly handled" on Feb 18, 2025
Hi,
How to reproduce?
After forking the repo and running
I ran
The last 3 ReplayElements are incorrect. They should not have been added.
How to fix the bug?
Replacing lines 74 to 82 of dopamine/dopamine/jax/replay_memory/accumulator.py (at commit bec5f4e) with the modified validity check quoted earlier in this issue solves the issue. Indeed, by running the same code again, we obtain:
The last ReplayElements have been filtered 🎉
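To make the difference between the two checks concrete, here is a minimal standalone sketch. It is not Dopamine code: the function names `is_valid_old` and `is_valid_new` are hypothetical, the values `UPDATE_HORIZON = 1` and `STACK_SIZE = 4` are assumed for illustration, and the list of shrinking trajectory lengths assumes that the oldest transition is dropped one at a time once a terminal transition has arrived.

```python
def is_valid_old(trajectory_len: int, last_is_terminal: bool,
                 update_horizon: int) -> bool:
    """Original condition from the issue, rewritten as a free function."""
    return trajectory_len > update_horizon or (
        trajectory_len > 1 and last_is_terminal
    )


def is_valid_new(trajectory_len: int, last_is_terminal: bool,
                 update_horizon: int, stack_size: int) -> bool:
    """Proposed condition from the issue, rewritten as a free function."""
    return (trajectory_len > update_horizon and not last_is_terminal) or (
        trajectory_len > stack_size and last_is_terminal
    )


# Assumed values, chosen only for illustration.
UPDATE_HORIZON = 1
STACK_SIZE = 4

# Assumption: after a terminal transition arrives, the trajectory is
# re-checked while its oldest transition is dropped one at a time.
lengths = [5, 4, 3, 2, 1]

old_accepted = [n for n in lengths if is_valid_old(n, True, UPDATE_HORIZON)]
new_accepted = [
    n for n in lengths if is_valid_new(n, True, UPDATE_HORIZON, STACK_SIZE)
]

print("old check accepts lengths:", old_accepted)
print("new check accepts lengths:", new_accepted)
```

Under these assumptions, the old check accepts four terminal trajectories of decreasing length, while the new check accepts only the one that still has enough frames to stack. That would be consistent with the three extra ReplayElements reported above. For non-terminal trajectories the two checks agree.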