Issue: Inconsistent Hidden State Handling in act() Method of PPO Implementation
Issue Summary
The act() method in the PPO implementation does not pass hidden states to self.actor_critic.act(obs) and self.actor_critic.evaluate(critic_obs), so the action and value estimates can differ between rollout (inference) and training. This is especially problematic for recurrent policies (e.g., LSTM/GRU), where past information should influence both action selection and value estimation.
Suggested Fix
Modify act() to pass the hidden states explicitly when running the forward pass of the actor and the critic.
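The original snippet for this fix is not present here, so the following is only a minimal, dependency-free sketch of the idea. `ToyRecurrentActorCritic`, `ToyPPO`, the `hidden_states` keyword argument, and the scalar tanh "recurrent cell" are all hypothetical stand-ins chosen for illustration; they are not the actual classes or signatures of the PPO implementation. The point it tries to show is that act() snapshots the hidden states, passes them into both forward passes, and stores them with the transition so training can replay the same computation.

```python
import math


class ToyRecurrentActorCritic:
    """Hypothetical stand-in for a recurrent actor-critic (not the real class)."""

    def __init__(self):
        self.actor_hidden = 0.0
        self.critic_hidden = 0.0

    def get_hidden_states(self):
        return self.actor_hidden, self.critic_hidden

    @staticmethod
    def _step(obs, hidden):
        # Toy scalar recurrent cell: the output doubles as the next hidden state.
        new_hidden = math.tanh(0.5 * hidden + obs)
        return new_hidden, new_hidden

    def act(self, obs, hidden_states=None):
        # Assumption: an explicit hidden state overrides the internal one.
        h = self.actor_hidden if hidden_states is None else hidden_states
        action, self.actor_hidden = self._step(obs, h)
        return action

    def evaluate(self, critic_obs, hidden_states=None):
        h = self.critic_hidden if hidden_states is None else hidden_states
        value, self.critic_hidden = self._step(critic_obs, h)
        return value


class ToyPPO:
    """Hypothetical PPO wrapper showing only the act() bookkeeping."""

    def __init__(self, actor_critic):
        self.actor_critic = actor_critic
        self.transitions = []

    def act(self, obs, critic_obs):
        # Snapshot the hidden states *before* the forward passes.
        actor_h, critic_h = self.actor_critic.get_hidden_states()
        # Pass them explicitly so rollout and training see the same inputs.
        action = self.actor_critic.act(obs, hidden_states=actor_h)
        value = self.actor_critic.evaluate(critic_obs, hidden_states=critic_h)
        self.transitions.append({
            "obs": obs,
            "critic_obs": critic_obs,
            "hidden_states": (actor_h, critic_h),
            "action": action,
            "value": value,
        })
        return action
```

With this bookkeeping, replaying a stored transition with its stored hidden states reproduces the rollout's action and value, which is the consistency the issue asks for. In a real PyTorch implementation the hidden states would be tensors (and a tuple of tensors for an LSTM), and the outputs would typically be detached before being stored.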