Increase modularization of training wrappers #422
Comments
Haven't tested this, but I'm thinking something like this:
This lets us pass information from the wrapped environments up to the clients that expect it (the episode wrapper's truncation, for instance), while also caching it so that we can reuse it during the reset.
Tested the above (not very rigorously); it seems to work.
Hi @vyeevani, thanks for the proposal. Indeed, there is logic leaking into reset from the episode wrapper. If you changed [...] My only concern beyond cleaner semantics is how this affects performance.
The auto reset and episode training wrappers leak information to each other.
This is a bigger problem if people want to stack their own wrappers. For example, I'd like to write a meta-episode wrapper that takes multiple episodes and aggregates them into a single meta episode for use in a meta-RL setting. The wrapper that I'm writing needs to track the number of episodes so that it can do a meta episode reset when it reaches some watermark. However, the auto reset would break this since it wouldn't reset the meta wrapper under it.
At a high level, I'd like to separate the state of the auto reset wrapper from the environments that it's wrapping. I propose to do this by caching the initial state of the environment and the current state of the environment in the info, and only working on that.
Note that with this approach, the pipeline state never needs to be returned through the state itself.
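To make the proposal concrete, here is a minimal framework-free sketch in plain Python. It is not the library's wrapper code: `State`, `CountingEnv`, `AutoReset`, and the info keys `initial_state` and `current_state` are all hypothetical names chosen for illustration. The auto reset wrapper keeps the wrapped environment's states in info and only ever operates on those, so anything stacked underneath it is restored along with the environment.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class State:
    """Toy stand-in for an environment state (not the library's State class)."""
    obs: int
    reward: float = 0.0
    done: bool = False
    info: Dict[str, Any] = field(default_factory=dict)


class CountingEnv:
    """Hypothetical environment: obs counts up, the episode ends at `limit`."""

    def __init__(self, limit: int = 3):
        self.limit = limit

    def reset(self) -> State:
        return State(obs=0)

    def step(self, state: State, action: int) -> State:
        obs = state.obs + action
        return replace(state, obs=obs, reward=1.0, done=obs >= self.limit)


class AutoReset:
    """Caches the wrapped env's initial and current state in info."""

    def __init__(self, env):
        self.env = env

    def reset(self) -> State:
        inner = self.env.reset()
        info = {"initial_state": inner, "current_state": inner}
        return State(obs=inner.obs, info=info)

    def step(self, state: State, action: int) -> State:
        # Step the cached inner state, never the outer state itself.
        stepped = self.env.step(state.info["current_state"], action)
        # On done, continue from the cached initial state.
        nxt = state.info["initial_state"] if stepped.done else stepped
        info = {**state.info, "current_state": nxt}
        # reward and done report the step that just happened;
        # obs is the observation the next step will start from.
        return State(obs=nxt.obs, reward=stepped.reward,
                     done=stepped.done, info=info)
```

Usage would look like `env = AutoReset(CountingEnv(limit=2))`, then `s = env.reset()` followed by repeated `s = env.step(s, 1)`. Because the whole inner state is swapped on done, a wrapper sitting between `AutoReset` and the environment (such as a meta-episode counter) would be reset together with it. The performance impact of carrying two extra states in info is not measured here.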