some detail regarding dataloader #2
Hi Kevin,
This comes down to the convention used by rlpyt: actions[t] refers to the action taken to arrive at states[t]. We modified this dataloader to follow that convention as well, by offsetting actions by one in the collate function (we start our slice there at frames-1, meaning that there's one extra action at the start of the actions tensor as padding).
Best,
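A toy sketch of the convention being described, with purely illustrative variable names (not the repo's actual code): shifting the actions by one so that entry t holds the action that produced state t, with a padding entry at the front.

```python
# Hypothetical illustration of the rlpyt convention described above:
# actions[t] is the action taken to ARRIVE at states[t], so the action
# sequence is shifted one step relative to the raw "action chosen at
# states[t]" layout, with padding at the front.
raw_states = [f"s{i}" for i in range(5)]
raw_actions = [f"a{i}" for i in range(5)]  # a_i is chosen AT s_i

# Shift by one: shifted_actions[t] is the action that produced raw_states[t]
shifted_actions = ["<pad>"] + raw_actions[:-1]

print(list(zip(raw_states, shifted_actions)))
# e.g. s1 pairs with a0, the action chosen at s0 that led to s1
```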
Thanks for the explanation, Max. However, I am not sure I can make sense of your idea from the code in the collate function:
Thus the first frame stack is [frame0, frame1, frame2, frame3], and the first action is a3. Is my logic above correct? If so, shouldn't a3 be the action taken at the stack [frame0, frame1, frame2, frame3], rather than the action taken to arrive at that stack (which should be a2)?
Hi Kevin,
I talked with the teammates who worked on this section of the code, and I think you're right -- as written, it is off by one. I'll look into how/whether this affected the results in the original code we used for our experiments. In the meantime, I'll switch the release branch to using [frames-2:-2] there. Thanks for looking into this so carefully!
Best,
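A minimal sketch of the two slices being compared, with hypothetical names and frames = 4. Assuming acts[i] is the action taken at frame f_i, the first stack [f0, f1, f2, f3] should pair with a2 (the action taken at f2 to arrive at f3), which is what the corrected slice yields:

```python
# Toy comparison of the original and corrected action slices
# (illustrative names only, not the repo's actual code).
frames = 4
acts = [f"a{i}" for i in range(8)]  # a_i is the action taken at frame f_i

old_slice = acts[frames - 1:]    # first element 'a3' -- off by one
new_slice = acts[frames - 2:-2]  # first element 'a2' -- action arriving at f3

print(old_slice[0], new_slice[0])
```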
Hi,
Thanks for sharing the code. I have a question about the way you use frames and actions to make predictions.
The input to the collate function has the following dimensions
observation.shape = [256, 20, 84, 84]
action.shape = [256, 20]
I assume that for a given index, the observation corresponds to the current observation, and the action corresponds to the current action.
For example, observation[0][0] and action[0][0] correspond to the (observation, action) for batch 0, index 0
If I understand correctly, the collate function stacks the frames and returns the following (simplified, batch dimension ignored) format for the indices:
observation: [[frame1, frame2, frame3, frame4], [frame2, frame3, frame4, frame5], ....]
action: [action for frame4, action for frame5 ....]
reward: [reward for frame4, reward for frame5 ....]
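The layout described above can be sketched as follows. This is a simplified, hypothetical version of the stacking (illustrative names, not the repo's actual collate code): for one trajectory, each frame stack is paired with the action and reward belonging to the stack's last frame.

```python
# Hypothetical sketch of the stacking layout described above.
def collate_stacks(obs, acts, rews, frames=4):
    # obs/acts/rews: per-timestep lists for a single trajectory
    T = len(obs)
    stacks = [obs[k:k + frames] for k in range(T - frames + 1)]
    # keep the action/reward aligned with the last frame of each stack
    return stacks, acts[frames - 1:], rews[frames - 1:]

obs = [f"frame{i + 1}" for i in range(6)]
acts = [f"action{i + 1}" for i in range(6)]
rews = [f"reward{i + 1}" for i in range(6)]

stacks, a, r = collate_stacks(obs, acts, rews)
print(stacks[0], a[0], r[0])
# first stack [frame1..frame4] is paired with action4 / reward4
```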
However, in your code for next latent prediction, you seem to be using the current frame and the next action to predict the next frame.
For example, I think you are using [frame1, frame2, frame3, frame4] and the action for frame5 to compute the latent for [frame2, frame3, frame4, frame5].
Shouldn't you be using the current action as opposed to the next action in conjunction with the current frame? Or did I misunderstand something?
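A toy illustration of the alignment being asked about, with hypothetical names. Under the raw convention where acts[i] is the action taken at frame f_i, predicting the stack ending at f5 from the stack ending at f4 should condition on a4, the action chosen at f4:

```python
# Illustrative indexing only (not the repo's actual code): which action
# transitions the stack ending at f4 to the stack ending at f5?
obs = [f"f{i}" for i in range(8)]
acts = [f"a{i}" for i in range(8)]  # a_i is the action taken at frame f_i
frames = 4

cur_stack = obs[1:1 + frames]   # ['f1', 'f2', 'f3', 'f4']
next_stack = obs[2:2 + frames]  # ['f2', 'f3', 'f4', 'f5']
action_taken = acts[4]          # 'a4', chosen at f4, produces f5

print(cur_stack, action_taken, next_stack)
```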
Thanks for the clarification,
Kevin