some detail regarding dataloader #2
Hi Kevin,
This comes down to the convention used by rlpyt: actions[t] refers to the action taken to arrive at states[t]. We modified this dataloader to follow that convention as well, by offsetting actions by one in the collate function (we start our slice there at frames-1, meaning that there's one extra action at the start of the actions tensor as padding).
Best,
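A toy sketch of the convention being described, with purely illustrative variable names (not the repo's actual code): shifting the actions by one so that entry t holds the action that produced state t, with a padding entry at the front.

```python
# Hypothetical illustration of the rlpyt convention described above:
# actions[t] is the action taken to ARRIVE at states[t], so the action
# sequence is shifted one step relative to the raw "action chosen at
# states[t]" layout, with padding at the front.
raw_states = [f"s{i}" for i in range(5)]
raw_actions = [f"a{i}" for i in range(5)]  # a_i is chosen AT s_i

# Shift by one: shifted_actions[t] is the action that produced raw_states[t]
shifted_actions = ["<pad>"] + raw_actions[:-1]

print(list(zip(raw_states, shifted_actions)))
# e.g. s1 pairs with a0, the action chosen at s0 that led to s1
```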
Thanks for the explanation, Max. However, I am not sure I can make sense of your idea from the code in the collate function:
Thus the first frame stack is [frame0, frame1, frame2, frame3], and the first action is a3. Is my logic above correct? If so, shouldn't a3 be the action taken at the stack [frame0, frame1, frame2, frame3], rather than the action taken to arrive at that stack (which should be a2)?
Hi Kevin,
I talked with the teammates who worked on this section of the code, and I think you're right -- as written, it is off by one. I'll look into how/whether this affected the results in the original code we used for our experiments. In the meantime, I'll switch the release branch to using [frames-2:-2] there. Thanks for looking into this so carefully!
Best,
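A minimal sketch of the two slices being compared, with hypothetical names and frames = 4. Assuming acts[i] is the action taken at frame f_i, the first stack [f0, f1, f2, f3] should pair with a2 (the action taken at f2 to arrive at f3), which is what the corrected slice yields:

```python
# Toy comparison of the original and corrected action slices
# (illustrative names only, not the repo's actual code).
frames = 4
acts = [f"a{i}" for i in range(8)]  # a_i is the action taken at frame f_i

old_slice = acts[frames - 1:]    # first element 'a3' -- off by one
new_slice = acts[frames - 2:-2]  # first element 'a2' -- action arriving at f3

print(old_slice[0], new_slice[0])
```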
Hi,
Thanks for sharing the code. I have a question about the way you use frames and actions to make predictions.
The input to the collate function has the following dimensions
observation.shape = [256, 20, 84, 84]
action.shape = [256, 20]
I assume that for a given index, the observation corresponds to the current observation, and the action corresponds to the current action.
For example, observation[0][0] and action[0][0] correspond to the (observation, action) for batch 0, index 0
If I understand correctly, the collate function stacks the frames and returns the following (simplified, batch dimension ignored) format for the indices:
observation: [[frame1, frame2, frame3, frame4], [frame2, frame3, frame4, frame5], ....]
action: [action for frame4, action for frame5 ....]
reward: [reward for frame4, reward for frame5 ....]
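The layout described above can be sketched as follows. This is a simplified, hypothetical version of the stacking (illustrative names, not the repo's actual collate code): for one trajectory, each frame stack is paired with the action and reward belonging to the stack's last frame.

```python
# Hypothetical sketch of the stacking layout described above.
def collate_stacks(obs, acts, rews, frames=4):
    # obs/acts/rews: per-timestep lists for a single trajectory
    T = len(obs)
    stacks = [obs[k:k + frames] for k in range(T - frames + 1)]
    # keep the action/reward aligned with the last frame of each stack
    return stacks, acts[frames - 1:], rews[frames - 1:]

obs = [f"frame{i + 1}" for i in range(6)]
acts = [f"action{i + 1}" for i in range(6)]
rews = [f"reward{i + 1}" for i in range(6)]

stacks, a, r = collate_stacks(obs, acts, rews)
print(stacks[0], a[0], r[0])
# first stack [frame1..frame4] is paired with action4 / reward4
```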
However, in your code for next latent prediction, you seem to be using the current frame and the next action to predict the next frame.
For example, I think you are using [frame1, frame2, frame3, frame4] and the action for frame5 to compute the latent for [frame2, frame3, frame4, frame5].
Shouldn't you be using the current action as opposed to the next action in conjunction with the current frame? Or did I misunderstand something?
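A toy illustration of the alignment being asked about, with hypothetical names. Under the raw convention where acts[i] is the action taken at frame f_i, predicting the stack ending at f5 from the stack ending at f4 should condition on a4, the action chosen at f4:

```python
# Illustrative indexing only (not the repo's actual code): which action
# transitions the stack ending at f4 to the stack ending at f5?
obs = [f"f{i}" for i in range(8)]
acts = [f"a{i}" for i in range(8)]  # a_i is the action taken at frame f_i
frames = 4

cur_stack = obs[1:1 + frames]   # ['f1', 'f2', 'f3', 'f4']
next_stack = obs[2:2 + frames]  # ['f2', 'f3', 'f4', 'f5']
action_taken = acts[4]          # 'a4', chosen at f4, produces f5

print(cur_stack, action_taken, next_stack)
```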
Thanks for the clarification,
Kevin