
some detail regarding dataloader #2

Closed

kevinghst opened this issue Jul 26, 2021 · 3 comments

@kevinghst
Hi,

Thanks for sharing the code. I have some questions about the way you use frames and actions to make predictions.

The input to the collate function has the following dimensions:

```
observation.shape = [256, 20, 84, 84]
action.shape = [256, 20]
```

I assume that, for a given index, the observation is the current observation and the action is the current action.
For example, observation[0][0] and action[0][0] correspond to the (observation, action) pair for batch 0, time step 0.

If I understand correctly, the collate function stacks the frames and returns the following format for the indices (simplified, with the batch dimension ignored):

observation: [[frame1, frame2, frame3, frame4], [frame2, frame3, frame4, frame5], ...]
action: [action for frame4, action for frame5, ...]
reward: [reward for frame4, reward for frame5, ...]
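
In code, I imagine the stacking works something like this minimal sketch (illustrative only, ignoring the batch dimension; not your actual collate implementation):

```python
import torch

def stack_frames(observation, action, reward, frames=4):
    # Illustrative: output index t holds [obs[t], ..., obs[t+frames-1]],
    # paired with the action and reward at the stack's final frame.
    T = observation.shape[0]                  # time-major: [T, 84, 84]
    stacks = torch.stack([observation[t:t + frames]
                          for t in range(T - frames + 1)])
    return stacks, action[frames - 1:], reward[frames - 1:]

obs, act, rew = torch.zeros(20, 84, 84), torch.arange(20), torch.zeros(20)
stacks, act, rew = stack_frames(obs, act, rew)
print(stacks.shape, act[0].item())  # torch.Size([17, 4, 84, 84]) 3
```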

However, in your code for next-latent prediction, you seem to be using the current frame stack and the next action to predict the next frame stack.

For example, I think you are using [frame1, frame2, frame3, frame4] and the action for frame5 to compute the latent for [frame2, frame3, frame4, frame5].

Shouldn't you be using the current action, rather than the next action, in conjunction with the current frame stack? Or did I misunderstand something?
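
To make the question concrete (illustrative names only, not your actual modules):

```python
# What the code appears to do:
#   z_t    = encode([frame1, frame2, frame3, frame4])
#   z_next = transition(z_t, action_for_frame5)   # next action
#
# What I expected:
#   z_next = transition(z_t, action_for_frame4)   # current action
```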

Thanks for the clarification,
Kevin

@MaxASchwarzer (Collaborator)

Hi Kevin,

This comes down to the convention used by rlpyt: actions[t] refers to the action taken to arrive at states[t]. We modified this dataloader to follow that convention as well, by offsetting actions by one in the collate function (we start our slice there at frames-1, meaning there is one extra action at the start of the actions tensor as padding).
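
Concretely, here is a toy illustration of that convention (not rlpyt or SPR code):

```python
# Under the rlpyt convention, states[t] results from taking actions[t]
# at states[t-1], i.e. actions[t] "arrives at" states[t]:
states  = ["s0", "s1", "s2", "s3"]
actions = ["<pad>", "a0", "a1", "a2"]   # actions[t] arrives at states[t]

# Predicting states[t+1] from states[t] therefore conditions on
# actions[t+1], which is exactly the action taken at states[t]:
t = 1
print(f"predict {states[t+1]} from {states[t]} using {actions[t+1]}")
```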

Best,
Max

@kevinghst (Author) commented Jul 26, 2021

Thanks for the explanation Max.

However, I am not sure I can reconcile that with the code in the collate function:

```python
observation = observation[:-frames].unsqueeze(3)
```

This gets rid of the last 4 frames from the observation tensor, so observations should now include [[frame0, frame1, frame2, frame3], [frame1, frame2, frame3, frame4], ...].

```python
action = torch.einsum('bt->tb', action)[frames-1:-1].long()
```

If action was previously [a0, a1, a2, a3, ...], this line gets rid of the first frames-1 (3) actions and the last action, so action becomes [a3, a4, a5, ...].
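
A quick standalone check of that slice with toy values (frames=4, T=20):

```python
import torch

frames, T = 4, 20
action = torch.arange(T, dtype=torch.float32).unsqueeze(0)  # [1, 20]: a0..a19
action = torch.einsum('bt->tb', action)[frames - 1:-1].long()
print(action[:3].flatten().tolist())  # [3, 4, 5] -> the first action is a3
```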

Thus the first frame stack is [frame0, frame1, frame2, frame3], and the first action is a3.

Is my logic above correct? If so, shouldn't a3 be the action taken at the stack [frame0, frame1, frame2, frame3], rather than the action taken to arrive at that stack (which should be a2)?

@MaxASchwarzer (Collaborator)

Hi Kevin,

I talked with the teammates who worked on this section of the code, and I think you're right -- as written, it is off by one. I'll look into how and whether this affected results in the original code we used for our experiments. In the meantime, I'll switch the release branch to using [frames-2:-2] there.
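
For reference, the same toy check with the corrected slice (frames=4, T=20):

```python
import torch

frames, T = 4, 20
action = torch.arange(T, dtype=torch.float32).unsqueeze(0)  # [1, 20]: a0..a19
action = torch.einsum('bt->tb', action)[frames - 2:-2].long()
print(action[:3].flatten().tolist())  # [2, 3, 4] -> the first action is now a2
```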

Thanks for looking into this so carefully!

Best,
Max
