Support Dict observation spaces #1065

Open
MischaPanch opened this issue Feb 26, 2024 · 7 comments
Labels
documentation enhancement Feature that is not a new algorithm or an algorithm enhancement good first issue Good for newcomers tentative Up to discussion, may be dismissed

Comments

@MischaPanch
Collaborator

Maybe also action spaces.

I'm not sure what the status of the current support is, and I can't estimate the complexity.

It's probably not a priority, but if an external contributor wants to look into it, we could review it. The solution should come with proper documentation.

Looking at how other projects support complex action/observation spaces might be a good start.

Related issues: #1064

@MischaPanch MischaPanch added enhancement Feature that is not a new algorithm or an algorithm enhancement good first issue Good for newcomers documentation labels Feb 26, 2024
@MovePhilip

I have an environment with a variable-length observation: a text sequence. The batch mechanism doesn't seem to be compatible with observations of varying length. How can I deal with this?

@MischaPanch
Collaborator Author

I suggest either wrapping your environment so that it doesn't have a Dict space as observations (this should always be possible), or working on a PR to solve this issue (might not be easy though).
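As a rough illustration of the wrapping idea, here is a minimal NumPy-only sketch that turns a dict observation into one flat vector. The function name `flatten_dict_obs` and the example keys are hypothetical and not part of tianshou; Gymnasium also ships a `FlattenObservation` wrapper that serves a similar purpose for real environments.

```python
import numpy as np


def flatten_dict_obs(obs: dict) -> np.ndarray:
    """Concatenate all entries of a dict observation into one float32 vector.

    Keys are visited in sorted order so the layout is identical at every step.
    """
    parts = [np.asarray(obs[key], dtype=np.float32).ravel() for key in sorted(obs)]
    return np.concatenate(parts)


# Hypothetical dict observation, as an env with a Dict space might return it.
obs = {"position": np.array([0.5, -1.0]), "image": np.zeros((2, 2))}
flat = flatten_dict_obs(obs)  # "image" entries first, then "position"
```

A wrapper around the environment would apply such a function to every observation returned by `reset` and `step`, and would declare a matching flat observation space.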

Not sure what you mean by variable observation. If it's a text sequence as an array, Batch can handle it, but you will need a custom Agent for processing text.

@MovePhilip

My environment is a web browser. The returned observation is the text sequence shown on the web page, so its length changes at every step. The batch mechanism raises errors when transferring this kind of observation. I don't think padding and truncating should be necessary to post-process the observations.

@MovePhilip

The error is always raised in the __setitem__ function of the Batch class:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (2,124)
[two screenshots]
As the screenshots show, obs_next has a different shape from obs, which triggers the error.
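The same kind of error can be reproduced with plain NumPy. This is only a sketch of one plausible cause, under the assumption that storage was preallocated with a fixed row length of 124 and a later value arrived as a shape (2,) array (for example because two sequences of different lengths could not be stacked). The names `storage` and `value` are illustrative and not taken from tianshou.

```python
import numpy as np

# Assumed preallocated storage: 4 slots, each holding 124 numbers.
storage = np.zeros((4, 124))

# Stand-in for a batch of two observations that ended up with shape (2,)
# instead of the expected (2, 124).
value = np.zeros(2)

error = None
try:
    # Writing two rows requires a value broadcastable to (2, 124).
    storage[[0, 1]] = value
except ValueError as exc:
    error = exc

print(error)
```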

@MischaPanch
Collaborator Author

MischaPanch commented Feb 29, 2024

I see. It's not really related to this issue, which is about Dict interfaces.

Your environment violates the gym/Gymnasium API, where an env is assumed to have a fixed numerical observation space. In any case, for your model training you do need to process the sequences into arrays of the same length, right?

I suggest you wrap your environment with a Wrapper that turns it into a gym-like env. Supporting non-gym envs is outside the scope of tianshou for now, though we might come back to it in the distant future for better support of RLHF.

@MovePhilip

Thanks for your advice. I checked the tianshou.data module, and figuring out how to replace the Batch class seems to need a lot of time. I will try padding and masking the input to solve it.
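For reference, a minimal sketch of the pad-and-mask approach, assuming the text has already been tokenized into integer ids. The helper `pad_and_mask`, the `max_len` value, and the `pad_id` default are hypothetical choices for illustration, not tianshou API.

```python
import numpy as np


def pad_and_mask(token_ids, max_len, pad_id=0):
    """Truncate or right-pad a token sequence to max_len.

    Returns the fixed-length array and a boolean mask that is True
    on real tokens and False on padding.
    """
    tokens = np.asarray(token_ids, dtype=np.int64)[:max_len]
    padded = np.full(max_len, pad_id, dtype=np.int64)
    padded[: len(tokens)] = tokens
    mask = np.zeros(max_len, dtype=bool)
    mask[: len(tokens)] = True
    return padded, mask


# Two observations of different lengths now share one shape and can be stacked.
short_obs, short_mask = pad_and_mask([5, 7, 9], max_len=5)
long_obs, long_mask = pad_and_mask(list(range(1, 9)), max_len=5)
batch = np.stack([short_obs, long_obs])
```

Because every observation has the same shape after this step, the environment can declare a fixed observation space, and the mask lets the model ignore the padded positions.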

@MischaPanch
Collaborator Author

Yes, Batch is pretty fundamental in tianshou and used everywhere ^^

@MischaPanch MischaPanch added the tentative Up to discussion, may be dismissed label Apr 3, 2024
2 participants