Support Dict observation spaces #1065

MischaPanch · 2024-02-26T12:26:05Z

Maybe also action spaces.

I'm not sure what the status of the current support is, and I can't estimate the complexity.

It's probably not a priority, but if an external contributor wants to look into it, we could review this. The solution should come with proper documentation.

Looking at how other projects support complex action/observation spaces might be a good start.

Related issues: #1064

MovePhilip · 2024-02-27T16:17:21Z

I have an environment with a variable observation space: the observation is a text sequence. The batch mechanism doesn't seem to be compatible with observations of variable length. How can I deal with this?

MischaPanch · 2024-02-27T16:36:24Z

I suggest either wrapping your environment so that its observations are not a dict space (this should always be possible), or working on a PR that solves this issue (it might not be easy, though).
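As an illustration of the wrapping idea, here is a minimal sketch that flattens a dict observation into a single vector using only NumPy. The helper name flatten_dict_obs and the example keys are hypothetical and not part of tianshou; Gymnasium also provides a FlattenObservation wrapper that applies the same idea at the environment level.

```python
import numpy as np


def flatten_dict_obs(obs: dict) -> np.ndarray:
    """Concatenate all entries of a dict observation into one flat float32 vector.

    Keys are visited in sorted order so the layout is identical at every step.
    (Hypothetical helper, for illustration only.)
    """
    parts = [np.asarray(obs[key], dtype=np.float32).ravel() for key in sorted(obs)]
    return np.concatenate(parts)


# Hypothetical dict observation with two entries of different shapes.
obs = {"position": np.array([1.0, 2.0]), "image": np.zeros((2, 3))}
flat = flatten_dict_obs(obs)
print(flat.shape)  # (8,): 6 values from "image" followed by 2 from "position"
```

The flat vector has a fixed shape as long as every entry of the dict has a fixed shape, which is what a buffer with preallocated arrays needs.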

I'm not sure what you mean by variable observation. If it's a text sequence as an array, Batch can handle it, but you will need a custom Agent for processing text.

MovePhilip · 2024-02-29T07:13:49Z

My environment is a web browser. The returned observation is the text sequence shown on the web page, so its length changes at every step. The batch mechanism raises errors when it handles this kind of observation. I don't think padding and truncating should be necessary to post-process the observations.

MovePhilip · 2024-02-29T07:19:41Z

It always raises an error in the __setitem__ function of the Batch class:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (2,124)

As you can see, obs_next has a different shape than obs, so it raises an error.
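A small NumPy-only sketch of how this kind of error can arise (this is not tianshou's code, and the storage size is an assumption): sequences of different lengths can only be held together as an object array of shape (2,), which cannot be broadcast into a preallocated array with rows of length 124.

```python
import numpy as np

# Preallocated storage for fixed-length observations (row length 124, as in the error above).
storage = np.zeros((4, 124))
index = np.array([0, 1])

# Two observations of different lengths end up in an object array of shape (2,).
ragged = np.empty(2, dtype=object)
ragged[0] = np.zeros(124)
ragged[1] = np.zeros(80)

try:
    # The indexing result has shape (2, 124), but the value has shape (2,).
    storage[index] = ragged
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__)
```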

MischaPanch · 2024-02-29T09:46:00Z

I see. It's not really related to this issue, which is about Dict interfaces.

Your environment violates the gym/Gymnasium API, where an env is assumed to have a fixed numerical observation space. In any case, for your model training you do need to process the sequences into arrays of the same length, right?

I suggest you wrap your environment with a Wrapper that turns it into a gym-like env. Supporting non-gym envs is outside the scope of tianshou for now, though we might come back to it in the distant future for better support of RLHF.

MovePhilip · 2024-02-29T12:22:36Z

Thanks for your advice. I checked the tianshou.data module, and figuring out how to replace the Batch class seems to need a lot of time. I will try padding and masking the input to solve it.
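A minimal sketch of the pad-and-mask approach, assuming the text has already been tokenized into integer ids. The names pad_and_mask, MAX_LEN, and PAD_ID are hypothetical, and the values are placeholders to adapt to the actual tokenizer and model.

```python
import numpy as np

MAX_LEN = 8  # assumed maximum sequence length
PAD_ID = 0   # assumed id of the padding token


def pad_and_mask(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return (padded, mask), both of fixed length max_len.

    Sequences longer than max_len are truncated. The mask is 1 for real
    tokens and 0 for padding.
    """
    ids = np.asarray(token_ids[:max_len], dtype=np.int64)
    padded = np.full(max_len, pad_id, dtype=np.int64)
    padded[: len(ids)] = ids
    mask = np.zeros(max_len, dtype=np.int8)
    mask[: len(ids)] = 1
    return padded, mask


short, short_mask = pad_and_mask([5, 6, 7])           # padded up to MAX_LEN
long, long_mask = pad_and_mask(list(range(1, 13)))    # truncated to MAX_LEN
print(short.tolist(), short_mask.tolist())
```

Because every observation now has the same shape, it can be stored in fixed-size arrays, and the mask lets the model ignore the padded positions.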

MischaPanch · 2024-02-29T13:49:11Z

Yes, Batch is pretty fundamental in tianshou and used everywhere ^^

MischaPanch added the enhancement, good first issue, and documentation labels Feb 26, 2024

MischaPanch mentioned this issue Feb 26, 2024

how to convert Batch into ndarray/tensor #1064

Closed

MischaPanch added the tentative label Apr 3, 2024
