
pre train LSTM policy [question] #253

Closed
XMaster96 opened this issue Mar 29, 2019 · 5 comments · May be fixed by #315
Labels
enhancement New feature or request question Further information is requested

Comments

@XMaster96

I want to pre-train an LSTM policy with some example data. My current approach is to train it like a normal feed-forward network (plugging the observations in one end and comparing the other with my ground truth), and hope that your LSTM implementation is doing the rest (hidden state management) for me. But before I find out that it is not so easy and spend the next two weeks of my life code digging and hidden state managing, I thought I could simply ask you guys. Is there anything I need to keep in mind when I train the LSTM policy directly?

@XMaster96 XMaster96 changed the title pre train LSTM policy [question] [question] pre train LSTM policy Mar 29, 2019
@XMaster96 XMaster96 changed the title [question] pre train LSTM policy pre train LSTM policy [question] Mar 29, 2019
@araffin araffin added the question Further information is requested label Mar 29, 2019
@araffin
Collaborator

araffin commented Mar 29, 2019

Hello,

Good question, I haven't tried to pretrain an LSTM yet... and currently there is no test for that, so I cannot assure you it will work out of the box. But I'm interested in your results ;)

Is there anything I need to keep in mind when I train the LSTM policy directly?

Pre-training only means supervised learning, so if you are training on a sequence of (observation, action) pairs, you should keep track of the hidden state between each step. I don't have any more advice than that for now.
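To make the "keep track of the hidden state between each step" advice concrete, here is a minimal numpy sketch of a supervised forward pass over a rollout. This is not stable-baselines code; the LSTM cell and all names are illustrative, and the only point is that the (cell, hidden) state is carried from step to step and zeroed at episode boundaries instead of being reset every minibatch:

```python
import numpy as np

def lstm_step(x, c, h, W, b):
    """One plain LSTM step. x: (n_env, n_in), c/h: (n_env, n_hidden)."""
    z = np.concatenate([x, h], axis=1) @ W + b        # (n_env, 4*n_hidden)
    i, f, o, g = np.split(z, 4, axis=1)               # gate pre-activations
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)      # new cell state
    h = sigmoid(o) * np.tanh(c)                       # new hidden output
    return c, h

def pretrain_forward(obs_seq, episode_starts, n_hidden, W, b):
    """Run a whole expert sequence through the LSTM, carrying the state.

    obs_seq: (n_steps, n_env, n_in); episode_starts: (n_steps, n_env) bools.
    Returns the stacked hidden outputs; a supervised (behavior-cloning)
    loss would compare these against the expert actions (omitted here).
    """
    n_steps, n_env, _ = obs_seq.shape
    c = np.zeros((n_env, n_hidden))
    h = np.zeros((n_env, n_hidden))
    outputs = []
    for t in range(n_steps):
        # The crucial part: zero the state rows where a new episode begins,
        # so hidden state never leaks across episode boundaries.
        keep = 1.0 - episode_starts[t].astype(float)[:, None]   # (n_env, 1)
        c, h = c * keep, h * keep
        c, h = lstm_step(obs_seq[t], c, h, W, b)
        outputs.append(h)
    return np.stack(outputs)                          # (n_steps, n_env, n_hidden)

rng = np.random.default_rng(0)
n_in, n_hidden, n_env, n_steps = 3, 4, 2, 5
W = rng.normal(size=(n_in + n_hidden, 4 * n_hidden)) * 0.1
b = np.zeros(4 * n_hidden)
obs = rng.normal(size=(n_steps, n_env, n_in))
starts = np.zeros((n_steps, n_env), dtype=bool)
out = pretrain_forward(obs, starts, n_hidden, W, b)
print(out.shape)  # (5, 2, 4)
```

If you instead feed each (observation, action) pair independently with a fresh zero state, the network can never learn anything that depends on history, which is the failure mode the advice above is warning about.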

Btw, you should be aware that we are currently working on refactoring the way recurrent policies are defined, see PR #244.

@XMaster96
Author

XMaster96 commented Mar 30, 2019

Thanks for the answer.
I have looked into it a bit more and I have some more questions.

  1. So there are two special placeholders I need to feed data into when using an LSTM policy: the initial hidden state tensor and a mask tensor.
    The states have the shape (envs_per_batch, n_hidden * 2), and masks is a boolean list with the shape (envs_per_batch * self.n_steps) that resets the hidden state whenever it is True. What I don't understand is how one shape is translated into the other.

  2. In the docs there are some pre-train examples using a pretrain function, but none of the learners seem to have this function, so was it removed?
    EDIT:
    Stupid me, forgot to check the version I am currently on.
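To illustrate how the two shapes in question 1 line up, here is a small numpy sketch. All names are illustrative, and the env-major flattening order is an assumption based on my reading of `batch_to_seq` in stable-baselines (verify against your version): the state placeholder holds one row per environment and is fed once per batch, while the flat mask vector is unflattened back into a per-step sequence inside the policy:

```python
import numpy as np

n_env, n_steps, n_hidden = 2, 4, 3

# What the recurrent policy receives per training batch:
initial_state = np.zeros((n_env, n_hidden * 2))        # one row per env
masks_flat = np.zeros(n_env * n_steps, dtype=bool)     # flattened over time
masks_flat[2] = True   # env 0 starts a new episode at step 2
                       # (assumes env-major ordering: index = env*n_steps + step)

# Inside the policy the flat batch is unflattened back into a
# (n_steps, n_env) sequence, which is how the two shapes meet:
masks = masks_flat.reshape(n_env, n_steps).swapaxes(0, 1)  # (n_steps, n_env)

state = initial_state
for t in range(n_steps):
    # reset the state rows whose episode just restarted at this step
    state = state * (1.0 - masks[t].astype(float))[:, None]
    # ... one LSTM step would update `state` here ...
print(masks.shape)  # (4, 2)
```

So the (envs_per_batch, n_hidden * 2) tensor and the (envs_per_batch * n_steps,) mask never share a shape directly; they only meet after the mask is reshaped into per-step slices of size n_env.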

@araffin
Collaborator

araffin commented Mar 31, 2019

What I don't understand is how one shape is translated into the other.

The LSTM code is quite misleading; I think you can get some hints by reading @erniejunior's issue: #158

Side note: state_shape = [n_lstm * 2] because the state holds both the cell and hidden states of the LSTM
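A quick sketch of that side note: the single state vector of size n_lstm * 2 is just the cell state and the hidden output packed side by side (which half is which is a convention of the implementation, so check the code before relying on it):

```python
import numpy as np

n_env, n_lstm = 2, 8

# the placeholder carries both halves in one tensor per environment
state = np.zeros((n_env, n_lstm * 2))

# split it back into cell state c and hidden output h
c, h = np.split(state, 2, axis=1)
print(c.shape, h.shape)  # (2, 8) (2, 8)
```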

In the docs there are some pre-train examples using a pretrain function, but none of the learners seem to have this function, so was it removed?

Yes, the online docs correspond to the "master" version. Pre-training was added in v2.5.0, which was released last week.
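For readers landing here later, a sketch of the v2.5.0 pretraining workflow being discussed. The `.npz` key names below are taken from my reading of `generate_expert_traj` in the stable-baselines docs and may differ in other versions; the stable-baselines calls themselves are left as comments so the snippet runs without the library installed:

```python
import numpy as np

# Build a tiny stand-in expert dataset in the .npz layout that
# ExpertDataset appears to expect (key names are an assumption):
n_transitions, obs_dim = 10, 4
np.savez(
    "expert_cartpole.npz",
    obs=np.random.randn(n_transitions, obs_dim),
    actions=np.random.randint(0, 2, size=(n_transitions, 1)),
    rewards=np.ones(n_transitions),
    episode_returns=np.array([float(n_transitions)]),
    episode_starts=np.array([True] + [False] * (n_transitions - 1)),
)

# With stable-baselines >= 2.5.0 installed, pretraining would then look like:
#
#   from stable_baselines import PPO2
#   from stable_baselines.gail import ExpertDataset
#
#   dataset = ExpertDataset(expert_path="expert_cartpole.npz", batch_size=8)
#   model = PPO2("MlpPolicy", "CartPole-v1")
#   model.pretrain(dataset, n_epochs=100)   # behavior cloning
#   model.learn(10000)                      # then RL fine-tuning
```

Whether `pretrain()` handles the hidden state correctly for recurrent policies is exactly what this issue is about, so treat the commented calls as the feed-forward happy path.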

@araffin
Collaborator

araffin commented Apr 8, 2019

@XMaster96 Did you make pretrain() work with LSTM policies? Or did you have to tweak the code? (Here I'm not talking about recording expert data, which is your current PR ;))

@XMaster96
Author

XMaster96 commented Apr 12, 2019

@araffin sorry, I hadn't seen your reply; apparently I was under the illusion that I would get a notification.

Yes, I am still working on it. The problem is just that the original solution is quite hacky and not something you would put in a PR. Unfortunately I am also stretched a bit for time at the moment, but I should have an early PR done in the next couple of days.

Edit:
I had some wrong notification settings; now I should get notified.
