pre train LSTM policy [question] #253
Comments
Hello, good question. I haven't tried to pre-train an LSTM yet, and there is currently no test for that, so I cannot assure you it will work out of the box. But I'm interested in your results ;)
Pre-training is just supervised learning, so if you are training on a sequence of (observation, action) pairs, you should keep track of the hidden state between steps. I don't have any other advice for now. Btw, you should be aware that we are currently refactoring the way recurrent policies are defined, see PR #244.
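To illustrate what "keeping track of the hidden state between each step" could look like, here is a minimal NumPy sketch. It is not the library's pre-training code: `lstm_step`, `sequence_loss`, the parameter shapes and the `episode_starts` flags are all hypothetical names made up for this example, and it only computes the supervised loss (no gradient step).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(obs, h, c, W, U, b):
    """One step of a plain LSTM cell (hypothetical helper, not library code)."""
    z = obs @ W + h @ U + b                      # shape: (4 * n_lstm,)
    i, f, o, g = np.split(z, 4)                  # input, forget, output gates, candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def sequence_loss(observations, actions, episode_starts, params, n_lstm):
    """Mean cross-entropy over an ordered sequence of (observation, action) pairs."""
    W, U, b, W_out = params
    h, c = np.zeros(n_lstm), np.zeros(n_lstm)
    total = 0.0
    for obs, action, start in zip(observations, actions, episode_starts):
        if start:
            # New episode: the state must not leak across the boundary.
            h, c = np.zeros(n_lstm), np.zeros(n_lstm)
        # Otherwise h and c are carried over from the previous step.
        h, c = lstm_step(obs, h, c, W, U, b)
        logits = h @ W_out
        log_probs = logits - np.log(np.sum(np.exp(logits)))
        total -= log_probs[action]
    return total / len(actions)

# Toy data: two episodes of three steps each (all sizes are arbitrary).
rng = np.random.default_rng(0)
obs_dim, n_lstm, n_actions, T = 3, 4, 2, 6
params = (
    rng.normal(size=(obs_dim, 4 * n_lstm)),
    rng.normal(size=(n_lstm, 4 * n_lstm)),
    np.zeros(4 * n_lstm),
    rng.normal(size=(n_lstm, n_actions)),
)
observations = rng.normal(size=(T, obs_dim))
actions = rng.integers(0, n_actions, size=T)
episode_starts = np.array([True, False, False, True, False, False])

loss = sequence_loss(observations, actions, episode_starts, params, n_lstm)

# Resetting at every step is what treating the data like a feed-forward
# dataset (independent samples) amounts to; the loss is generally different.
loss_feedforward = sequence_loss(
    observations, actions, np.ones(T, dtype=bool), params, n_lstm
)
print(loss, loss_feedforward)
```

The point of the sketch is only the data handling: samples have to stay in temporal order, and the state is reset at episode boundaries rather than at every sample.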
Thanks for the answer
The LSTM code is quite misleading. I think you can find some hints by reading @erniejunior's issue: #158. Side note: state_shape = [n_lstm * 2] because the state holds both the cell and hidden states of the LSTM.
Yes, the online doc corresponds to the "master" version. Pre-training was added in v2.5.0, which was released last week.
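A small sketch of what that side note implies, using NumPy only. The sizes are made up, and the layout (cell state in the first half, hidden state in the second) is an assumption for illustration; check the library code for the actual ordering.

```python
import numpy as np

n_envs, n_lstm = 2, 4  # hypothetical sizes

# One state row per environment; each row packs both LSTM states,
# hence the width of 2 * n_lstm.
state = np.arange(n_envs * 2 * n_lstm, dtype=np.float64).reshape(n_envs, 2 * n_lstm)

# Assumed layout: cell state first, hidden state second.
cell, hidden = np.split(state, 2, axis=1)    # each has shape (n_envs, n_lstm)

# Zero both halves for environments whose episode just ended.
dones = np.array([1.0, 0.0])                 # env 0 finished, env 1 did not
mask = 1.0 - dones[:, None]
next_state = np.concatenate([cell * mask, hidden * mask], axis=1)

print(state.shape, cell.shape, hidden.shape)
```

So when you pass a state around during training, it is one array of width `2 * n_lstm` per environment, not two separate arrays.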
@XMaster96 Did you make
@araffin sorry, I hadn't seen your reply; apparently I was under the illusion that I would get a notification. Yes, I am still working on it. The problem is just that the original solution is quite hacky and not something you put in a PR. Unfortunately I am also a bit short on time at the moment, but I should have an early PR done in the next couple of days. Edit:
I want to pre-train an LSTM policy with some example data. My current approach is to train it like a normal feed-forward network (plugging the observations in at one end and comparing the output with my ground truth), and hope that your LSTM implementation does the rest (hidden state management) for me. But before I find out that it is not so easy and spend the next two weeks of my life digging through code and managing hidden states, I thought I could simply ask you guys. Is there anything I need to keep in mind when I train the LSTM policy directly?