Hello, I have been trying to work through this from scratch on my end. I got as far as training the model and visualizing it, but the run crashes in the middle of the training session without saving the model.
VC:
Python : 3.8.10
tensorflow = 2.3.1
Windows = 11
No IDLE; running in script mode from a Windows PowerShell virtual environment.
Below is the complete Traceback of the error I received.
2022-03-07 04:17:43.095316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-07 04:17:43.100610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
Traceback (most recent call last):
File "RL-Bitcoin-trading-bot_7.py", line 501, in <module>
train_multiprocessing(CustomEnv, agent, train_df, train_df_nomalized, num_worker = 5, training_batch_size=50, visualize=True, EPISODES=5)
File "D:\Mine\RLCurrent\multiprocessing_env.py", line 95, in train_multiprocessing
a_loss, c_loss = agent.replay(states[worker_id], actions[worker_id], rewards[worker_id], predictions[worker_id], dones[worker_id], next_states[worker_id])
File "RL-Bitcoin-trading-bot_7.py", line 121, in replay
advantages, target = self.get_gaes(rewards, dones, np.squeeze(values), np.squeeze(next_values))
File "RL-Bitcoin-trading-bot_7.py", line 93, in get_gaes
deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
File "RL-Bitcoin-trading-bot_7.py", line 93, in <listcomp>
deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
Any sort of help is highly appreciated. If needed I'll post code snippets as well for more clarity.
Thanks.
Hey, I think the problem originates from the output of critic_predict. My guess is that the author's original PPO implementation had the critic model also take the previously predicted value as an input, but that was removed in this tutorial, so the critic model no longer takes a previous-value input. Maybe try removing the np.zeros input in the critic_predict function.
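One way to narrow this down before changing the model: in the failing expression `r + gamma * (1 - d) * nv - v`, the message "unsupported operand type(s) for +: 'NoneType' and 'float'" means the left operand of `+` was None, and that operand is `r`. So it may be worth checking whether any entry in rewards (or in the other lists) is None right before get_gaes runs. Below is a minimal sketch of such a check. `find_none_entries` and `deltas_checked` are hypothetical helper names, not part of the tutorial code, and the `gamma=0.99` default is only an assumption for illustration.

```python
import numpy as np


def find_none_entries(**named_sequences):
    """Return {name: [indices]} for every sequence that contains None entries."""
    report = {}
    for name, seq in named_sequences.items():
        # dtype=object keeps None as-is; atleast_1d copes with squeezed 0-d arrays.
        items = np.atleast_1d(np.asarray(seq, dtype=object))
        bad = [i for i, x in enumerate(items) if x is None]
        if bad:
            report[name] = bad
    return report


def deltas_checked(rewards, dones, next_values, values, gamma=0.99):
    """Compute the same deltas as the traceback line, but fail early and readably."""
    problems = find_none_entries(
        rewards=rewards, dones=dones, next_values=next_values, values=values
    )
    if problems:
        raise ValueError(f"None entries found before computing deltas: {problems}")
    return [
        r + gamma * (1 - d) * nv - v
        for r, d, nv, v in zip(rewards, dones, next_values, values)
    ]


if __name__ == "__main__":
    # A None reward reproduces the situation from the traceback.
    print(find_none_entries(rewards=[1.0, None, 0.5], values=[0.1, 0.2, 0.3]))
```

If the report points at rewards, the place to look would be wherever the environment's step function returns its reward (for example, a code path that returns without setting it). If it points at values or next_values instead, that would support the critic_predict suggestion above.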