DQN on gym Taxi environment

Comparison of ReLU and FTA

10000 episodes and maximum cutoff of 100 using FTA
10000 episodes and maximum cutoff of 100 using ReLU
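FTA (Fuzzy Tiling Activation) replaces a dense activation like ReLU with a sparse vector of soft tile memberships over a bounded input range. As a rough illustration of the idea (not the repository's implementation), here is a minimal numpy sketch using a simplified piecewise-linear fuzzy indicator; the bounds, tile width `delta`, and fuzziness `eta` are assumed values, not taken from the experiments above:

```python
import numpy as np

def fta(z, lower=-20.0, upper=20.0, delta=2.0, eta=2.0):
    """Simplified FTA-style activation: each scalar input is mapped to a
    sparse vector of soft tile activations over [lower, upper).
    All hyperparameters here are illustrative assumptions."""
    z = np.asarray(z, dtype=np.float64)[..., None]   # shape (..., 1)
    c = np.arange(lower, upper, delta)               # left edges of the tiles
    # distance from z to each tile's interval [c, c + delta]
    d = np.maximum(c - z, 0.0) + np.maximum(z - c - delta, 0.0)
    # fuzzy membership: 1 inside the tile, linear falloff within eta, 0 beyond
    return np.where(d <= eta, 1.0 - d / eta, 0.0)
```

The sparsity is the point of the comparison: with ReLU every hidden unit can be active, while FTA activates only the few tiles near the input value, which is the property being contrasted in the plots above.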

Final Results using ReLU

Using ReLU activation function with 50000 episodes
My first experiment was a DQN similar to the one proposed in the PyTorch tutorial, but it did not learn anything.
First experiment, DQN with soft update of target policy
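The soft update used in this first experiment is the Polyak averaging step from the PyTorch DQN tutorial, θ′ ← τθ + (1 − τ)θ′. A minimal numpy sketch of that step (the parameter-dict representation and τ = 0.005 are illustrative assumptions):

```python
import numpy as np

TAU = 0.005  # soft-update rate; the tutorial's default, assumed here

def soft_update(policy_params, target_params, tau=TAU):
    """Polyak-average the policy weights into the target network:
    theta_target <- tau * theta_policy + (1 - tau) * theta_target."""
    for name, w in policy_params.items():
        target_params[name] = tau * w + (1.0 - tau) * target_params[name]
```

With a small τ the target network trails the policy network very slowly, which keeps the bootstrap targets stable but can also slow learning.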
In my next experiments, instead of softly updating the target network's weights, I periodically replaced them with the policy network's weights. Learning improved but was still very noisy. Rather than testing this version with a larger number of episodes, I reduced the maximum number of timesteps per episode: the default episode length for the Taxi environment is 200, but capping it at 100 improved the learning process.
Second experiment, changing the soft update
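The second experiment's hard update is a plain copy of the policy weights into the target network every few steps. A minimal sketch, assuming the same parameter-dict representation as above (the update interval is an assumption, since the README does not state it):

```python
import numpy as np

UPDATE_EVERY = 10  # assumed interval between hard updates; not stated in the README

def hard_update(policy_params, target_params):
    """Overwrite the target network's weights with the policy network's,
    replacing the soft (Polyak) update entirely."""
    for name, w in policy_params.items():
        target_params[name] = w.copy()

# The episode cutoff can be applied when building the environment, e.g. with
# Gymnasium: gym.make("Taxi-v3", max_episode_steps=100) truncates episodes
# at 100 steps instead of the default 200.
```

Hard updates make the bootstrap targets piecewise-constant between copies, which trades the smooth drift of the soft update for occasional jumps.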