Tensorflow_Deep_RL

An implementation of multiple algorithms for training agents on OpenAI Gym environments. The algorithms are implemented directly from research papers, and the references are cited.

Parameter Space Noise for Exploration

This work implements the idea from Parameter Space Noise for Exploration by OpenAI. In their experiments, this approach trained faster than action space exploration on certain environments, especially those with sparse rewards. The standard deviation of the noise is kept constant in this implementation, but the noise can be annealed as training progresses for a better solution (a sketch of one possible schedule is given below). Agents have to be trained multiple times, as there is a high chance of training getting stuck in a local optimum.

https://arxiv.org/abs/1706.01905

Filename: param_noise_mlp.py
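
The noise standard deviation is held constant in this repository; one simple way to anneal it is sketched below. This is illustrative only: noise_std, decay, and min_std are hypothetical names, not taken from param_noise_mlp.py.

# Illustrative sketch: exponentially decay the parameter-noise std over training.
noise_std, decay, min_std = 1.0, 0.995, 0.01
for iteration in range(1000):
    # perturb parameters with the current noise_std, collect a rollout, train ...
    noise_std = max(min_std, noise_std * decay)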

Dependencies

Python 3.6 with Anaconda, TensorFlow 1.x (the code uses the tf.placeholder / Session API), and OpenAI Gym are required.

Usage

python param_noise_mlp.py

Graph

import numpy as np
import tensorflow as tf

# obs_space and ac_space are the observation and action dimensions of the gym
# environment; linear() is a fully connected layer helper used by the script.
graph = tf.Graph()
with graph.as_default():
    # placeholders: observations, advantages, return targets, one-hot actions, dropout keep-prob
    x = tf.placeholder(tf.float32, shape=[None, obs_space])
    adv = tf.placeholder(tf.float32, shape=[None])
    rew = tf.placeholder(tf.float32, shape=[None, 1])
    ac = tf.placeholder(tf.float32, shape=[None, ac_space])
    keep_prob = tf.placeholder(tf.float32)

    # policy network: a single linear layer followed by a softmax over actions
    x1 = linear(x, in_dim=obs_space, out_dim=ac_space, scope_name='l1')
    #x1 = tf.nn.dropout(x1, keep_prob)
    #x2 = linear(x1, in_dim=4, out_dim=ac_space, scope_name='l2')
    y = tf.nn.softmax(x1)

    # value network: a single linear layer mapping observations to a scalar value
    #x3 = tf.nn.dropout(x1, keep_prob)
    y_v = linear(x, in_dim=obs_space, out_dim=1, scope_name='l3')

    # value loss: L2 error between predicted value and observed return
    loss_v = tf.nn.l2_loss(y_v - rew)
    # policy loss: advantage-weighted log-probability of the actions taken
    log_prob_tf = tf.log(y)
    loss1 = tf.reduce_sum(log_prob_tf * ac, [1])   # log-prob of the action actually taken
    loss = -tf.reduce_sum(loss1 * adv) + loss_v
    optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
    init = tf.global_variables_initializer()
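
Not shown in the excerpt above: the graph needs a tf.Session installed as the default session, because the helper functions below call tf.get_default_session(). A minimal sketch using only standard TensorFlow 1.x calls:

# Illustrative: create a session on the graph, make it the default, and
# initialize the variables before collecting rollouts or training.
sess = tf.Session(graph=graph)
with sess.as_default():
    sess.run(init)
    # ... rollouts and training go inside this block ...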

Noisy parameters

Noise is added to the parameters while collecting rollouts for exploration, and deterministic actions are used instead of probabilistic ones in each episode (a rollout sketch follows the two helper functions below).

def noisy_vars(noise_std=1):
    # perturb every variable with zero-mean Gaussian noise and return the
    # original values so they can be restored later via reset_vars
    sess = tf.get_default_session()
    var_names = tf.global_variables()
    old_var = sess.run(var_names)
    var_shapes = [i.shape for i in old_var]
    new_var = [i + np.random.normal(0, noise_std, size=j) for i, j in zip(old_var, var_shapes)]
    # setting new (noisy) values
    for i, j in zip(var_names, new_var):
        sess.run(i.assign(j))
    return old_var

def reset_vars(old_var):
    # restore the unperturbed values saved by noisy_vars
    sess = tf.get_default_session()
    var_names = tf.global_variables()
    # setting old values back
    for i, j in zip(var_names, old_var):
        sess.run(i.assign(j))
    return
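
A minimal sketch of how these helpers could wrap one exploration rollout. The gym environment handle env, the episode loop, and the one-hot action encoding are assumptions about the surrounding script, not quoted from it; the snippet is meant to run inside the with sess.as_default(): block shown earlier.

# Illustrative sketch: perturb parameters, collect one episode deterministically,
# then restore the unperturbed parameters before training.
old_var = noisy_vars(noise_std=1.0)            # save originals, add Gaussian noise
obs_list, act_list, rew_list = [], [], []
obs, done = env.reset(), False
while not done:
    probs = sess.run(y, feed_dict={x: obs.reshape(1, obs_space), keep_prob: 1.0})[0]
    action = np.argmax(probs)                  # deterministic action under noisy parameters
    next_obs, reward, done, _ = env.step(action)
    obs_list.append(obs)
    act_list.append(np.eye(ac_space)[action])  # one-hot, matching the ac placeholder
    rew_list.append(reward)
    obs = next_obs
reset_vars(old_var)                            # restore unperturbed parameters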

Train

def train(run_stat, numsteps, batch_size, dropout=1):
    sess = tf.get_default_session()
    # baseline: predicted state values for the collected observations
    value = sess.run(y_v, feed_dict={x: run_stat['obs'][:numsteps], keep_prob: 1.0})
    # advantage = observed return - predicted value
    advantage = run_stat['reward'][:numsteps].reshape(numsteps, 1) - value
    advantage = advantage.reshape(len(advantage))
    # minibatch updates over the collected rollout
    for i in range(numsteps // batch_size):
        batchobs = run_stat['obs'][i*batch_size:(i+1)*batch_size]
        batchrew = run_stat['reward'][i*batch_size:(i+1)*batch_size]
        batchrew = batchrew.reshape(len(batchrew), 1)
        batchadv = advantage[i*batch_size:(i+1)*batch_size]
        batchac = run_stat['action'][i*batch_size:(i+1)*batch_size]
        sess.run(optimizer,
                 feed_dict={x: batchobs, rew: batchrew, adv: batchadv,
                            ac: batchac, keep_prob: dropout})   # keep_prob < 1 enables dropout
    return
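
A hedged sketch of how a collected rollout might be packaged into run_stat and passed to train. The keys and shapes follow how the placeholders are fed above; using discounted returns (with gamma = 0.99) as the 'reward' entry is an assumption, not something stated in the repository. In practice at least batch_size transitions should be collected before calling train.

# Illustrative: build the run_stat dict expected by train() from the rollout above.
gamma = 0.99
returns, g = [], 0.0
for r in reversed(rew_list):                   # per-step rewards from the rollout sketch
    g = r + gamma * g
    returns.insert(0, g)                       # discounted return for each step

run_stat = {
    'obs': np.asarray(obs_list, dtype=np.float32),      # shape [T, obs_space]
    'action': np.asarray(act_list, dtype=np.float32),   # one-hot, shape [T, ac_space]
    'reward': np.asarray(returns, dtype=np.float32),    # shape [T], value targets
}
train(run_stat, numsteps=len(obs_list), batch_size=32)  # inside the default-session block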
