
ACER - Examples on continuous action space #143

Open
PKramek opened this issue May 17, 2021 · 16 comments
PKramek commented May 17, 2021

Hello,

I am working on an RL project where I want to use the ACER algorithm on continuous action space problems (PyBullet environments), but I am having difficulties implementing it using your framework. Would it be possible for you to add an example of how to use this algorithm on this class of problems?


PKramek commented May 18, 2021

Specifically, I would like to know how to use models that do not share parameters between the policy and the value function.


muupan commented May 20, 2021

You are right, we don't have an example script for ACER with a continuous action space for now. It would be nice to add one.

For now we only have a unit test for ACER with a continuous action space. Here is how the model is defined:

    head = acer.ACERContinuousActionHead(
        pi=nn.Sequential(
            nn.Linear(hidden_size, action_size * 2),
            GaussianHeadWithDiagonalCovariance(),
        ),
        v=nn.Sequential(
            nn.Linear(hidden_size, 1),
        ),
        adv=nn.Sequential(
            ConcatObsAndAction(),
            nn.Linear(hidden_size + action_size, 1),
        ),
    )

    model = nn.Sequential(
        nn.Linear(obs_size, hidden_size),
        nn.LeakyReLU(),
        head,
    )

So the model's first Linear layer is shared across pi, v, and adv. To implement a model with no shared parameters, I guess you can directly use ACERContinuousActionHead as the model passed to ACER, where pi, v, and adv do not share parameters.
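For illustration, here is a rough sketch of what fully separate branches could look like, written with plain torch.nn. The pfrl-specific layers are only indicated in comments, and the sizes are arbitrary examples, so treat this as a sketch rather than tested pfrl code:

```python
import torch.nn as nn

obs_size, action_size, hidden_size = 8, 2, 256  # arbitrary example sizes

# Each branch gets its own trunk, so no parameters are shared between them.
pi = nn.Sequential(
    nn.Linear(obs_size, hidden_size),
    nn.LeakyReLU(),
    nn.Linear(hidden_size, action_size * 2),
    # GaussianHeadWithDiagonalCovariance() would go here in pfrl
)
v = nn.Sequential(
    nn.Linear(obs_size, hidden_size),
    nn.LeakyReLU(),
    nn.Linear(hidden_size, 1),
)
adv = nn.Sequential(
    # ConcatObsAndAction() would go here in pfrl, before the first Linear
    nn.Linear(obs_size + action_size, hidden_size),
    nn.LeakyReLU(),
    nn.Linear(hidden_size, 1),
)

# head = acer.ACERContinuousActionHead(pi=pi, v=v, adv=adv)
# and then pass `head` itself to ACER as the model.
```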


PKramek commented May 25, 2021

I get an error message when I try to use GaussianHeadWithFixedCovariance (as described in the original paper). Is that intentional?

  File "...pfrl/agents/acer.py", line 217, in compute_loss_with_kl_constraint
    assert param.requires_grad

Here is my model definition for reproducing the error:

    dummy_env = gym.make(args.env)
    obs_size = dummy_env.observation_space.shape[0]
    action_size = dummy_env.action_space.shape[0]
    hidden_size = 256

    head = acer.ACERContinuousActionHead(
        pi=nn.Sequential(
            nn.Linear(hidden_size, action_size),
            pfrl.policies.GaussianHeadWithFixedCovariance(scale=0.3),
        ),
        v=nn.Sequential(
            nn.Linear(hidden_size, 1),
        ),
        adv=nn.Sequential(
            ConcatObsAndAction(),
            nn.Linear(hidden_size + action_size, 1),
        ),
    )

    model = nn.Sequential(
        nn.Linear(obs_size, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, hidden_size),
        nn.ReLU(),
        head,
    )

    model.apply(init_chainer_default)

    opt = pfrl.optimizers.SharedRMSpropEpsInsideSqrt(
        model.parameters(), lr=10e-5, eps=4e-3, alpha=0.99
    )


muupan commented May 26, 2021

It seems like a bug in ACER. Can you try commenting out assert param.requires_grad and see if it works?

muupan self-assigned this May 26, 2021

PKramek commented May 26, 2021

I tried that, and it causes the program to throw another exception:

  File ".../venv/lib/python3.7/site-packages/pfrl/agents/acer.py", line 448, in compute_one_step_pi_loss
    delta=self.trust_region_delta,
  File ".../venv/lib/python3.7/site-packages/pfrl/agents/acer.py", line 222, in compute_loss_with_kl_constraint
    [original_loss], distrib_params, retain_graph=True
  File ".../venv/lib/python3.7/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)
RuntimeError: One of the differentiated Tensors does not require grad


PKramek commented May 26, 2021

Maybe ACER is trying to update its parameters before any data has been propagated through its neural networks, and that causes the exception.


muupan commented May 26, 2021

Thank you for confirming that. It seems like ACER does not work with GaussianHeadWithFixedCovariance for now. I think this needs to be fixed.

A possible workaround for it would be GaussianHeadWithStateIndependentCovariance with var_param's learning rate set to 0.


PKramek commented May 26, 2021

Thank you for the workaround idea. Would it be possible for you to create a fix in the near future?


PKramek commented May 26, 2021

Also, I don't see a learning rate used anywhere in GaussianHeadWithStateIndependentCovariance, but from what I understand I could just pass a callable as var_func that always returns the desired variance value.


muupan commented May 26, 2021

I meant the learning rate you set when you make your optimizer, since PyTorch optimizers allow setting parameter-specific learning rates. I guess you are right about var_func. Any var_func that returns a tensor whose requires_grad is True would be sufficient as well.
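In case it helps, a var_func along these lines should satisfy the assertion. This is only a sketch I have not run against pfrl, and fixed_variance is a made-up name chosen to mirror GaussianHeadWithFixedCovariance(scale=0.3):

```python
import torch

fixed_variance = 0.3 ** 2  # variance corresponding to scale=0.3

def var_func(var_param):
    # Ignore the learnable var_param and always return the same variance.
    # requires_grad=True satisfies ACER's assertion, but because the output
    # does not depend on var_param, the covariance effectively stays fixed.
    return torch.full_like(var_param, fixed_variance).requires_grad_(True)
```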

I will try reproducing the issue myself and hopefully fix it soon.


PKramek commented May 26, 2021

Could you point me to some examples of how I could use separate PyTorch optimizers, or rather just separate learning rates, for my NNs in pfrl? In all of the provided examples with multiple optimizers, they are passed to the training function as separate parameters. What if I wanted to set a different learning rate for my actor and a different one for my critic, and at the same time set a learning rate for just the GaussianHeadWithStateIndependentCovariance layer in the actor? Currently I am using pfrl.optimizers.SharedRMSpropEpsInsideSqrt, but it does not offer learning rate parametrization beyond setting a single value for all NNs.


muupan commented May 26, 2021

Here is PyTorch's official doc: https://pytorch.org/docs/stable/optim.html#per-parameter-options. pfrl.optimizers.SharedRMSpropEpsInsideSqrt is a subclass of torch.optim.Optimizer, so it is no different.
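For instance, something like this should work. torch.optim.RMSprop is used here for illustration and the module names are made up; SharedRMSpropEpsInsideSqrt should accept the same parameter-group list since it subclasses torch.optim.Optimizer:

```python
import torch
from torch import nn

# Stand-ins for the actor body, its covariance parameter, and the critic.
actor_body = nn.Linear(8, 4)
var_param = nn.Parameter(torch.zeros(4))  # plays the role of the Gaussian head's parameter
critic = nn.Linear(8, 1)

opt = torch.optim.RMSprop(
    [
        {"params": actor_body.parameters(), "lr": 7e-4},  # actor-specific rate
        {"params": [var_param], "lr": 0.0},               # keep the covariance fixed
        {"params": critic.parameters(), "lr": 1e-3},      # critic-specific rate
    ],
    lr=1e-4,  # default for any group that does not override lr
)
```

Groups that omit "lr" fall back to the optimizer-level default, so with a rate of 0 the covariance parameter simply never moves while everything else trains.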


muupan commented May 27, 2021

I hope #145 will resolve the issue.


PKramek commented May 27, 2021 via email


muupan commented May 28, 2021

It is tested on a toy env with continuous actions, but it has not been verified that it can reproduce the performance on the continuous-action tasks in the paper.

Here is a sample script to train ACER on OpenAI Gym MuJoCo envs. It seems to work to some extent. It is not tuned much, and I cannot guarantee that the hyperparameters etc. are the same as those used in the paper.

https://github.com/muupan/pfrl/blob/acer-continous-example-tune/examples/mujoco/train_acer_mujoco.py

command (with #145 applied):

python3 examples/mujoco/train_acer_mujoco.py --num-envs 8 --steps 3000000 --env HalfCheetah-v2

training log:
scores.txt

video with the final model:

openaigym.video.1.62730.video000000.mp4


PKramek commented May 28, 2021

That really helped me. Thank you very much for all the time you spent debugging the code and helping me.
