
ACER - Examples on continuous action space #143

Open
PKramek opened this issue May 17, 2021 · 16 comments
PKramek commented May 17, 2021

Hello,

I am working on an RL project where I want to use the ACER algorithm on continuous action space problems (PyBullet environments), but I am having difficulties implementing it using your framework. Would it be possible for you to add an example of how to use this algorithm on this class of problems?


PKramek commented May 18, 2021

Specifically, I would like to know how to use models that do not share parameters between the policy and the value function.


muupan commented May 20, 2021

You are right, we don't have an example script for ACER with a continuous action space for now. It would be nice to add one.

For now we only have a unit test for ACER with a continuous action space. Here is how the model is defined:

    head = acer.ACERContinuousActionHead(
        pi=nn.Sequential(
            nn.Linear(hidden_size, action_size * 2),
            GaussianHeadWithDiagonalCovariance(),
        ),
        v=nn.Sequential(
            nn.Linear(hidden_size, 1),
        ),
        adv=nn.Sequential(
            ConcatObsAndAction(),
            nn.Linear(hidden_size + action_size, 1),
        ),
    )

    model = nn.Sequential(
        nn.Linear(obs_size, hidden_size),
        nn.LeakyReLU(),
        head,
    )

So the model's first Linear layer is shared across pi, v, and adv. To implement a model with no shared parameters, I guess you can directly use ACERContinuousActionHead as the model passed to ACER, where pi, v, and adv do not share parameters.
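For illustration, here is a rough sketch of what fully separate branches could look like, written with plain torch.nn. The pfrl-specific layers are only indicated in comments, and the sizes are arbitrary examples, so treat this as a sketch rather than tested pfrl code:

```python
import torch.nn as nn

obs_size, action_size, hidden_size = 8, 2, 256  # arbitrary example sizes

# Each branch gets its own trunk, so no parameters are shared between them.
pi = nn.Sequential(
    nn.Linear(obs_size, hidden_size),
    nn.LeakyReLU(),
    nn.Linear(hidden_size, action_size * 2),
    # GaussianHeadWithDiagonalCovariance() would go here in pfrl
)
v = nn.Sequential(
    nn.Linear(obs_size, hidden_size),
    nn.LeakyReLU(),
    nn.Linear(hidden_size, 1),
)
adv = nn.Sequential(
    # ConcatObsAndAction() would go here in pfrl, before the first Linear
    nn.Linear(obs_size + action_size, hidden_size),
    nn.LeakyReLU(),
    nn.Linear(hidden_size, 1),
)

# head = acer.ACERContinuousActionHead(pi=pi, v=v, adv=adv)
# and then pass `head` itself to ACER as the model.
```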


PKramek commented May 25, 2021

I get an error message when I try to use GaussianHeadWithFixedCovariance (as described in the original paper). Is that intentional?

  File "...pfrl/agents/acer.py", line 217, in compute_loss_with_kl_constraint
    assert param.requires_grad

Here is my model definition for reproducing the error:

    dummy_env = gym.make(args.env)
    obs_size = dummy_env.observation_space.shape[0]
    action_size = dummy_env.action_space.shape[0]
    hidden_size = 256

    head = acer.ACERContinuousActionHead(
        pi=nn.Sequential(
            nn.Linear(hidden_size, action_size),
            pfrl.policies.GaussianHeadWithFixedCovariance(scale=0.3),
        ),
        v=nn.Sequential(
            nn.Linear(hidden_size, 1),
        ),
        adv=nn.Sequential(
            ConcatObsAndAction(),
            nn.Linear(hidden_size + action_size, 1),
        ),
    )

    model = nn.Sequential(
        nn.Linear(obs_size, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, hidden_size),
        nn.ReLU(),
        head,
    )

    model.apply(init_chainer_default)

    opt = pfrl.optimizers.SharedRMSpropEpsInsideSqrt(
        model.parameters(), lr=10e-5, eps=4e-3, alpha=0.99
    )


muupan commented May 26, 2021

It seems like a bug in ACER. Can you try commenting out assert param.requires_grad and see if it works?

muupan self-assigned this May 26, 2021

PKramek commented May 26, 2021

I tried that, and it causes the program to throw another exception:

  File ".../venv/lib/python3.7/site-packages/pfrl/agents/acer.py", line 448, in compute_one_step_pi_loss
    delta=self.trust_region_delta,
  File ".../venv/lib/python3.7/site-packages/pfrl/agents/acer.py", line 222, in compute_loss_with_kl_constraint
    [original_loss], distrib_params, retain_graph=True
  File ".../venv/lib/python3.7/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)
RuntimeError: One of the differentiated Tensors does not require grad


PKramek commented May 26, 2021

Maybe ACER is trying to update its parameters before any data has been propagated through its neural networks, and that causes the exception.


muupan commented May 26, 2021

Thank you for confirming that. It seems like ACER does not work with GaussianHeadWithFixedCovariance for now. I think this needs to be fixed.

A possible workaround for it would be GaussianHeadWithStateIndependentCovariance with var_param's learning rate set to 0.


PKramek commented May 26, 2021

Thank you for the workaround idea. Would it be possible for you to create a fix in the near future?


PKramek commented May 26, 2021

Also, I don't see a learning rate used anywhere in GaussianHeadWithStateIndependentCovariance, but from what I understand I could just pass a callable as var_func that always returns the desired variance value.


muupan commented May 26, 2021

I meant the learning rate you set when you make your optimizer, since PyTorch optimizers allow setting parameter-specific learning rates. I guess you are right about var_func. Any var_func that returns a tensor whose requires_grad is True would be sufficient as well.
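In case it helps, a var_func along these lines should satisfy the assertion. This is only a sketch I have not run against pfrl, and fixed_variance is a made-up name chosen to mirror GaussianHeadWithFixedCovariance(scale=0.3):

```python
import torch

fixed_variance = 0.3 ** 2  # variance corresponding to scale=0.3

def var_func(var_param):
    # Ignore the learnable var_param and always return the same variance.
    # requires_grad=True satisfies ACER's assertion, but because the output
    # does not depend on var_param, the covariance effectively stays fixed.
    return torch.full_like(var_param, fixed_variance).requires_grad_(True)
```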

I will try reproducing the issue myself and hopefully fix it soon.


PKramek commented May 26, 2021

Could you point me to some examples of how I could use separate PyTorch optimizers, or rather just separate learning rates, for my NNs in pfrl? In all of the provided examples with multiple optimizers, they are passed to the training function as separate parameters. What if I wanted to set a different learning rate for my actor and a different one for my critic, and at the same time set a learning rate for just the GaussianHeadWithStateIndependentCovariance layer in the actor? Currently I am using pfrl.optimizers.SharedRMSpropEpsInsideSqrt, but it does not offer learning rate parametrization beyond setting a single value for all NNs.


muupan commented May 26, 2021

Here is PyTorch's official doc: https://pytorch.org/docs/stable/optim.html#per-parameter-options. pfrl.optimizers.SharedRMSpropEpsInsideSqrt is a subclass of torch.optim.Optimizer, so it is no different.
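For instance, something like this should work. torch.optim.RMSprop is used here for illustration and the module names are made up; SharedRMSpropEpsInsideSqrt should accept the same parameter-group list since it subclasses torch.optim.Optimizer:

```python
import torch
from torch import nn

# Stand-ins for the actor body, its covariance parameter, and the critic.
actor_body = nn.Linear(8, 4)
var_param = nn.Parameter(torch.zeros(4))  # plays the role of the Gaussian head's parameter
critic = nn.Linear(8, 1)

opt = torch.optim.RMSprop(
    [
        {"params": actor_body.parameters(), "lr": 7e-4},  # actor-specific rate
        {"params": [var_param], "lr": 0.0},               # keep the covariance fixed
        {"params": critic.parameters(), "lr": 1e-3},      # critic-specific rate
    ],
    lr=1e-4,  # default for any group that does not override lr
)
```

Groups that omit "lr" fall back to the optimizer-level default, so with a rate of 0 the covariance parameter simply never moves while everything else trains.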


muupan commented May 27, 2021

I hope #145 will resolve the issue.


PKramek commented May 27, 2021 via email


muupan commented May 28, 2021

It is tested on a toy env with continuous actions, but it has not been verified that it can reproduce the performance on the continuous-action tasks in the paper.

Here is a sample script to train ACER on OpenAI Gym MuJoCo envs. It seems to work to some extent. It is not tuned much, and I cannot guarantee that the hyperparameters etc. are the same as those used in the paper.

https://github.com/muupan/pfrl/blob/acer-continous-example-tune/examples/mujoco/train_acer_mujoco.py

command (with #145 applied):

python3 examples/mujoco/train_acer_mujoco.py --num-envs 8 --steps 3000000 --env HalfCheetah-v2

training log:
scores.txt

video with the final model:

openaigym.video.1.62730.video000000.mp4


PKramek commented May 28, 2021

That really helped me. Thank you very much for all the time you spent debugging the code and helping me.
