
[Question] Some questions about the PPO_lag algorithm #348

Open · 3 tasks done
tjruan opened this issue Aug 16, 2024 · 1 comment
Labels: question (Further information is requested)

Comments

tjruan commented Aug 16, 2024

Required prerequisites

Questions

Hello Omnisafe team, thank you very much for your contribution. I have run into some points of confusion that I hope you can clear up for me; I would appreciate it!
The original PPO algorithm uses the CLIP objective function. The documentation states that the surrogate loss function for the PPOLag algorithm is:
[screenshot: the PPOLag surrogate loss equation from the documentation]
Does this equation represent an advantage function that combines rewards and costs?
Does this L in PPOLag replace A(s, a) in PPO?
[screenshot: the PPO clipped objective using A(s, a)]
If so, could you point me to where the clipping is implemented in the PPOLag code?
I would greatly appreciate it if you could answer my questions.

tjruan added the question (Further information is requested) label on Aug 16, 2024
Gaiejj (Member) commented Aug 18, 2024

Does this equation represent an advantage function that combines rewards and costs?

Yes, this is an objective function that considers both reward advantage and cost advantage simultaneously.
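For reference, here is a sketch of how this combined objective is usually written for PPOLag, with \lambda the Lagrange multiplier and A^R, A^C the reward and cost advantages (the exact notation in the documentation may differ slightly):

    L(s, a) = \frac{A^{R}(s, a) - \lambda \, A^{C}(s, a)}{1 + \lambda}

The 1 / (1 + \lambda) factor is a normalization that keeps the scale of the surrogate comparable as \lambda grows.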

Does this L in PPOLag replace A(s, a) in PPO?

Yes, that is correct.

If so, could you point me to where the clipping is implemented in the PPOLag code?

The implementation of PPOLag simply replaces A(s, a) with the objective function weighted by the Lagrange multiplier. Therefore, the clipping operation can be found in the PPO implementation of omnisafe, specifically in lines 69 to 76 of the loss calculation function in PPO:

        # Importance sampling ratio between the new and old policies.
        ratio = torch.exp(logp_ - logp)
        # Clip the ratio to [1 - clip, 1 + clip], as in standard PPO.
        ratio_cliped = torch.clamp(
            ratio,
            1 - self._cfgs.algo_cfgs.clip,
            1 + self._cfgs.algo_cfgs.clip,
        )
        # Pessimistic (min) clipped surrogate; `adv` is the advantage passed in,
        # which for PPOLag is already the Lagrange-weighted combination.
        loss = -torch.min(ratio * adv, ratio_cliped * adv).mean()
        # Entropy bonus to encourage exploration.
        loss -= self._cfgs.algo_cfgs.entropy_coef * distribution.entropy().mean()
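
For completeness, here is a minimal sketch of how the Lagrange-weighted advantage that replaces A(s, a) can be formed before it is passed to the clipped loss above. The function name and signature are illustrative assumptions, not the exact omnisafe API; see the PPOLag source for the authoritative version:

    import torch

    def lagrange_weighted_advantage(
        adv_r: torch.Tensor,           # reward advantage A^R(s, a)
        adv_c: torch.Tensor,           # cost advantage A^C(s, a)
        lagrangian_multiplier: float,  # current value of the multiplier lambda
    ) -> torch.Tensor:
        # Penalize the reward advantage by lambda times the cost advantage,
        # then rescale by 1 / (1 + lambda) to keep the surrogate's magnitude stable.
        penalty = lagrangian_multiplier
        return (adv_r - penalty * adv_c) / (1 + penalty)

    # Usage: the result plays the role of `adv` in the clipped loss shown above.
    adv = lagrange_weighted_advantage(torch.randn(8), torch.randn(8), lagrangian_multiplier=0.5)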
