Discrepancy in SAC on entropy coefficient update #177

Open
marioyc opened this issue Oct 25, 2022 · 2 comments

Comments

@marioyc
Contributor

marioyc commented Oct 25, 2022

I noticed that here the `log_prob` variable is computed before the actor update, whereas the official SAC repository recomputes it after the actor update. (Section 6 of the paper also states that both the Q-function and the policy are updated before the entropy coefficient is updated.) Have you by any chance compared whether this detail makes a difference?
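To make the difference between the two orderings concrete, here is a minimal sketch, assuming the standard automatic entropy-tuning objective J(α) = E[-α (log π(a|s) + H_target)] optimized over log α, with the log-probs treated as constants. It uses an untanh-squashed 1-D Gaussian policy and a hypothetical `hypothetical_actor_step` that stands in for the real actor update (it simply widens the policy). All function names here are illustrative and not taken from this repository or the official one. Reusing the same noise `eps` in both orderings is a choice made only to isolate the effect of the actor update.

```python
import math
import random


def gaussian_log_prob(a, mu, log_std):
    """Log-density of a under N(mu, exp(log_std)^2); no tanh squashing (simplification)."""
    std = math.exp(log_std)
    return -0.5 * ((a - mu) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)


def sample_log_probs(mu, log_std, eps):
    """Reparameterized actions a = mu + std * eps and their log-probs."""
    std = math.exp(log_std)
    return [gaussian_log_prob(mu + std * e, mu, log_std) for e in eps]


def alpha_update(log_alpha, log_probs, target_entropy, lr):
    """One SGD step on J(log_alpha) = -mean(log_alpha * (log_prob + target_entropy)).

    The log-probs are constants here (stop-gradient), so
    dJ/dlog_alpha = -mean(log_prob + target_entropy).
    """
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad


def hypothetical_actor_step(mu, log_std):
    """Hypothetical placeholder for the actor update: shifts the mean, widens the policy."""
    return mu + 0.1, log_std + 0.5


random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in range(256)]
mu, log_std = 0.0, -1.0
log_alpha, target_entropy, lr = 0.0, -1.0, 0.1

# Ordering A: log-probs computed once, before the actor update, then reused for alpha.
log_probs_before = sample_log_probs(mu, log_std, eps)
new_mu, new_log_std = hypothetical_actor_step(mu, log_std)
log_alpha_a = alpha_update(log_alpha, log_probs_before, target_entropy, lr)

# Ordering B: log-probs recomputed from the updated actor before the alpha step.
log_probs_after = sample_log_probs(new_mu, new_log_std, eps)
log_alpha_b = alpha_update(log_alpha, log_probs_after, target_entropy, lr)

print(f"alpha (before actor update): {math.exp(log_alpha_a):.4f}")
print(f"alpha (after actor update):  {math.exp(log_alpha_b):.4f}")
```

In this toy setup the updated policy has higher entropy, so ordering B pushes α down further than ordering A (by exactly `lr * 0.5` in log space, since widening `log_std` by 0.5 lowers every log-prob by 0.5). Whether this one-step lag matters in practice is the open empirical question raised above.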

@muupan
Member

muupan commented Oct 25, 2022

You are right; this appears to be a discrepancy with the official implementation. I do not remember whether I made a comparison, probably not.

@marioyc
Contributor Author

marioyc commented Oct 26, 2022

I see, no problem. Thanks for replying anyway.
