Implementation of hinge loss does not match the paper #7
Thanks for your comment. This is indeed a (small) difference and it should
not affect performance in any significant way. I expect that one could show
that both formulations are equivalent for a system where the probability of
visiting any state in the MDP is the same (in expectation), which should be
the case for the randomly initialized block pushing tasks and the physics
simulation. For the Atari games, however, this assumption is violated, and
there could be (minor) differences in performance between the two
implementations, but I would expect them to be smaller than the variance
you get from the parameter initialization. If you happen to test this
experimentally, please share your results here :)
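To make the comparison concrete, here is a minimal sketch of the two negative-term variants (my own illustration with made-up names, omitting the repo's energy scaling and per-object averaging):

import torch

# Sketch of the two negative-sample hinge terms under discussion.
# z_t and z_next are latent states at timesteps t and t+1, z_neg is a
# randomly drawn negative batch, and hinge is the margin gamma.

def sq_dist(a, b):
    # Squared Euclidean distance per batch element.
    return ((a - b) ** 2).sum(dim=-1)

def neg_loss_paper(z_next, z_neg, hinge=1.0):
    # Eq. (5): compare the negative sample against the state at t+1.
    return torch.relu(hinge - sq_dist(z_neg, z_next)).mean()

def neg_loss_code(z_t, z_neg, hinge=1.0):
    # Repo variant: compare the negative sample against the state at t.
    return torch.relu(hinge - sq_dist(z_neg, z_t)).mean()

# When negatives and visited states are drawn from the same (uniform)
# distribution, both terms have the same expectation:
z_t, z_next, z_neg = torch.randn(32, 4), torch.randn(32, 4), torch.randn(32, 4)
print(neg_loss_paper(z_next, z_neg), neg_loss_code(z_t, z_neg))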
On Sun, Feb 7, 2021 at 11:17 AM Alex wrote:
Hello Kipf, I found a discrepancy between the loss implemented in the
code and the loss described in the paper.
According to Eq. (5) in the paper, for negative samples you calculate the
Euclidean distance between the negative state sample at timestep t and the
state at timestep t+1.
However, in the code below, state and neg_state are both at timestep t.
self.neg_loss = torch.max(
    zeros, self.hinge - self.energy(
        state, action, neg_state, no_trans=True)).mean()
I noticed that the same question was also asked in #3.
I want to know: is this a bug? Does the discrepancy affect the final
performance?
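For reference, my reading of the loss in question (a reconstruction, not a quote from the paper), with d the squared Euclidean energy, T the learned transition model, and \tilde{z} the negative sample:

L = d\big(z_t + T(z_t, a_t),\, z_{t+1}\big) + \max\big(0,\ \gamma - d(\tilde{z},\, z_{t+1})\big)

The code's negative term uses d(\tilde{z}, z_t) in place of d(\tilde{z}, z_{t+1}).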
Yes, I believe the frame_skip between individual actions (i.e., for how
many frames an action is applied) is not fully deterministic in the gym
environment that our codebase uses. If you happen to have more insight into
this, please feel free to share it here.
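For anyone who wants to probe this, a small determinism check (a sketch assuming the pre-0.26 gym API; PongDeterministic-v4 is only an example id, not necessarily the one this codebase uses):

import gym
import numpy as np

def rollout(env_id, seed, actions):
    # Roll out a fixed action sequence from a fixed seed and record frames.
    env = gym.make(env_id)
    env.seed(seed)
    obs = env.reset()
    frames = [obs]
    for a in actions:
        obs, _, done, _ = env.step(a)
        frames.append(obs)
        if done:
            break
    env.close()
    return frames

# Two identical rollouts should match frame-for-frame if the env is
# deterministic. Note that plain "-v0" Atari variants sample the frame
# skip stochastically (and use sticky actions), so they can diverge even
# with a fixed seed; the "Deterministic-v4" variants use a fixed frame skip.
acts = [0, 1, 2, 3] * 5
a = rollout('PongDeterministic-v4', 42, acts)
b = rollout('PongDeterministic-v4', 42, acts)
print(all(np.array_equal(x, y) for x, y in zip(a, b)))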
On Wed, Feb 24, 2021 at 4:34 AM Alex wrote:
Hi, thanks for your answer.
I now understand that both formulations are equivalent.
But I have another question:
For the Atari games, even if I set the same random seed, multiple
executions of the program behave differently.
My understanding is that, with the same seed, multiple runs of the program
should produce the same result.
Are there other nondeterministic operations in the code that cause this
randomness?
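For later readers, the usual seeding checklist for a PyTorch setup (a generic sketch, not a claim about what this repo's code already does):

import random
import numpy as np
import torch

def set_global_seed(seed):
    # Seed every RNG that typically feeds a training run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Nondeterministic cuDNN kernels and autotuning are another common
    # source of run-to-run divergence on GPU:
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_global_seed(42)

Even with all of the above, some CUDA ops have no deterministic implementation, so exact reproducibility on GPU is not always achievable.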