A possible classifier-free guidance inconsistency - missing log_softmax before interpolation #494

Open · avihu111 opened this issue Oct 9, 2024 · 0 comments
avihu111 commented Oct 9, 2024

Hi, thanks for a great repo!

In the AudioGen paper, the linear interpolation is done on the log-probabilities:

[image: equation from the AudioGen paper, interpolating the conditional and unconditional log-probabilities]

In the code, however, it is done on the logits:

```python
logits = uncond_logits + (cond_logits - uncond_logits) * self.cfg_coef
```

If I understand correctly, logits and log-probs are not equivalent: log-probs must satisfy `torch.exp(log_probs).sum(dim=-1) == 1`, which is equivalent to `torch.logsumexp(log_probs, dim=-1) == 0`. To get log-probabilities from logits we need to apply `log_softmax`, which enforces this property.
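
A quick self-contained check of that property (the tensor shapes here are arbitrary, just for illustration):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(2, 8)  # arbitrary unnormalized scores
log_probs = torch.log_softmax(logits, dim=-1)

# Raw logits are generally not normalized: logsumexp is arbitrary.
print(torch.logsumexp(logits, dim=-1))
# After log_softmax, logsumexp == 0 (up to float error), i.e. exp sums to 1.
print(torch.logsumexp(log_probs, dim=-1))
assert torch.allclose(torch.logsumexp(log_probs, dim=-1),
                      torch.zeros(2), atol=1e-6)
```

Applying that normalization first, the interpolation would become: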

```python
# Normalize both branches to log-probabilities before interpolating.
uncond_log_probs = torch.log_softmax(uncond_logits, dim=-1)
cond_log_probs = torch.log_softmax(cond_logits, dim=-1)
logits = uncond_log_probs + (cond_log_probs - uncond_log_probs) * self.cfg_coef
```
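
For context, here is a minimal standalone sketch of a guided sampling step built around this normalization (the helper name, shapes, and `cfg_coef` value are hypothetical, not Audiocraft's actual API):

```python
import torch

def guided_sample(cond_logits: torch.Tensor,
                  uncond_logits: torch.Tensor,
                  cfg_coef: float,
                  temperature: float = 1.0) -> torch.Tensor:
    """Sample one token after interpolating log-probabilities (hypothetical helper)."""
    uncond_log_probs = torch.log_softmax(uncond_logits, dim=-1)
    cond_log_probs = torch.log_softmax(cond_logits, dim=-1)
    guided = uncond_log_probs + (cond_log_probs - uncond_log_probs) * cfg_coef
    probs = torch.softmax(guided / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Toy usage with random logits over a 2048-entry codebook vocabulary.
torch.manual_seed(0)
token = guided_sample(torch.randn(1, 2048), torch.randn(1, 2048), cfg_coef=3.0)
```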

Using either logits or log-probs works quite well, but in my tests applying log_softmax before the interpolation showed some benefits.
Avihu
