
EMA trains faster #1

Open

pfriesch opened this issue Mar 2, 2019 · 12 comments
@pfriesch

pfriesch commented Mar 2, 2019

Hi,

Cool project!

When I was trying VQVAE, I found that using a moving average as described in the appendix trained a lot faster and gave better results! There is a zalandoresearch repo that has an open-source example. It is a bit hard to parallelize, though, since it does not depend on the optimizer to learn the embedding.

Cheers 👍

@mkotha
Owner

mkotha commented Mar 3, 2019

Thanks for the information. It sounds like a good modification to try!

@pfriesch pfriesch closed this as completed Mar 3, 2019
@mkotha
Owner

mkotha commented Mar 4, 2019

Let's leave this open - someone (perhaps I) might want to implement this.

@mkotha mkotha reopened this Mar 4, 2019
@mazzzystar

mazzzystar commented Mar 18, 2019

I've tried WaveNet+VQVAE with EMA, and it seems that using EMA only on the VQ space can get a reasonable result.

Here is my result.zip, not as good as the official demo, though.

The implementation just follows https://discuss.pytorch.org/t/how-to-apply-exponential-moving-average-decay-for-variables/10856/3 (glad if you find it helpful).
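
For reference, the recipe in that post boils down to keeping a "shadow" copy of each parameter and blending it toward the live value after every training step; a minimal sketch (class and method names are illustrative, not quoted from the post):

```python
# Shadow-parameter EMA in the spirit of the linked forum post.
class EMA:
    def __init__(self, decay=0.999):
        self.decay = decay
        self.shadow = {}

    def register(self, name, param):
        # Store a detached copy of the parameter as the initial average.
        self.shadow[name] = param.detach().clone()

    def update(self, name, param):
        # shadow <- decay * shadow + (1 - decay) * param
        self.shadow[name].mul_(self.decay).add_(param.detach(), alpha=1 - self.decay)
        return self.shadow[name]
```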

@pfriesch
Author

If I am not mistaken, the linked example is a regularization scheme for the model parameters?! That is not the EMA used to learn just the embeddings. Do you have an example of your implementation?

@mazzzystar

What do you mean by "the linked example is a regularization scheme for the model parameters"?

I mean it's the final audio result of my implementation of VQ-VAE with EMA only on the embedding.

@pfriesch
Author

Well, the forum post you linked is not the EMA that is described in the VQVAE paper. Did you modify the implementation to follow the VQVAE paper?

@mazzzystar

mazzzystar commented Mar 18, 2019

Actually, I think they also use EMA only on the embedding.

You can see the implementation in sonnet/vqvae.py; I haven't really read their implementation of EMA.

The difference between VectorQuantizerEMA and VectorQuantizer is that
this module uses exponential moving averages to update the embedding vectors
instead of an auxiliary loss.

@pfriesch
Author

I mean that both are called exponential moving average (EMA), but the forum post you linked and what they do in the VQVAE paper (or in the sonnet code) are fundamentally different things, even if both are applied only to the embedding parameters. The forum post seems to be a regularization strategy (I am not sure exactly what it aims to do), while the VQVAE EMA is a different learning algorithm for the embeddings.

I am not sure you tried the correct VQVAE EMA learning algorithm, since you wrote that you followed the forum post.

@mazzzystar

Yes, I followed the forum method of exponential moving averages. Sorry, I didn't realize they are different things. To my knowledge EMA was a regularization strategy, so I did not check the EMA idea in the VQVAE paper. Thanks for the reminder; I'll check the paper tomorrow.

@pfriesch
Author

Cool 👍 Looking forward to the samples 😄

@mazzzystar

I quickly scanned the Appendix of the VQVAE paper.

It seems they use the method below: remove the codebook term ||sg(z_e(x)) - e||^2 from the loss function

L = loss_recon + ||sg(z_e(x)) - e||^2 + 0.25 * ||z_e(x) - sg(e)||^2
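
In code, that change amounts to dropping one term from the loss; a minimal sketch (the function and argument names are mine, with beta = 0.25 as in the paper, and z_q being the raw quantized output before any straight-through trick):

```python
import torch.nn.functional as F

def vqvae_loss(recon_loss, z_e, z_q, beta=0.25, use_ema=False):
    # Commitment term: ||z_e(x) - sg(e)||^2, pulls the encoder toward its code.
    commitment = beta * F.mse_loss(z_e, z_q.detach())
    if use_ema:
        # EMA variant: the codebook term is dropped; the codebook is
        # updated by exponential moving averages instead of by gradients.
        return recon_loss + commitment
    # Standard variant: auxiliary codebook term ||sg(z_e(x)) - e||^2.
    codebook = F.mse_loss(z_q, z_e.detach())
    return recon_loss + codebook + commitment
```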

And update the embedding e as below (a sketch of the update follows the list):

  • Suppose the current embedding center is e_i, and there are N encoder outputs closest to it.
  • Then e_i updates itself with an EMA over those encoder outputs.
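
Roughly, as I read the appendix (the function name and the decay/eps values here are assumptions, not from the paper):

```python
import torch

def ema_codebook_update(embedding, ema_count, ema_sum, z_e, encodings,
                        decay=0.99, eps=1e-5):
    """One EMA update of the codebook.

    embedding: (K, D) codebook, ema_count: (K,) running counts N_i,
    ema_sum: (K, D) running sums m_i, z_e: (B, D) encoder outputs,
    encodings: (B, K) one-hot nearest-code assignments.
    """
    with torch.no_grad():
        # N_i <- decay * N_i + (1 - decay) * (#vectors assigned to code i)
        ema_count.mul_(decay).add_(encodings.sum(0), alpha=1 - decay)
        # m_i <- decay * m_i + (1 - decay) * (sum of assigned encoder outputs)
        ema_sum.mul_(decay).add_(encodings.t() @ z_e, alpha=1 - decay)
        # Laplace smoothing so unused codes don't divide by zero.
        n = ema_count.sum()
        count = (ema_count + eps) / (n + embedding.size(0) * eps) * n
        # e_i <- m_i / N_i
        embedding.copy_(ema_sum / count.unsqueeze(1))
```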

What's your idea on that?

@pfriesch
Author

Yes, sounds correct. You can look at this notebook for a working example.
