EMA trains faster #1
Comments
Thanks for the information. It sounds like a good modification to try!
Let's leave this open - someone (perhaps I) might want to implement this.
I've tried WaveNet+VQVAE with EMA, and it seems that using EMA only on Here is my result.zip, not as good as the official demo, though. The implementation just follows https://discuss.pytorch.org/t/how-to-apply-exponential-moving-average-decay-for-variables/10856/3 , glad if you find it helpful.
If I am not mistaken, the linked example is a regularization scheme for the model parameters. Isn't that different from the EMA that is used to learn just the embeddings? Do you have an example of your implementation?
What do you mean by I mean it's the final audio result of my implementation of
Well, the forum post you linked is not the EMA that is described in the VQVAE paper. Did you modify the implementation to follow the VQVAE paper?
Actually I think they also use You can see the implementation of sonnet/vqvae.py, I don't really read their implementation of
I mean that both are called exponential moving average (EMA), but the forum post you linked and what the VQVAE paper (or the sonnet code) does are fundamentally different things, even if both are applied only to the embedding parameters. The forum post seems to be a regularization strategy (I am not sure exactly what it aims to do), while the VQVAE EMA is a different learning algorithm for the embeddings. I am not sure you tried the correct VQVAE EMA learning algorithm, since you wrote that you followed the forum post.
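To make the distinction concrete, here is a minimal sketch of the parameter-averaging kind of EMA that the forum post appears to describe. The class name `ParamEMA` and its methods are hypothetical and only for illustration. The point is that the optimizer still trains the weights as usual, and the EMA merely keeps a smoothed "shadow" copy of them; nothing here learns the codebook.

```python
class ParamEMA:
    """Keeps an exponentially smoothed 'shadow' copy of named parameters.

    Hypothetical helper for illustration; the parameters themselves are
    still updated by the optimizer, not by this class.
    """

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}

    def register(self, name, value):
        # Start the shadow copy at the current parameter value.
        self.shadow[name] = float(value)

    def update(self, name, value):
        # shadow <- decay * shadow + (1 - decay) * current value
        new = self.decay * self.shadow[name] + (1.0 - self.decay) * float(value)
        self.shadow[name] = new
        return new


ema = ParamEMA(decay=0.5)
ema.register("w", 0.0)
# Pretend the optimizer has set w = 1.0 after each of three steps.
for step_value in (1.0, 1.0, 1.0):
    smoothed = ema.update("w", step_value)
```

The shadow value drifts toward the trained weight over time, which is why this reads more like smoothing or regularization than like a learning rule for the embeddings.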
Yes, I followed the forum method of exponential moving average. Sorry, I didn't realize they are different things. To my knowledge
Cool 👍 Looking forward to the samples 😄
I quickly scanned the appendix of the VQVAE paper. It seems they use the method below:
And update the embedding
What's your idea on that?
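For reference, the formulas did not survive in the comment above. The EMA updates in the appendix of the VQVAE paper can be written as follows, as a reconstruction from the paper rather than a copy of what was posted. Here $\gamma$ is the decay (the paper suggests a value close to 1, such as 0.99), $n_i^{(t)}$ is the number of encoder outputs in the current batch assigned to code $i$, and $z_{i,j}^{(t)}$ are those outputs.

$$
\begin{aligned}
N_i^{(t)} &= \gamma \, N_i^{(t-1)} + (1-\gamma)\, n_i^{(t)} \\
m_i^{(t)} &= \gamma \, m_i^{(t-1)} + (1-\gamma) \sum_{j=1}^{n_i^{(t)}} z_{i,j}^{(t)} \\
e_i^{(t)} &= \frac{m_i^{(t)}}{N_i^{(t)}}
\end{aligned}
$$

The first two lines track a running count and a running sum per code; the last line sets the embedding to their ratio, so no gradient step is taken for the codebook.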
Yes, sounds correct. You can look at this notebook for a working example.
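For readers without the notebook at hand, a minimal NumPy sketch of one EMA codebook step might look like the following. It is not the sonnet or zalandoresearch implementation; the function name `ema_codebook_update`, its arguments, and the `eps` guard against empty clusters are assumptions made for illustration.

```python
import numpy as np


def ema_codebook_update(codebook, cluster_size, ema_sum, z, decay=0.99, eps=1e-5):
    """One EMA step for a VQ codebook (illustrative sketch).

    codebook:     (K, D) embedding vectors e_i
    cluster_size: (K,)   running counts N_i
    ema_sum:      (K, D) running sums m_i
    z:            (B, D) encoder outputs for the current batch
    """
    num_codes = codebook.shape[0]

    # Assign each encoder output to its nearest code (squared Euclidean distance).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (B, K)
    assign = dists.argmin(axis=1)                                       # (B,)
    one_hot = np.eye(num_codes)[assign]                                 # (B, K)

    n = one_hot.sum(axis=0)   # n_i: number of outputs assigned to code i
    z_sum = one_hot.T @ z     # per-code sum of the assigned outputs, (K, D)

    # Running averages of the counts and the sums.
    cluster_size = decay * cluster_size + (1.0 - decay) * n
    ema_sum = decay * ema_sum + (1.0 - decay) * z_sum

    # Assumption: clamp the count so rarely used codes do not divide by ~0.
    codebook = ema_sum / np.maximum(cluster_size, eps)[:, None]
    return codebook, cluster_size, ema_sum


# Tiny example: two codes in 2-D, three encoder outputs.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
cluster_size = np.ones(2)
ema_sum = codebook.copy()
z = np.array([[1.0, 1.0], [9.0, 9.0], [11.0, 11.0]])
codebook, cluster_size, ema_sum = ema_codebook_update(
    codebook, cluster_size, ema_sum, z, decay=0.5
)
```

Note that the codebook is overwritten directly from the running statistics, which is the sense in which this is a learning rule for the embeddings and not a smoothing of optimizer-trained weights.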
Hi,
Cool project!
When I was trying VQVAE I found that using a moving average as described in the appendix trained a lot faster and gave better results! There is a zalandoresearch repo that has an open-source example. It is a bit hard to parallelize though, since it does not depend on the optimizer to learn the embedding.
Cheers 👍