[BUG]: Your implementation of S2A is not soundstorm #360

Open
xliu99 opened this issue Nov 21, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@xliu99

xliu99 commented Nov 21, 2024

SoundStorm is a single model that models the codebooks hierarchically. It is not two models in which the first models only the first codebook and the second models the rest.

xliu99 added the bug label on Nov 21, 2024
@jiaqili3
Collaborator

Please kindly refer to the AudioLM and SoundStorm papers for their implementation, which I understand to be more than a single model. Thanks!

@xliu99
Author

xliu99 commented Nov 25, 2024

> Please kindly refer to the AudioLM and SoundStorm papers for their implementation, which I understand to be more than a single model. Thanks!

In the SoundStorm paper, the semantic tokens are already obtained from AudioLM. Those AudioLM tokens are equivalent to the T2S model output in MaskGCT. However, their S2A model, which is SoundStorm itself, is a single model that generates all RVQ layers hierarchically. You have probably confused AudioLM with a model that generates only the first RVQ codebook, which is why you split S2A into two models.
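
To make the distinction concrete, here is a minimal Python sketch. It is not taken from the MaskGCT or SoundStorm code. It assumes a coarse-to-fine loop over RVQ levels with confidence-based unmasking inside each level, and it replaces the network with a random toy. The names `make_toy_model`, `generate`, and `model_for_level`, the number of levels, and the step counts are all hypothetical and chosen only for illustration. The point it shows is that the decoding loop can be identical in both designs, and the difference is whether one set of weights or two serves the levels.

```python
import random

MASK = -1  # placeholder id for an acoustic token that has not been generated yet


def make_toy_model(seed, vocab=8):
    """Build a hypothetical stand-in for a trained network.

    The returned callable proposes a (token, confidence) pair per time step
    for the requested RVQ level. A real model would condition on the semantic
    tokens and on the coarser levels already present in `acoustic`; this toy
    ignores both and draws random proposals.
    """
    rng = random.Random(seed)

    def model(semantic, acoustic, level):
        return [(rng.randrange(vocab), rng.random()) for _ in semantic]

    return model


def generate(semantic, model_for_level, num_levels=3, first_level_steps=4):
    """Fill a num_levels x T grid of acoustic tokens, coarse to fine."""
    T = len(semantic)
    acoustic = [[MASK] * T for _ in range(num_levels)]
    for level in range(num_levels):
        model = model_for_level(level)
        # Assumption: several unmasking steps for level 0, one pass for finer levels.
        steps = first_level_steps if level == 0 else 1
        for step in range(steps):
            masked = [t for t in range(T) if acoustic[level][t] == MASK]
            if not masked:
                break
            proposals = model(semantic, acoustic, level)
            steps_left = steps - step
            # Commit the most confident proposals; commit all on the final step.
            keep = len(masked) if steps_left == 1 else max(1, len(masked) // steps_left)
            masked.sort(key=lambda t: proposals[t][1], reverse=True)
            for t in masked[:keep]:
                acoustic[level][t] = proposals[t][0]
    return acoustic


semantic_tokens = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up T2S output

# One model shared by every RVQ level (the single-model design described above).
shared = make_toy_model(seed=0)
single = generate(semantic_tokens, lambda level: shared)

# Two models: one for the first codebook, one for all remaining codebooks.
first, rest = make_toy_model(seed=1), make_toy_model(seed=2)
split = generate(semantic_tokens, lambda level: first if level == 0 else rest)
```

Both calls return a grid with one row per RVQ level and one column per semantic token. Whether sharing weights across levels changes quality is an empirical question that this toy cannot answer.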

@HeCheng0625
Collaborator

In fact, the reason we used two models was simply that it was easier to debug at the initial experimental stage (we only needed to generate the acoustic token layer to reconstruct speech). We tried using one model, and there was no significant performance drop. I don't think it makes much difference.
