Why are Facodec and Ns3_facodec different? #342

Open
ndhuynh02 opened this issue Nov 10, 2024 · 0 comments

Comments


ndhuynh02 commented Nov 10, 2024

I am looking at the model code in the two folders facodec and ns3_facodec. I understand that ns3_facodec is the training code for FACodec. However, I am noticing several differences between the two architectures:

  • First, the official FACodec has no LSTMs in either the encoder or the decoder.
  • Second, the timbre encoders differ: although both use a Transformer, they are not the same.
  • The generator loss is a weighted combination of multiple losses, but looking at the appendix of the NaturalSpeech 3 paper, the weights clearly do not match the paper; instead, they follow the DAC paper.
  • The upsample and downsample rates are not the same. The official ns3_codec uses [2, 4, 5, 5], while the other uses [2, 4, 8, 8]. This also means the mel-spectrogram hop_lengths are 200 and 300, respectively.
  • In the training code, the audio is sampled at 24 kHz, while the original paper works on 16 kHz audio.
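To make the loss-weight discrepancy concrete, here is a minimal sketch of a DAC-style weighted generator loss. The weight values and loss-term names below are hypothetical placeholders for illustration, not the actual values from either the NaturalSpeech 3 paper or the DAC paper:

```python
# Minimal sketch of a weighted generator-loss combination (DAC-style).
# The weights below are hypothetical placeholders, not the values from
# the NaturalSpeech 3 paper or the DAC paper.
LOSS_WEIGHTS = {
    "mel": 15.0,     # mel-spectrogram reconstruction loss
    "adv": 1.0,      # adversarial (generator) loss
    "feat": 2.0,     # discriminator feature-matching loss
    "commit": 0.25,  # VQ commitment loss
}

def generator_loss(losses: dict, weights: dict = LOSS_WEIGHTS) -> float:
    """Combine individual loss terms into one scalar by weighted sum."""
    return sum(weights[name] * value for name, value in losses.items())

# Example: combine some per-term loss values into the total generator loss.
total = generator_loss({"mel": 0.5, "adv": 1.2, "feat": 0.8, "commit": 0.1})
print(total)  # 15*0.5 + 1*1.2 + 2*0.8 + 0.25*0.1 = 10.325
```

Changing the weight dictionary is enough to switch between the paper's weighting and the DAC-style weighting, which is why the mismatch matters for reproducing the published results.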