Why are Facodec and Ns3_facodec different? #342

Open
ndhuynh02 opened this issue Nov 10, 2024 · 0 comments

Comments


ndhuynh02 commented Nov 10, 2024

I am looking at the model code in the two folders facodec and ns3_facodec. I understand that ns3_facodec is the training code for FACodec. However, I am noticing several differences between the two architectures:

  • First, the official FACodec has no LSTMs in either the encoder or the decoder.
  • Second, the timbre encoders differ: although both use a Transformer, they are not the same.
  • The generator loss is a weighted combination of multiple losses, but looking at the appendix of the NaturalSpeech 3 paper, the weights clearly do not match the paper; instead, they follow the DAC paper.
  • The upsample and downsample rates are not the same. The official ns3_codec uses [2, 4, 5, 5], while the other uses [2, 4, 8, 8]. This also means the mel-spectrogram hop_lengths are 200 and 300, respectively.
  • In the training code, the audio is sampled at 24 kHz, while the original paper works on 16 kHz audio.
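To make the loss-weight discrepancy concrete, here is a minimal sketch of a DAC-style weighted generator loss. The weight values and loss-term names below are hypothetical placeholders for illustration, not the actual values from either the NaturalSpeech 3 paper or the DAC paper:

```python
# Minimal sketch of a weighted generator-loss combination (DAC-style).
# The weights below are hypothetical placeholders, not the values from
# the NaturalSpeech 3 paper or the DAC paper.
LOSS_WEIGHTS = {
    "mel": 15.0,     # mel-spectrogram reconstruction loss
    "adv": 1.0,      # adversarial (generator) loss
    "feat": 2.0,     # discriminator feature-matching loss
    "commit": 0.25,  # VQ commitment loss
}

def generator_loss(losses: dict, weights: dict = LOSS_WEIGHTS) -> float:
    """Combine individual loss terms into one scalar by weighted sum."""
    return sum(weights[name] * value for name, value in losses.items())

# Example: combine some per-term loss values into the total generator loss.
total = generator_loss({"mel": 0.5, "adv": 1.2, "feat": 0.8, "commit": 0.1})
print(total)  # 15*0.5 + 1*1.2 + 2*0.8 + 0.25*0.1 = 10.325
```

Changing the weight dictionary is enough to switch between the paper's weighting and the DAC-style weighting, which is why the mismatch matters for reproducing the published results.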