I am looking at the model code in the two folders `facodec` and `ns3_facodec`. I understand that `ns3_facodec` is the training code for FACodec. However, I am seeing some differences between the two architectures:
First, there are no LSTMs in the official FACodec, in either the encoder or the decoder.
Second, the timbre encoders are different. Although both use a Transformer, the two modules are not the same.
The generator loss is a weighted combination of several losses, but looking at the appendix of the NaturalSpeech 3 paper, the weights here clearly do not match the paper; they instead follow the DAC paper.
The upsample and downsample rates are not the same. The official ns3_codec uses [2, 4, 5, 5], while the other uses [2, 4, 8, 8]. Since the hop length is the product of the rates, this implies mel-spectrogram hop lengths of 200 and 512, respectively.
In the training code, the audio has a sampling rate of 24 kHz, while the original paper operates on 16 kHz audio.
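As a quick sanity check on the rate mismatch above (a minimal sketch; the rate lists are taken from my comparison of the two folders, and the frame-rate pairing is my assumption, not something stated in either repo), the hop length implied by a set of up/downsample rates is just their product:

```python
from math import prod

# Up/downsample rate stacks from the two implementations being compared.
official_rates = [2, 4, 5, 5]  # official ns3_codec
other_rates = [2, 4, 8, 8]     # the other facodec folder

# The codec's total stride (i.e., the mel-spectrogram hop length)
# is the product of the per-layer rates.
official_hop = prod(official_rates)  # 2 * 4 * 5 * 5 = 200
other_hop = prod(other_rates)        # 2 * 4 * 8 * 8 = 512

# With the sampling rates mentioned above, the latent frame rates differ too:
# 24000 / 200 = 120 frames/s, 16000 / 200 = 80 frames/s, etc.
print(official_hop, other_hop)
```

So the two stacks cannot produce the same number of latent frames per second of audio, regardless of which sampling rate is used.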