Rhythmizers for other languages #62
Rhythmizers are actually a temporary solution for phoneme duration prediction in MIDI-less models. A rhythmizer contains the FastSpeech2 encoder module and the DurationPredictor module from MIDI-A mode. Models in MIDI-A mode can predict phoneme durations well and generate nice spectrograms, but their datasets are hard to label (you need MIDI sequences and slurs), and they are poor at predicting pitch even though they do have a PitchPredictor. That is why we are deprecating this mode in this forked repository.

To get a rhythmizer, you first need to choose or design a phoneme dictionary. Then you should label your dataset in the opencpop segments format. Please note that the MIDI duration transcriptions of opencpop are in consonant-vowel format, while you need to label your dataset in vowel-consonant format; that is, the beginning of each note should be aligned with the beginning of the vowel instead of the consonant (see issue). Here is an example of the labels that we converted from the original opencpop transcriptions: transcriptions-strict-revised2.txt. The last step is to preprocess your dataset and train a MIDI-A model with this config. After that, you can export the duration prediction part with this script.

For CVVC languages like English and Polish, the answer is no, because we can currently only deal with two-phase (CV) phoneme systems like Chinese and Japanese. MIDI-A, MIDI-B, duration predictors, data labels and all other word-phoneme related parts will be redesigned in the future, and by then you can expect full support for all languages. No rhythmizers will be needed at that point: everyone will be able to train their own variance adaptors (containing duration and pitch models and much more) via standard pipelines, as easily as preparing and training MIDI-less acoustic models today.

By the way, members of our team are already preparing a Japanese rhythmizer. When they finish the dictionary, the rhythmizer and the MFA model, we will formally support Japanese MIDI-less mode preparation in our pipeline. If you really find it difficult to prepare on your own, it is fine to just wait for our progress.
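The consonant-vowel to vowel-consonant relabeling described above can be sketched roughly as follows. This is a minimal illustration only, not code from the repository: the simplified data layout, the `cv_to_vc` name, the toy consonant set, and the per-phoneme `note_index` mapping are all assumptions — real opencpop transcriptions pack the text, phonemes, notes, durations, and slur flags into one `|`-separated line per segment.

```python
# Illustrative sketch (not from the repo): shift each syllable-initial
# consonant's duration from its own note onto the previous note, so that
# every note starts at the vowel instead of the consonant.

CONSONANTS = {"g", "sh", "b", "zh", "t"}  # toy subset for illustration

def cv_to_vc(phonemes, ph_durs, note_durs, note_index):
    """Convert consonant-vowel note alignment to vowel-consonant alignment.

    phonemes   -- phoneme labels for one segment
    ph_durs    -- duration of each phoneme (seconds)
    note_durs  -- duration of each note under CV alignment (seconds)
    note_index -- which note each phoneme belongs to under CV alignment
    """
    new_note_durs = list(note_durs)
    for i, ph in enumerate(phonemes):
        n = note_index[i]
        if ph in CONSONANTS and n > 0:
            new_note_durs[n] -= ph_durs[i]      # note now begins at the vowel
            new_note_durs[n - 1] += ph_durs[i]  # consonant absorbed by previous note
    return new_note_durs
```

For example, with phonemes `g an sh ou` split over two notes, the duration of `sh` moves from the second note to the tail of the first, so the second note starts exactly at `ou`. A segment-initial consonant has no previous note to absorb it, which is why the sketch leaves it in place.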
Will do! Also I should close this...
Hi again, I was wondering if there is any documentation regarding the rhythmizers, as I would like to train one for Japanese...
Side question: would a rhythmizer work on CVVC languages such as English or Polish?