
Preparing phoneme based dataset. #5

Open
ahmedalbahnasawi opened this issue Apr 28, 2023 · 1 comment

Comments


ahmedalbahnasawi commented Apr 28, 2023

I'm working with Arabic-script text (the example below is Persian) that I map to phonemes with my own grapheme-to-phoneme (G2P) model.
For example, `این مخزن شامل نمونه` ("this repository contains a sample") is mapped to `E N - M KH Z N - SH AE M L - N M W N HH`.
My phoneme inventory is `pho_ids = {'-': 0, 'ZH': 1, 'AE': 2, 'SS': 3, 'AE': 4, 'IY': 5, ....., 'eos': 55}`, where some phonemes are written with two letters (e.g. `KH`, `SH`).
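As far as I can tell, the character-based configs in Coqui TTS (the `TTS.tts...` package used in the config below) treat every Unicode character as one token, so a two-letter label such as `KH` would be split into `K` and `H`. One workaround, which is my assumption and not something suggested in this thread, is to rewrite each multi-letter phoneme as a single placeholder character before training. The sketch uses a hypothetical subset of the inventory and assigns code points from the Unicode Private Use Area (U+E000 onward); `build_symbol_map`, `encode` and `decode` are illustrative names.

```python
from typing import Dict, List

# Hypothetical subset of the phoneme inventory from the question.
PHONEMES: List[str] = ["E", "N", "M", "KH", "Z", "SH", "AE", "L", "W", "HH"]
WORD_SEP = "-"  # word separator used in the G2P output


def build_symbol_map(phonemes: List[str], start: int = 0xE000) -> Dict[str, str]:
    """Give every phoneme label its own single Private Use Area character."""
    return {p: chr(start + i) for i, p in enumerate(phonemes)}


def encode(phoneme_string: str, symbol_map: Dict[str, str]) -> str:
    """Turn 'E N - M KH ...' into one character per phoneme, words separated by spaces."""
    out = []
    for token in phoneme_string.split():
        out.append(" " if token == WORD_SEP else symbol_map[token])
    return "".join(out)


def decode(encoded: str, symbol_map: Dict[str, str]) -> str:
    """Inverse of encode(), useful for checking that the mapping is lossless."""
    inverse = {v: k for k, v in symbol_map.items()}
    return " ".join(WORD_SEP if ch == " " else inverse[ch] for ch in encoded)


symbol_map = build_symbol_map(PHONEMES)
encoded = encode(" E N - M KH Z N - SH AE M L - N M W N HH ", symbol_map)
print(len(encoded))  # 18: 15 phonemes + 3 word separators
print(decode(encoded, symbol_map))  # E N - M KH Z N - SH AE M L - N M W N HH
```

Note also that a Python dict literal keeps only the last value for a repeated key, so `'AE': 2` and `'AE': 4` in `pho_ids` collapse into a single `'AE': 4` entry.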

```python
from TTS.tts.configs.shared_configs import CharactersConfig

# Current config: Arabic-script characters plus an IPA phoneme set.
character_config = CharactersConfig(
    characters='ءابتثجحخدذرزسشصضطظعغفقلمنهويِپچژکگیآأؤإئ',
    punctuations='!(),-.:;? ̠،؛؟‌<>',
    phonemes='ˈˌːˑpbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟaegiouwy',
    pad="<PAD>",
    eos="<EOS>",
    bos="<BOS>",
    blank="<BLNK>",
    characters_class="TTS.tts.utils.text.characters.IPAPhonemes",
)
```
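A quick way to see why this config does not fit the G2P output is to check which of its symbols the config actually lists. The sketch below copies only the `phonemes` string from the config above (the Arabic-script `characters` string cannot contain Latin letters either) and ignores spaces and the `-` word separator; `uncovered_symbols` is an illustrative helper.

```python
# `phonemes` string copied from the CharactersConfig above (IPA symbols).
CONFIG_PHONEMES = "ˈˌːˑpbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟaegiouwy"

# G2P output for the example sentence in the question.
EXAMPLE = "E N - M KH Z N - SH AE M L - N M W N HH"


def uncovered_symbols(text: str, allowed: str, ignore: str = " -") -> list:
    """Return, sorted, every character of `text` that is neither allowed nor ignored."""
    return sorted({ch for ch in text if ch not in allowed and ch not in ignore})


print(uncovered_symbols(EXAMPLE, CONFIG_PHONEMES))
# ['A', 'E', 'H', 'K', 'L', 'M', 'N', 'S', 'W', 'Z']
```

Every letter of the transcription falls outside the lowercase IPA list, so the config has to be rebuilt around the symbols the G2P model actually emits.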

How should I change `character_config` to suit my experiment?
Many thanks

@karim23657 (Owner)

I think you should first phonemize all of your dataset texts, then train the model with a simple character config and no phonemizer.
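A minimal sketch of the "phonemize everything first" step, assuming a pipe-separated `file_id|text` metadata layout and a `g2p` callable standing in for your own model. `TOY_LEXICON`, `toy_g2p` and `phonemize_metadata` are hypothetical names; only the example words and their transcriptions come from the question.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-in for a real G2P model: a word-level lookup built from
# the example in the question.
TOY_LEXICON = {
    "این": "E N",
    "مخزن": "M KH Z N",
    "شامل": "SH AE M L",
    "نمونه": "N M W N HH",
}


def toy_g2p(text: str) -> str:
    """Phonemize word by word, joining words with ' - ' like the question's G2P output."""
    return " - ".join(TOY_LEXICON[word] for word in text.split())


def phonemize_metadata(lines: Iterable[str], g2p: Callable[[str], str], sep: str = "|") -> List[str]:
    """Replace the text column of each `file_id|text` line with its phoneme string."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        file_id, text = line.split(sep, 1)
        out.append(f"{file_id}{sep}{g2p(text)}")
    return out


sample = "utt_0001|" + " ".join(TOY_LEXICON)  # the four example words, in order
print(phonemize_metadata([sample], toy_g2p))
# ['utt_0001|E N - M KH Z N - SH AE M L - N M W N HH']
```

The returned lines would be written to a new metadata file that the dataset config points at, so the training text is already phonemes and no phonemizer runs during training. Adapt the column handling to whatever formatter your recipe uses.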
Also edit L51 to `use_phonemes=False`, and use a `CharactersConfig` without `phonemes`, with `characters` built from the characters that actually occur in your dataset.
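To follow that last suggestion, `characters` can be derived from the phonemized texts themselves. A minimal sketch, assuming the phonemized texts are available as a list of strings; `dataset_characters` is a hypothetical helper, and the `characters_class` path and field names reflect my understanding of Coqui TTS's `CharactersConfig`, so check them against your installed version.

```python
from typing import Iterable

PUNCTUATIONS = "- "  # word separator and space, kept out of `characters`


def dataset_characters(texts: Iterable[str], punctuations: str = PUNCTUATIONS) -> str:
    """Collect every distinct character in the texts, minus punctuation, sorted."""
    chars = set()
    for text in texts:
        chars.update(text)
    return "".join(sorted(chars - set(punctuations)))


texts = ["E N - M KH Z N - SH AE M L - N M W N HH"]
characters = dataset_characters(texts)
print(characters)  # AEHKLMNSWZ

# Keyword arguments intended for TTS.tts.configs.shared_configs.CharactersConfig
# (not imported here, so the sketch runs without TTS installed).
character_config_kwargs = dict(
    characters_class="TTS.tts.utils.text.characters.Graphemes",
    characters=characters,
    punctuations=PUNCTUATIONS,
    phonemes=None,  # no phoneme set: the texts are already phonemes
    pad="<PAD>",
    eos="<EOS>",
    bos="<BOS>",
    blank="<BLNK>",
)
```

With TTS installed this would become `CharactersConfig(**character_config_kwargs)`, together with `use_phonemes=False` in the model config. If you map multi-letter phonemes to single characters first, run `dataset_characters` on the mapped texts instead. It is also worth checking that the text cleaner in your recipe does not lowercase or strip these symbols.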
