dataset #2

Open
VafaKnm opened this issue Apr 2, 2024 · 3 comments

Comments

@VafaKnm

VafaKnm commented Apr 2, 2024

Hi!
I have a question about the dataset. Suppose I have several WAV files and their corresponding text files (written in Persian). How can I create the phoneme_transcriptions for them?

@Adibian
Owner

Adibian commented Apr 2, 2024

Hi!
Based on my experiments, to train any TTS model for Persian you need audio files and their phoneme sequences. You cannot use raw Persian text; if you do, the results will not be good, because short vowels and Kasre_Ezafe are not written.
Also, creating phonemes from Persian text is not a simple task, because it needs a large lexicon, a Grapheme_to_Phoneme (G2P) model (for words that do not exist in the lexicon), Ezafe prediction, and a word sense disambiguation model (for words with multiple pronunciations, like 'mard' and 'mord').
I don't know whether there is any public tool or repository that handles all of these problems and creates phonemes from text.
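
As a rough illustration of how the lexicon and the G2P fallback fit together, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `TOY_LEXICON`, `placeholder_g2p`, and `words_to_phonemes` are hypothetical names, the two lexicon entries are toy data, and the fallback is only a stand-in for a trained G2P model. Ezafe prediction and word sense disambiguation are not handled here at all.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicon: Persian word -> phoneme string (romanized as in this thread).
# A real lexicon would need many thousands of entries.
TOY_LEXICON: Dict[str, str] = {
    "گل": "gol",
    "زیبا": "ziba",
}


def placeholder_g2p(word: str) -> str:
    """Stand-in for a trained G2P model (assumption): it only marks the word as unresolved."""
    return f"<unk:{word}>"


def words_to_phonemes(
    words: List[str],
    lexicon: Dict[str, str],
    g2p_fallback: Callable[[str], str],
) -> List[str]:
    """Look each word up in the lexicon; fall back to the G2P model for unknown words."""
    return [lexicon[w] if w in lexicon else g2p_fallback(w) for w in words]


# Note: without an Ezafe prediction step this yields "gol ziba", not "gole ziba".
phonemes = words_to_phonemes(["گل", "زیبا"], TOY_LEXICON, placeholder_g2p)
print(" ".join(phonemes))
```

The fallback is passed in as a function so that a real G2P model could replace the placeholder without changing the lookup logic.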

@VafaKnm
Author

VafaKnm commented Apr 3, 2024

Thanks for sharing your experiences my friend.
Actually, I found a G2P model for Persian:
https://github.com/PasaOpasen/PersianG2P
It's good but not perfect. For example, it can't recognize the Kasre between two words (it gives "gol ziba" instead of "gole ziba"), but I have no other choice anyway!

I have one more question: why are there no spaces between the words? For example, what happens if we build the dataset with "gole ziba" instead of "goleziba"?

@Adibian
Owner

Adibian commented Apr 7, 2024

In speech synthesis from a phoneme sequence, spaces are not important. You have to split the transcription into phonemes, assign an ID number to each phoneme, and use the sequence of IDs, so the spaces between words are removed in this process. Of course, you can add a dedicated token for the space between words (treated like any other phoneme), but note that the duration of this token will usually be zero or very short.
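
The ID conversion described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each character of the transcription is treated as one phoneme, and `build_vocab`, `encode`, and the `<sp>` token are hypothetical names, not taken from this repository.

```python
from typing import Dict, List, Optional


def build_vocab(symbols: List[str]) -> Dict[str, int]:
    """Assign a consecutive integer ID to each phoneme symbol."""
    return {symbol: index for index, symbol in enumerate(symbols)}


def encode(
    transcription: str,
    vocab: Dict[str, int],
    space_token: Optional[str] = None,
) -> List[int]:
    """Convert a transcription to phoneme IDs.

    Assumes one character per phoneme. Spaces are dropped unless a
    space token (hypothetical, e.g. "<sp>") is given.
    """
    ids: List[int] = []
    for ch in transcription:
        if ch == " ":
            if space_token is not None:
                ids.append(vocab[space_token])
            continue
        ids.append(vocab[ch])
    return ids


phoneme_symbols = sorted(set("goleziba"))
vocab = build_vocab(phoneme_symbols + ["<sp>"])

# Without a space token, both spellings produce the same ID sequence.
print(encode("gole ziba", vocab) == encode("goleziba", vocab))

# With a space token, the word boundary becomes one extra ID.
print(encode("gole ziba", vocab, space_token="<sp>"))
```

With the space token enabled, the model sees one extra symbol at each word boundary; whether that helps is something to check experimentally.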
