_clean_text is returning invalid symbol #9

Open
eschmidbauer opened this issue Nov 22, 2023 · 6 comments

Comments

@eschmidbauer

Hi, thanks for sharing this project!
I noticed a small issue when running:
python generate_data_statistics.py -i conf.yaml
I'm getting an undefined-symbol exception.
When I added debug output, I found that the function _clean_text() is inserting the combining tilde character (U+0303, Unicode decimal code 771).
I checked my dataset, and that character does not appear anywhere in it.
I added these lines to confirm that the issue comes from the call clean_text = _clean_text(text, cleaner_names):
[image: screenshot of the added debug lines]

Any help would be appreciated, thanks!
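
A minimal sketch of the kind of check described above, for anyone who wants to reproduce it. The function name `find_unknown_symbols`, the symbol table, and the sample string are all hypothetical stand-ins; in a real run the input would be whatever `_clean_text(text, cleaner_names)` returns and the project's own symbol list.

```python
import unicodedata


def find_unknown_symbols(cleaned_text, known_symbols):
    """Return (index, char, code point, Unicode name) for each character
    of cleaned_text that is missing from known_symbols."""
    report = []
    for i, ch in enumerate(cleaned_text):
        if ch not in known_symbols:
            name = unicodedata.name(ch, "UNKNOWN")
            report.append((i, ch, f"U+{ord(ch):04X}", name))
    return report


# Hypothetical symbol table and hypothetical cleaner output (assumptions).
known_symbols = {"b", "\u0254"}      # "b" and the IPA open o
cleaned_text = "b\u0254\u0303"       # same, followed by a combining tilde

for index, ch, code_point, name in find_unknown_symbols(cleaned_text, known_symbols):
    print(f"position {index}: {code_point} {name}")
```

Printing the code point and Unicode name instead of the raw character helps here, because a combining mark is rendered on top of the preceding character and is easy to miss in a log.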

@zidsi

zidsi commented Nov 22, 2023

Depending on the cleaners defined in your conf.yaml, the phonemizer might be inserting a combining tilde into the text when it converts characters to phonemes.
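
To illustrate why the character can show up even though the dataset never contains it, here is a small sketch using only the standard library. The two strings are hypothetical: `raw_text` stands for a dataset line and `phonemes` for what a phonemizer might emit for a nasal vowel (a base vowel followed by U+0303). No actual phonemizer is called.

```python
import unicodedata


def combining_marks(text):
    """List (index, code point, name) for every combining mark in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.combining(ch) != 0
    ]


raw_text = "bon"              # hypothetical dataset text, no tilde anywhere
phonemes = "b\u0254\u0303"    # hypothetical phonemizer output: b, open o, combining tilde

print(combining_marks(raw_text))   # marks found in the raw text
print(combining_marks(phonemes))   # marks found in the phoneme string

# NFC normalization does not remove the mark here, because the open o
# with a tilde has no single precomposed code point.
print(len(unicodedata.normalize("NFC", phonemes)))
```

So searching the raw dataset for the tilde finds nothing; the mark only exists after the character-to-phoneme step.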

@p0p4k
Owner

p0p4k commented Nov 22, 2023

Also, I am debating whether normalizing the mel-spec is even necessary for this project.

@eschmidbauer
Author

Can you advise how to use a different text phonemizer?

@p0p4k
Owner

p0p4k commented Nov 28, 2023

Different as in, what is your use case?

@eschmidbauer
Author

> Depending on the cleaners defined in your conf.yaml, the phonemizer might be inserting a combining tilde into the text when it converts characters to phonemes.

I'm curious how to define a different cleaner in the config.

@p0p4k
Owner

p0p4k commented Nov 28, 2023

This file has all the info about adding new cleaners: https://github.com/p0p4k/pflowtts_pytorch/blob/master/pflow/text/cleaners.py
You can get some inspiration from https://github.com/p0p4k/CoquiTTS/tree/dev/TTS/tts/utils/text and modify it depending on your use case. As for the tilde character, you can just drop it and continue; it might not be a big issue.
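
One way to "drop it and continue" could look like the sketch below. Everything here is an assumption made for illustration: `make_filtered_cleaner`, `fake_base_cleaner`, and `allowed_symbols` are hypothetical names, not functions from cleaners.py, and how a new cleaner is registered and referenced from conf.yaml should be checked against that file.

```python
def make_filtered_cleaner(base_cleaner, allowed_symbols):
    """Wrap a cleaner so that any character outside allowed_symbols
    is silently dropped from its output."""
    def cleaner(text):
        cleaned = base_cleaner(text)
        return "".join(ch for ch in cleaned if ch in allowed_symbols)
    return cleaner


def fake_base_cleaner(text):
    # Hypothetical stand-in for a phonemizing cleaner: it ignores its
    # input and returns a fixed string ending in a combining tilde.
    return "b\u0254\u0303"


allowed_symbols = {"b", "\u0254"}   # hypothetical symbol table without U+0303
safe_cleaner = make_filtered_cleaner(fake_base_cleaner, allowed_symbols)

print([f"U+{ord(ch):04X}" for ch in safe_cleaner("bon")])
```

Dropping the mark removes the nasalization cue from the phoneme sequence. If that distinction matters for your language, adding the character to the symbol table instead of filtering it out may be the better choice.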
