cannot reproduce Zamia lexicon.txt entries #2

Open
fquirin opened this issue Jun 28, 2021 · 3 comments

fquirin commented Jun 28, 2021

Hi Michael,

I've been experimenting with gruut-ipa and the Zamia lexicon.txt from the popular models 'kaldi-generic-de-tdnn_250-r20190328' and 'kaldi-generic-en-tdnn_250-r20190609', and I'm having trouble getting the expected results.
As far as I understand, Zamia's lexicon.txt is in SAMPA format, so I selected two words from the German file:

hallo --> h '{ l @ U
welt --> v 'E l t

and then used espeak-ng to generate phonemes:

# espeak IPA phonemes:
espeak-ng -v de -x -q --sep=" " --ipa "hallo"
h ˈa l oː
espeak-ng -v de -x -q --sep=" " --ipa "welt"
v ˈɛ l t

# espeak default phonemes:
h 'a l o:
v 'E l t

Finally, I tried to convert the espeak results to SAMPA with gruut-ipa:

python3 -m gruut_ipa convert ipa sampa "h ˈa l oː"
h "a l o:

python3 -m gruut_ipa convert espeak sampa "h 'a l o:"
h "a 5 o:

python3 -m gruut_ipa convert ipa sampa "v ˈɛ l t"
v "E l t

python3 -m gruut_ipa convert espeak sampa "v 'E l t"
v "E 5 t

But none of the results match the lexicon.txt entries.
Any help or hints would be appreciated! :-)

fquirin commented Jun 30, 2021

Actually, I'm starting to think that hallo --> h '{ l @ U in the German Zamia lexicon is just wrong and refers to the English pronunciation 😅, since related words are, for example, hallodri --> h a l 'o: d R i: (almost identical to German hallo) and halloween --> h '{ l @ w i n (the English hallo).
This would make the espeak IPA to SAMPA pipeline with gruut almost correct, except for the stress mark (gruut-ipa emits " where the lexicon uses ').

[EDIT]
To be honest, I'm confused about which symbol is correct here. According to Wikipedia and this converter, ˈ (Unicode U+02C8) is " (Unicode U+0022) in X-SAMPA, but I don't see it anywhere in the Zamia Kaldi lexicon. X-SAMPA seems to have different flavors 🙈 (Conlang X-SAMPA (CXS))
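
The three stress symbols in play are easy to confuse visually, so a small sketch that prints their code points may help when inspecting lexicon entries. The labels and the `describe` helper are illustrative names, not part of gruut-ipa or Zamia; the third entry is simply the character seen in the lexicon lines quoted in this thread.

```python
import unicodedata

# Stress symbols mentioned in this thread (labels are illustrative only).
STRESS_MARKS = {
    "IPA primary stress": "\u02c8",
    "X-SAMPA primary stress": '"',
    "Zamia lexicon stress (as quoted in this thread)": "'",
}


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for label, ch in STRESS_MARKS.items():
    print(f"{label}: {describe(ch)}")
```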

[EDIT2]
Guenter himself seems to use this mapping

@synesthesiam
Contributor

Guenter's phonemes seem to be like X-SAMPA, but not quite. I have an English map for Zamia, but I will need to add a German map too 👍

fquirin commented Jul 4, 2021

We had some discussions about it, maybe it helps ^^: gooofy/zamia-speech#79

My conclusion was that I kind of need a manual check:

  • generate phonemes with espeak (IPA)
  • replace some symbols, such as " with ' (roughly Guenter's IPA normalization map)
  • convert to X-SAMPA with gruut-ipa
  • check the result against the Zamia lexicon phonemes
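
The steps above could be sketched roughly as follows. The two subprocess wrappers only re-run the `espeak-ng` and `python3 -m gruut_ipa convert ipa sampa` commands quoted earlier in this thread; the function names, the `to_zamia_stress` replacement, and the whitespace-insensitive comparison are assumptions made for illustration, not Guenter's actual normalization map. The stress-mark swap is applied to the X-SAMPA output here, because that is where `"` shows up in the outputs quoted above; treat that ordering as an assumption as well.

```python
import subprocess
import sys


def espeak_ipa(word: str, voice: str = "de") -> str:
    """Run the espeak-ng command quoted in the thread; needs espeak-ng installed."""
    result = subprocess.run(
        ["espeak-ng", "-v", voice, "-x", "-q", "--sep= ", "--ipa", word],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def ipa_to_sampa(ipa: str) -> str:
    """Run the gruut-ipa CLI conversion quoted in the thread; needs gruut-ipa installed."""
    result = subprocess.run(
        [sys.executable, "-m", "gruut_ipa", "convert", "ipa", "sampa", ipa],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def to_zamia_stress(sampa: str) -> str:
    """Assumed fix-up: swap the X-SAMPA stress mark (") for the ' seen in the lexicon."""
    return sampa.replace('"', "'")


def matches_lexicon(candidate: str, lexicon_entry: str) -> bool:
    """Compare phoneme sequences token by token, ignoring spacing differences."""
    return candidate.split() == lexicon_entry.split()
```

With the outputs already quoted in this thread, `to_zamia_stress('v "E l t')` gives `v 'E l t`, which matches the lexicon entry for welt, while `h "a l o:` still differs from `h '{ l @ U` after the swap, so hallo would be flagged for manual review.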
