Error in entry for 更 #2

bahducoup · 2022-11-24T05:00:48Z

更 kæŋ¹ kaːŋ˥ - kaŋ˨˦ t͡ɕĩŋ˩˩ kɤŋ˥/t͡ɕiŋ˥/tʰxɤ tʰimɤ sənsɤ kĩ˥/kẽ˥ - -

When this row is split on '\t', kɤŋ˥/t͡ɕiŋ˥/tʰxɤ tʰimɤ sənsɤ is treated as a single token.
The tʰimɤ sənsɤ portion of this token seems to be erroneously included in the row.

I think it would be a good idea to check why these characters were included in the dataset and verify that there are no similar errors in the rest of the dataset.
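A minimal sketch of such a check, assuming the dataset is a plain tab-separated file and that a valid pronunciation cell never contains internal whitespace. The function name `find_cells_with_spaces` and the tab placement in `sample_row` are assumptions for illustration; they are not taken from the repository.

```python
def find_cells_with_spaces(tsv_text):
    """Return (line_number, column_index, cell) for every cell with internal whitespace."""
    hits = []
    for line_number, line in enumerate(tsv_text.split("\n"), start=1):
        if not line:
            continue  # skip blank lines
        for column_index, cell in enumerate(line.split("\t")):
            # A pronunciation cell is expected to be one token or "/"-joined tokens,
            # so whitespace inside it suggests leaked annotation text.
            if any(ch.isspace() for ch in cell.strip()):
                hits.append((line_number, column_index, cell))
    return hits


# Assumed tab placement for the row quoted in this issue.
sample_row = "\t".join([
    "更", "kæŋ¹", "kaːŋ˥", "-", "kaŋ˨˦", "t͡ɕĩŋ˩˩",
    "kɤŋ˥/t͡ɕiŋ˥/tʰxɤ tʰimɤ sənsɤ", "kĩ˥/kẽ˥", "-", "-",
])

hits = find_cells_with_spaces(sample_row)
for hit in hits:
    print(hit)
```

Running this over the whole file would list every cell worth inspecting by hand. It only catches annotations that contain spaces, so a single-word annotation would slip through.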

kalvinchang · 2022-11-24T05:22:59Z

this shouldn't affect the reconstruction cuz in the dataloader, we take the first pronunciation variant (kɤŋ˥ in this case)

thanks for catching this tho!!
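Roughly, taking the first variant amounts to something like the sketch below. The actual dataloader code is not shown in this thread, so `first_variant` is a hypothetical helper, and "/" as the variant separator is assumed from the row quoted above.

```python
def first_variant(cell, separator="/"):
    """Return the first pronunciation variant of a cell (hypothetical helper)."""
    # Everything after the first separator is ignored, including any
    # annotation text that leaked into a later variant.
    return cell.split(separator)[0].strip()


mandarin_cell = "kɤŋ˥/t͡ɕiŋ˥/tʰxɤ tʰimɤ sənsɤ"
first = first_variant(mandarin_cell)
print(first)
```

Under that assumption the stray text never reaches the model, because it sits after the first "/".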

kalvinchang · 2022-11-24T05:24:20Z

the issue is that the romanized version (the pre-parsed version on Wiktionary) shows something like this:
"gēng/jīng/the time sense" for Mandarin

we need to remove extra annotations for Mandarin in the Wiktionary parsing script
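One possible way to do this, sketched under the assumption that each real reading is a single whitespace-free token and that annotations contain spaces. `drop_annotations` is a hypothetical name; it is not a function from the existing parsing script.

```python
def drop_annotations(romanized, separator="/"):
    """Keep only variants without internal whitespace (heuristic sketch)."""
    variants = [variant.strip() for variant in romanized.split(separator)]
    # Assumption: a reading is one token, so any variant containing
    # whitespace is treated as an English annotation and dropped.
    kept = [v for v in variants if v and not any(ch.isspace() for ch in v)]
    return separator.join(kept)


cleaned = drop_annotations("gēng/jīng/the time sense")
print(cleaned)
```

This heuristic would also drop a legitimate reading written with spaces and would keep a one-word annotation, so it is a starting point for the fix, not a complete one.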

bahducoup assigned kalvinchang and bahducoup Nov 24, 2022
