Hiragana conversion issue #1

Open
saikotek opened this issue Oct 14, 2021 · 5 comments
Comments

@saikotek
saikotek commented Oct 14, 2021

Hello, I came here after using a great Anki plugin of yours called PitchAccent.
I've noticed that converting a pitch pattern to hiragana doesn't handle the long vowel mark ー properly.
It turns out that converting katakana to hiragana isn't that easy, because hiragana has two ways to write a long vowel. If we simply replaced the "ー" character based on the preceding vowel, we would get words like せんせえ instead of せんせい (if the original data is written as センセー).

It would be best to reverse the conversion workflow: store the accents in hiragana originally, and then converting to katakana could be done deterministically, right?
For that you need the original data in hiragana, but from what I've seen the accent_dict data contains fields only in katakana. Perhaps you cut out the hiragana fields?

I prefer to use hiragana in the pitch pattern so that I can simply use it instead of the vocab kana field in Anki.
If it's too hard, don't worry about it.
Thanks for your hard work. よろしくお願いいたします。

@tatsumoto-ren
Member

Hello. The kana conversion module doesn't do anything to the ー character. It only converts kana characters.
センセー becomes せんせー after conversion which I think is correct.
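
To illustrate the behaviour described here, below is a minimal sketch of a katakana-to-hiragana conversion that shifts Unicode code points and leaves ー alone. The function name `katakana_to_hiragana_sketch` and the constants are made up for illustration; this is not necessarily how the add-on's converter module is written.

```python
# Hypothetical sketch, not the add-on's actual code.
# Katakana ァ (U+30A1) .. ヶ (U+30F6) sit exactly 0x60 above
# hiragana ぁ (U+3041) .. ゖ (U+3096) in Unicode.
KATAKANA_START = 0x30A1
KATAKANA_END = 0x30F6
OFFSET = 0x60


def katakana_to_hiragana_sketch(text: str) -> str:
    """Shift katakana letters to hiragana; leave everything else as-is."""
    out = []
    for ch in text:
        code = ord(ch)
        if KATAKANA_START <= code <= KATAKANA_END:
            out.append(chr(code - OFFSET))
        else:
            # The long vowel mark ー (U+30FC) falls outside the range,
            # so it passes through unchanged.
            out.append(ch)
    return "".join(out)


print(katakana_to_hiragana_sketch("センセー"))  # せんせー
```

Because ー has no hiragana counterpart at that offset, a converter of this kind has no reason to touch it.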

> the accent_dict data contains fields only in katakana

It was originally this way. The pitch accent data used in the add-on was contributed by javdejong back in 2012.

If what you need is to convert セー to せえ (and, I assume, other similar pairs), we could think about how to implement it, but it's not an issue with the kana converter module.

@saikotek
Author

I see. Then I believe it could be implemented by converting the kanji to furigana instead of doing the to_katakana() conversion?

@tatsumoto-ren
Member

I don't think converting kanji to furigana is necessary. Having a simple dictionary that would map kana pairs would be the most obvious solution.

E.g.:

  • あー:ああ
  • かー:かあ
  • みー:みい
  • せー:せえ

etc.
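
A rough sketch of how such a pair dictionary could be built and applied, in case it helps the discussion. The names `ROWS`, `PAIR_MAP` and `expand_long_marks` are hypothetical, and only the basic kana rows are covered (voiced kana, small kana and digraphs such as きょ are left out for brevity).

```python
# Hypothetical sketch of the pair-mapping idea; names are illustrative.
# Each vowel maps to the basic kana that end in that vowel.
ROWS = {
    "あ": "あかさたなはまやらわ",
    "い": "いきしちにひみり",
    "う": "うくすつぬふむゆる",
    "え": "えけせてねへめれ",
    "お": "おこそとのほもよろ",
}

LONG_MARK = "ー"

# Generates entries such as "あー": "ああ", "かー": "かあ", "みー": "みい".
PAIR_MAP = {
    kana + LONG_MARK: kana + vowel
    for vowel, kana_row in ROWS.items()
    for kana in kana_row
}


def expand_long_marks(text: str) -> str:
    """Replace each <kana>ー pair found in PAIR_MAP; keep anything else."""
    out = []
    for ch in text:
        if ch == LONG_MARK and out and out[-1] + LONG_MARK in PAIR_MAP:
            # Append only the vowel kana that stands in for ー.
            out.append(PAIR_MAP[out[-1] + LONG_MARK][-1])
        else:
            out.append(ch)
    return "".join(out)


print(expand_long_marks("せんせー"))  # せんせえ
```

Note that a purely vowel-based map like this turns せんせー into せんせえ rather than せんせい, which is the ambiguity raised at the start of the thread.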

@saikotek
Author

saikotek commented Oct 14, 2021

Yeah, but is "ー" always used to mark the long vowel in おう、えい and never in おお、いい、ええ?

@tatsumoto-ren
Member

Hard to tell. We need a set of examples to draw a conclusion on how to do the conversion.
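
One possible way to gather such examples is to tally, for words whose hiragana spelling is known, which kana actually stands where ー appears. The pairs below are a tiny hand-picked sample using standard dictionary spellings, not data from accent_dict, and the marked forms assume ー is written for every long vowel, which this thread has not confirmed. The function name `tally_long_marks` is hypothetical.

```python
from collections import Counter

# Hand-picked (marked form, standard hiragana spelling) pairs for illustration;
# a real survey would need a much larger list taken from the actual data.
SAMPLES = [
    ("せんせー", "せんせい"),      # 先生
    ("とーきょー", "とうきょう"),  # 東京
    ("おーきい", "おおきい"),      # 大きい
    ("おねーさん", "おねえさん"),  # お姉さん
    ("おかーさん", "おかあさん"),  # お母さん
]


def tally_long_marks(samples):
    """Count (kana before ー, kana that replaces ー) across aligned pairs."""
    counts = Counter()
    for marked, spelled in samples:
        if len(marked) != len(spelled):
            continue  # this sketch only handles one-to-one alignments
        for i, ch in enumerate(marked):
            if ch == "ー" and i > 0:
                counts[(marked[i - 1], spelled[i])] += 1
    return counts


for (before, replacement), n in tally_long_marks(SAMPLES).items():
    print(f"{before}ー -> {before}{replacement}: {n}")
```

Even in this small sample, せー maps to せい while ねー maps to ねえ, and とー maps to とう while おー maps to おお, so the preceding kana's vowel alone would not be enough to pick the replacement.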
