Entire word preprocessing #49

Open
hadibus opened this issue Apr 6, 2019 · 0 comments

I would like to transliterate between English and Anglo-Saxon runes. The problem I'm running into is with the letter ᚦ (thorn), which corresponds to the sound "th" in "thistle".

Most of the time the "th <=> ᚦ" transliteration is good, but in some words ᛏᚻ would be more appropriate, such as:

  • apartheid
  • fainthearted
  • foothold
  • knighthood
  • knothole
  • nighthawk
  • penthouse
  • warthog

In these words the normal "th" pronunciation does not apply.

It would be nice if I could write code like this to define transliteration rules for specific words:

     pre_processor_mapping = {
       # Whole-word overrides: "t" and "h" stay separate (ᛏᚻ).
       u"apartheid"   : u"aparᛏᚻeid",
       u"fainthearted": u"fainᛏᚻearted",
       u"foothold"    : u"fooᛏᚻold",
       u"knighthood"  : u"knighᛏᚻood",
       u"knothole"    : u"knoᛏᚻole",
       u"nighthawk"   : u"nighᛏᚻawk",
       u"penthouse"   : u"penᛏᚻouse",
       u"warthog"     : u"warᛏᚻog",
       # General multi-letter rules.
       u"th"          : u"ᚦ",
       u"sh"          : u"ᛇ",
       u"ng"          : u"ᛝ",
       u"st"          : u"ᛥ",
       u"qu"          : u"ᛢ",
       u"ea"          : u"ᛠ",
       u"io"          : u"ᛡ",
       u"æ"           : u"ᚫ",
       u"œ"           : u"ᛟ"
     }

This, along with my mapping, currently yields "warthog => ᚹᚪᚱᚦᚩᚷ" instead of the expected "warthog => ᚹᚪᚱᛏᚻᚩᚷ".
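
A minimal standalone sketch of the requested behaviour, assuming the fix is to try rules longest-key-first so that a whole-word entry always wins over the shorter "th" rule. The names `PRE_PROCESSOR_MAPPING`, `build_pattern`, and `pre_process` are hypothetical and are not part of any library's API; the table is a trimmed copy of the one above, and only the pre-processing step is shown (the per-letter mapping is left out).

```python
import re

# Hypothetical rule table, trimmed from the mapping in this issue.
PRE_PROCESSOR_MAPPING = {
    "apartheid": "aparᛏᚻeid",
    "fainthearted": "fainᛏᚻearted",
    "foothold": "fooᛏᚻold",
    "knighthood": "knighᛏᚻood",
    "knothole": "knoᛏᚻole",
    "nighthawk": "nighᛏᚻawk",
    "penthouse": "penᛏᚻouse",
    "warthog": "warᛏᚻog",
    "th": "ᚦ",
    "ng": "ᛝ",
    "st": "ᛥ",
}


def build_pattern(mapping):
    """Compile one alternation with the longest keys listed first.

    Python's regex alternation tries branches left to right, so at any
    position a whole-word key is attempted before the digraph keys.
    """
    keys = sorted(mapping, key=len, reverse=True)
    return re.compile("|".join(re.escape(key) for key in keys))


def pre_process(text, mapping=PRE_PROCESSOR_MAPPING):
    """Apply every rule in a single pass; replaced text is not rescanned."""
    pattern = build_pattern(mapping)
    return pattern.sub(lambda match: mapping[match.group(0)], text)


if __name__ == "__main__":
    for word in ("warthog", "thistle", "nothing"):
        print(word, "=>", pre_process(word))
```

Under these assumptions "warthog" becomes "warᛏᚻog" while "thistle" still becomes "ᚦiᛥle". The sketch is case-sensitive and matches keys as substrings, so "warthogs" is covered too; both choices are assumptions rather than requirements stated in this issue.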
