Missing characters in Latin -> Cyrillic transliteration #14

Open
skoval00 opened this issue Mar 14, 2016 · 3 comments

Comments

@skoval00

The characters C, Q, W, and X are missing from the transliteration tables for Latin -> Cyrillic transliteration, so they pass through unchanged:
$ python -c 'import transliterate; print(transliterate.translit("CQWX", "ru"))'
CQWX

@barseghyanartur
Owner

@skoval00, @SomeUser55:

The main question is: what should we do with them?

https://en.wikipedia.org/wiki/Romanization_of_Russian#Transliteration_table

  • One option is to make the strict argument strip out all characters that are not listed in the language pack.
  • Another option, since we want to follow the standards, would be to let users pass additional mappings to the translit function; those mappings would then take precedence over the chosen language pack.
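
To make the two options concrete, here is a minimal standalone sketch. It does not use or reflect the library's actual implementation: `translit_sketch`, `BASE_MAPPING`, and the `extra_mapping` parameter are hypothetical names invented for illustration, and the base table is deliberately tiny.

```python
# Hypothetical, deliberately tiny base table; a real language pack is far larger.
BASE_MAPPING = {"a": "а", "b": "б", "d": "д", "k": "к", "o": "о", "t": "т"}


def translit_sketch(text, extra_mapping=None, strict=False):
    """Transliterate `text` character by character.

    extra_mapping: user-supplied pairs that override BASE_MAPPING (assumption).
    strict: if True, drop every character that has no mapping at all.
    """
    mapping = dict(BASE_MAPPING)
    mapping.update(extra_mapping or {})  # user mappings take precedence

    result = []
    for char in text:
        if char in mapping:
            result.append(mapping[char])
        elif not strict:
            result.append(char)  # non-strict: keep unmapped characters as-is
        # strict: unmapped characters are silently skipped
    return "".join(result)
```

With this sketch, `translit_sketch("cat", strict=True)` drops the unmapped `c`, while `translit_sketch("cat", extra_mapping={"c": "к"})` lets the caller decide how `c` is rendered. Note that a strict mode this naive would also drop spaces and punctuation, so a real implementation would probably need to whitelist them.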

Any other/better ideas?

@skoval00
Author

skoval00 commented Sep 26, 2018

At my previous company, some colleagues built a complex tool for multi-language transliteration of toponyms. As far as I understood, it used sets of rules with different probabilities that depended on where a character sequence occurred in a word (like ck -> к, ough -> о). The transliteration process had several stages implemented in different programming languages, but the result was quite good.
Unfortunately, it was rather ugly and hard to separate from the main codebase.
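
The idea of position-dependent rules for character sequences can be sketched roughly as follows. This is not the tool described above (which is not available); the rule table, the `"end"`/`"any"` position flags, and the function name are all assumptions made up for illustration, and the probabilities mentioned above are left out entirely.

```python
# Hypothetical rules: (Latin sequence, Cyrillic replacement, position).
# position is "end" (only at the end of a word) or "any".
RULES = [
    ("ough", "о", "end"),
    ("ck", "к", "any"),
    ("k", "к", "any"),
    ("o", "о", "any"),
    ("r", "р", "any"),
    ("b", "б", "any"),
]


def transliterate_word(word):
    """Apply the longest matching rule at each position of `word`."""
    # Try longer sequences first so "ck" wins over "k".
    rules = sorted(RULES, key=lambda rule: len(rule[0]), reverse=True)
    out = []
    i = 0
    while i < len(word):
        for seq, repl, position in rules:
            if not word.startswith(seq, i):
                continue
            if position == "end" and i + len(seq) != len(word):
                continue  # rule only applies at the end of the word
            out.append(repl)
            i += len(seq)
            break
        else:
            out.append(word[i])  # no rule matched: keep the character
            i += 1
    return "".join(out)
```

Under these toy rules, `transliterate_word("rock")` consumes `ck` as a single unit, and `transliterate_word("borough")` applies the `ough` rule only because the sequence sits at the end of the word.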

@hadaev8

hadaev8 commented Oct 5, 2020

Same here:
text = 'Когда Digital Equipment Corporation сократила количество рабочих мест на три тысячи, в ее официальном объявлении говорилось не об увольнениях, а о «вынужденных мерах».'

->

Когда Дигитал Еqуипмент Цорпоратион сократила количество рабочих мест на три тысячи, в ее официальном объявлении говорилось не об увольнениях, а о «вынужденных мерах».
