Alxblsk/t13n

T13N: Transliteration Made Right

Welcome to T13N, a transliteration library for Cyrillic languages, written for JavaScript applications.

What is it for?

Transliteration serves different purposes, such as:

  • Friendly URLs for Content Management Systems that publish content in Cyrillic languages;
  • Longer SMS (Latin text fits more characters per message than Cyrillic text);
  • Passport names;
  • Text rendering for languages that have a dedicated alternative Latin alphabet, etc.

Status

IN DEVELOPMENT / PHASE 0 PREVIEW

Implementation Checklist

Implementation is split into so-called "phases" for better prioritization.

Phase Zero: Belarusian-To-Latin (BGN/PCGN 1979)

  • Define a basic set of rules for each letter;
  • Define a set of flags calculated for each letter for better context;
  • Define alternative variations for some letters that require it (like 'г');
  • Support the most basic word separators (dash, underscore) for URL creation, and resolve "similar" symbols ("’" into "'");
  • Leave Latin letters and digits that are already present untouched;
  • Extend configurations via settings;
  • Pack everything as v0.1
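
The Phase Zero items above can be pictured with a minimal sketch. It is not the library's actual API: the names `LETTERS`, `SYMBOLS`, `flagsFor`, and `transliterate` are hypothetical, and the letter table is a small illustrative subset rather than the full BGN/PCGN 1979 rule set.

```javascript
// Hypothetical, partial letter table (illustrative subset only).
const LETTERS = new Map([
  ['а', 'a'], ['б', 'b'], ['в', 'v'], ['г', 'h'], ['д', 'd'],
  ['і', 'i'], ['к', 'k'], ['л', 'l'], ['м', 'm'], ['н', 'n'],
  ['о', 'o'], ['р', 'r'], ['с', 's'], ['т', 't'], ['ш', 'sh'],
]);

// "Similar" symbols collapsed into one canonical form.
const SYMBOLS = new Map([['’', "'"], ['‘', "'"]]);

// Per-character flags that context-sensitive rules could consult.
// Only `upper` is used below; `wordStart` shows how more context could be carried.
function flagsFor(chars, i) {
  const ch = chars[i];
  const prev = i > 0 ? chars[i - 1] : '';
  return {
    upper: ch !== ch.toLowerCase(),
    wordStart: i === 0 || /[\s\-_]/.test(prev),
  };
}

function transliterate(input) {
  const chars = Array.from(input);
  return chars.map((ch, i) => {
    if (SYMBOLS.has(ch)) return SYMBOLS.get(ch);
    const latin = LETTERS.get(ch.toLowerCase());
    // Latin letters, digits, separators, and unknown characters pass through.
    if (latin === undefined) return ch;
    const { upper } = flagsFor(chars, i);
    return upper ? latin[0].toUpperCase() + latin.slice(1) : latin;
  }).join('');
}

console.log(transliterate('Мінск'));      // Minsk
console.log(transliterate('горад-2024')); // horad-2024
```

Keeping the letter table, the symbol table, and the flag calculation separate mirrors the checklist: each can be extended through settings without touching the main loop.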

Phase One: Other Belarusian-To-Latin variations ("Łacinka", ICAO, ISO 9)

  • Switch to TypeScript;
  • Schematize a language JSON;
  • Reorganize code to support other variations of one language;
  • Add Belarusian Latin alphabet ("Łacinka");
  • Add ICAO standard;
  • Add ISO 9 standard;

Phase Two: Ukrainian-To-Latin

  • Reorganize code to support multiple languages;
  • Add Ukrainian alphabet and transliteration rules;

Phase Three: Russian-To-Latin

  • Add Russian alphabet and transliteration rules.

(Other languages to be supported later on)

Ruleset & Dictionary

Every transformation rule is explicit and described in a so-called Ruleset. It is a compilation of rules that explains the transliteration behavior of the script. It can be compact and descriptive at the same time, depending on needs.

Compiling a Ruleset produces a Dictionary, which is used for pre-processing analysis and, later, for transliteration.

Three types of Rules can be used:

| Rule Type | Description |
|-----------|-------------|
| L | A rule for a letter that should be converted to its Latin form |
| S | Any special symbol that should be kept as-is or transformed / corrected |
| R | A range of consecutive characters (such as Latin letters or digits) that should all be labeled in the same way |
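
To illustrate how a Ruleset could compile into a Dictionary, here is a rough sketch. The object shape (`type`, `from`, `to`, `first`, `last`, `label`) and the `compile` function are assumptions made for this example; the source does not specify the actual Ruleset format.

```javascript
// Hypothetical ruleset shape; all field names are assumptions.
const ruleset = [
  { type: 'L', from: 'г', to: 'h' },   // letter rule
  { type: 'L', from: 'ш', to: 'sh' },  // letter rule
  { type: 'S', from: '’', to: "'" },   // special symbol, corrected
  { type: 'S', from: '-', to: '-' },   // special symbol, kept as-is
  { type: 'R', first: 'a', last: 'z', label: 'latin' }, // character range
  { type: 'R', first: '0', last: '9', label: 'digit' }, // character range
];

// Expand the compact ruleset into a per-character lookup table.
function compile(rules) {
  const dictionary = new Map();
  for (const rule of rules) {
    if (rule.type === 'R') {
      const start = rule.first.codePointAt(0);
      const end = rule.last.codePointAt(0);
      for (let cp = start; cp <= end; cp++) {
        const ch = String.fromCodePoint(cp);
        // Range members map to themselves and share one label.
        dictionary.set(ch, { type: 'R', to: ch, label: rule.label });
      }
    } else {
      dictionary.set(rule.from, { type: rule.type, to: rule.to });
    }
  }
  return dictionary;
}

const dictionary = compile(ruleset);
console.log(dictionary.get('г').to);    // h
console.log(dictionary.get('q').label); // latin
```

The ruleset stays short because each R rule names only the ends of its range, while the compiled dictionary gives a direct lookup for every character.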
