Detect Toki Pona in non-latin text #3

gregdan3 · 2024-06-25T14:13:20Z

Currently, the library can only detect Toki Pona in Latin-alphabet text and UCSUR; all text in other writing systems is treated as not Toki Pona, even though it is perfectly reasonable to render Toki Pona in almost any writing system.

To support this as fully as my preferred config does for Latin-alphabet text, I would need the following for each script:

A list of the dictionary words rendered in the target script (one per Dictionary filter)
A regex which matches words built from valid syllables in that script (Syllabic filter)
A list of every character the script may use to render Toki Pona (Alphabetic filter)
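As a rough sketch of what one per-script configuration might bundle, here is the existing Latin case expressed as the three filters listed above. The names (ScriptFilters, LATIN_SYLLABIC, LATIN) are hypothetical and are not the library's actual API, the dictionary is a tiny sample, and the syllable regex is only a simplified approximation of Toki Pona phonotactics: syllables of the form (C)V(n), no ji, ti, wo, or wu, and no syllable-final n before n or m.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Simplified Latin Toki Pona phonotactics (an approximation, not the library's regex):
# a syllable is (C)V(n); "ji", "ti", "wo", "wu" are excluded; a syllable-final "n"
# may not precede "n" or "m"; a vowel-only syllable may only start the word.
_CV = r"(?:[klmnps][aeiou]|[jt][aeou]|w[aei])"
_CODA = r"(?:n(?![nm]))?"
LATIN_SYLLABIC = re.compile(rf"(?:[aeiou]|{_CV}){_CODA}(?:{_CV}{_CODA})*")


@dataclass(frozen=True)
class ScriptFilters:
    """Hypothetical bundle of the three per-script filters."""

    dictionary: frozenset[str]  # Dictionary filter: known words spelled in this script
    syllabic: re.Pattern[str]  # Syllabic filter: pattern for well-formed words
    alphabet: frozenset[str]  # Alphabetic filter: characters this script may use

    def is_dictionary(self, word: str) -> bool:
        return word.lower() in self.dictionary

    def is_syllabic(self, word: str) -> bool:
        return self.syllabic.fullmatch(word.lower()) is not None

    def is_alphabetic(self, word: str) -> bool:
        return all(ch in self.alphabet for ch in word.lower())


# The Latin case, with a deliberately tiny sample dictionary.
LATIN = ScriptFilters(
    dictionary=frozenset({"toki", "pona", "jan", "mi", "sina", "suli", "wawa"}),
    syllabic=LATIN_SYLLABIC,
    alphabet=frozenset("aeijklmnopstuw"),
)

print(LATIN.is_dictionary("Toki"), LATIN.is_syllabic("kijetesantakalu"), LATIN.is_alphabetic("xyz"))
```

Each additional script would need its own instance. Only the alphabet is simple to write down; the dictionary and the syllable pattern both depend on how every Toki Pona sound is spelled in that script.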

The Alphabetic filter would be relatively easy (even though its name would be inaccurate for, say, Japanese), but the Dictionary and Syllabic filters would be challenging for writing systems which have multiple ways to write approximately the same Toki Pona sound. For example, jan Niwe (@Nerd1729 on Discord) provided me with this list for Greek:

α = /a/
ε = αι = /e/
η = ι = υ = ει = οι = υι = /i/
γη = γι = γυ = γει = γοι = γυι = /j/
κ = /k/
λ = /l/
μ = /m/
ν = /n/
ο = ω = /o/
π = /p/
σ = /s/
τ = /t/
ου = ȣ = /u/
β = γου = /w/
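One way to avoid enumerating every Greek spelling in a dictionary (a different approach from the per-script filters described above, not something proposed in this issue) would be to transliterate Greek-script words back to the Latin alphabet using this table and then reuse the existing Latin filters. The sketch below does this with greedy longest-match tokenization, so digraphs such as ου and γου win over their shorter prefixes. It also counts the letter-by-letter Greek spellings of a Latin word, which shows how quickly a literal Greek dictionary would grow. The function names are hypothetical, and accented letters (such as έ) or letters missing from the table (such as ρ) are not mapped, so words containing them return None.

```python
from __future__ import annotations

from collections import Counter

# Greek spellings of each Latin Toki Pona letter, transcribed from the table above.
GREEK_TO_LATIN: dict[str, str] = {
    "α": "a",
    "ε": "e", "αι": "e",
    "η": "i", "ι": "i", "υ": "i", "ει": "i", "οι": "i", "υι": "i",
    "γη": "j", "γι": "j", "γυ": "j", "γει": "j", "γοι": "j", "γυι": "j",
    "κ": "k",
    "λ": "l",
    "μ": "m",
    "ν": "n",
    "ο": "o", "ω": "o",
    "π": "p",
    "σ": "s",
    "τ": "t",
    "ου": "u", "ȣ": "u",
    "β": "w", "γου": "w",
}
MAX_SPELLING_LEN = max(len(spelling) for spelling in GREEK_TO_LATIN)


def greek_to_latin(word: str) -> str | None:
    """Transliterate one Greek-script word by greedy longest match.

    Returns None if some part of the word matches no spelling in the table.
    """
    word = word.lower()
    out: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so digraphs like "ου" and "γου"
        # win over shorter prefixes.
        for size in range(min(MAX_SPELLING_LEN, len(word) - i), 0, -1):
            latin = GREEK_TO_LATIN.get(word[i : i + size])
            if latin is not None:
                out.append(latin)
                i += size
                break
        else:
            return None  # no listed spelling starts at this position
    return "".join(out)


# How many Greek spellings each Latin letter has, e.g. "i" -> 6.
SPELLINGS_PER_LETTER = Counter(GREEK_TO_LATIN.values())


def count_spellings(latin_word: str) -> int:
    """Letter-by-letter Greek spellings of a Latin word (0 if a letter is unmapped)."""
    total = 1
    for letter in latin_word:
        total *= SPELLINGS_PER_LETTER[letter]
    return total


for greek in ["τοκι", "γιαν", "γουαγουα", "σουλη", "καλημέρα"]:
    print(greek, "->", greek_to_latin(greek))
print(count_spellings("toki"), count_spellings("kijetesantakalu"))
```

With this table, toki already has 12 letter-by-letter Greek spellings and kijetesantakalu has 288, while the transliteration approach only needs the existing Latin dictionary. Greedy matching is a design choice here: the shorter readings it rules out (such as α + ι instead of αι) would produce vowel sequences, which standard Toki Pona words do not contain.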


gregdan3 added the enhancement (New feature or request) label on Jun 25, 2024

gregdan3 changed the title from "[Feature] Detecting Toki Pona in non-latin text" to "Detect Toki Pona in non-latin text" on Jun 25, 2024
