This module identifies languages from ISO639-1 and ISO639-3 codes or ISO names and provides a convenient class to access related attributes. It is also possible to lookup codes for languages based on their ISO names. However, the name based lookup will be slower as all language names are compared in lowercase for that.
Basic flows:
- Code is 2 characters
- Lookup long code
- Lookup data
- Code is 3 characters
- Lookup data
- Code is something else
- Loop all language names to check for match
input_language == language_name.lower()
- Lookup data
- Loop all language names to check for match
This means that that Name based lookup is n-times slower than the other two options. But this really should not make a difference.
The data is taken from iso639-3.sil.org and is stored in the tables folder. Further releases will update these tables.
The code in generator.py generates a single python file that contains all lookup tables and methods.
- code: The ISO639-3 Code
- short: ISO639-1 Code if available
- deprecated: True if the definition is deprecated
- macro: True if this is in a macrolanguage gropup
- parent: The parent macrolanguage
- macros: Any languages belonging to this macrolanguage
Checking the macrolanguages for Chinese:
from languager import get_language
lang = get_language('zho')
# lang = get_language('zh')
# lang = get_language('Chinese')
# lang = get_language('does not exist', default='zho')
for language in lang.macros:
print(language)
# czo
# csp
# yue
# cnp
# cmn
# czh
# hak
# nan
# wuu
# cjy
# lzh
# gan
# mnp
# cpx
# hsn
# cdo