Release nagisa v0.2.8 · taishi-i/nagisa

nagisa 0.2.8 incorporates the following changes:

Fix AttributeError in nagisa_utils.pyx when tokenizing a text containing Latin capital letter I with dot above 'İ'

Tokenizing a text containing 'İ' raised an AttributeError. As the following example shows, lowercasing 'İ' produces a string of length 2 instead of 1, so features were not extracted correctly.

>>> text = "İ" # [U+0130]
>>> print(len(text))
1
>>> text = text.lower() # [U+0069] [U+0307]
>>> print(text)
i̇
>>> print(len(text))
2

To avoid this error, the following preprocessing step was added to the source code (modification 1, modification 2).

text = text.replace('İ', 'I')
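The effect of this replacement can be checked with a minimal sketch like the one below. It is not nagisa's actual code: the helper name `normalize_dotted_i` and the sample string are assumptions made for illustration. It only shows that replacing 'İ' before lowercasing keeps the string length unchanged.

```python
def normalize_dotted_i(text: str) -> str:
    """Hypothetical helper: replace U+0130 ('İ') with a plain 'I'."""
    return text.replace("\u0130", "I")


raw = "\u0130stanbul"  # sample input starting with 'İ' (assumed for illustration)

# Lowercasing directly expands 'İ' into 'i' + U+0307 (combining dot above).
lowered_raw = raw.lower()

# Replacing 'İ' first keeps a one-to-one character mapping when lowercasing.
lowered_fixed = normalize_dotted_i(raw).lower()

print(len(raw), len(lowered_raw), len(lowered_fixed))  # 8 9 8
```

Without the replacement, the lowercased text is one character longer than the original, which is the kind of length mismatch described above.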

Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10, 3.11) to PyPI for Linux
Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10) to PyPI for macOS
Add Python wheels (3.6, 3.7, 3.8) to PyPI for Windows
