
nagisa v0.2.9

@taishi-i taishi-i released this 30 Jul 16:53
· 18 commits to master since this release

nagisa 0.2.9 incorporates the following changes:

  • Fix a bottleneck in part-of-speech tagging caused by using a list with append; resolved by using a set with add

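Until now, part-of-speech tagging in tagger.py slowed down as more text was analyzed. In the code below, w2p refers to the list stored in self._word2postags, so each append adds another entry to that stored list, and the list keeps growing with every matching word that is processed.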

tids = []
for w in words:
    if w in self._word2postags:
        w2p = self._word2postags[w]
    else:
        w2p = [0] 
    if self.use_noun_heuristic is True:
        if w.isalnum() is True:
            if w2p == [0]:
                w2p = [self._pos2id[u'名詞']] 
            else:
                # bottleneck is here!
                w2p.append(self._pos2id[u'名詞']) 
    w2p = list(set(w2p))
    tids.append(w2p)

The following code resolves the slowdown by building a new set for each word, so the lists stored in self._word2postags are never modified.

tids = []
for w in words:
    w2p = set(self._word2postags.get(w, [0]))
    if self.use_noun_heuristic and w.isalnum():
        if 0 in w2p:
            w2p.remove(0)
        w2p.add(2)  # nagisa.tagger._pos2id["名詞"] = 2 
    tids.append(list(w2p))
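
The difference between the two versions can be reproduced outside nagisa. The sketch below is a simplified stand-in, not the library's code: `old_lookup`, `new_lookup`, `NOUN_ID`, and the one-entry tables are hypothetical names and data, the noun heuristic is assumed to be always enabled, and `discard` is used in place of the `in` check followed by `remove`.

```python
# Hypothetical stand-ins for the tagger's internal tables (not nagisa's real data).
NOUN_ID = 2  # mirrors the comment above: _pos2id["名詞"] = 2


def old_lookup(word2postags, words):
    """Old approach: appends to the list stored in the dictionary."""
    tids = []
    for w in words:
        w2p = word2postags.get(w, [0])
        if w.isalnum():
            if w2p == [0]:
                w2p = [NOUN_ID]
            else:
                w2p.append(NOUN_ID)  # mutates the shared list in word2postags
        tids.append(list(set(w2p)))
    return tids


def new_lookup(word2postags, words):
    """New approach: works on a fresh set, so the dictionary is never modified."""
    tids = []
    for w in words:
        w2p = set(word2postags.get(w, [0]))
        if w.isalnum():
            w2p.discard(0)
            w2p.add(NOUN_ID)
        tids.append(list(w2p))
    return tids


old_table = {"abc": [5]}
new_table = {"abc": [5]}
words = ["abc"] * 1000

old_result = old_lookup(old_table, words)
new_result = new_lookup(new_table, words)

print(len(old_table["abc"]))  # 1001: the stored entry plus 1000 appended duplicates
print(len(new_table["abc"]))  # 1: the stored list is untouched
```

Both functions return the same tag ids for each word. Only the old one leaves an ever-growing list behind, which makes each later `set(w2p)` call more expensive.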
  • Fix the error caused by the dash-separated 'description-file' key in setup.cfg by renaming it to 'description_file'
[metadata]
description_file = README.md
  • Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10, 3.11) to PyPI for Linux
  • Add Python wheels (3.6, 3.7, 3.8, 3.9, 3.10) to PyPI for macOS
  • Add Python wheels (3.6, 3.7, 3.8) to PyPI for Windows