Skip to content

Commit

Permalink
Merge branch 'release/0.1.0'
Browse files Browse the repository at this point in the history
  • Loading branch information
emfomy committed Nov 18, 2020
2 parents 57a538f + 59c7b82 commit efdd15d
Show file tree
Hide file tree
Showing 17 changed files with 1,527 additions and 14 deletions.
3 changes: 3 additions & 0 deletions .pylintrc
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
[MASTER]

disable =
duplicate-code,
logging-fstring-interpolation,
too-few-public-methods,
too-many-locals,

[FORMAT]

Expand Down
674 changes: 674 additions & 0 deletions COPYING

Large diffs are not rendered by default.

16 changes: 12 additions & 4 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -1,6 +1,14 @@
Copyright (c) 2020 CKIP Lab.

This work is licensed under the Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy
of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or send
a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
210 changes: 209 additions & 1 deletion README.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,212 @@
CKIP Transformers
-----------------

UNDER CONSTRUCTION...
This open-source library implements CKIP Chinese NLP tools using transformers models.

* (WS) Word Segmentation
* (POS) Part-of-Speech Tagging
* (NER) Named Entity Recognition

Git
^^^

https://github.com/emfomy/ckip-transformers

|GitHub Version| |GitHub Release| |GitHub Issues|

.. |GitHub Version| image:: https://img.shields.io/github/v/release/emfomy/ckip-transformers.svg?maxAge=3600
:target: https://github.com/emfomy/ckip-transformers/releases

.. |GitHub License| image:: https://img.shields.io/github/license/emfomy/ckip-transformers.svg?maxAge=3600
:target: https://github.com/emfomy/ckip-transformers/blob/master/LICENSE

.. |GitHub Release| image:: https://img.shields.io/github/release-date/emfomy/ckip-transformers.svg?maxAge=3600

.. |GitHub Downloads| image:: https://img.shields.io/github/downloads/emfomy/ckip-transformers/total.svg?maxAge=3600
:target: https://github.com/emfomy/ckip-transformers/releases/latest

.. |GitHub Issues| image:: https://img.shields.io/github/issues/emfomy/ckip-transformers.svg?maxAge=3600
:target: https://github.com/emfomy/ckip-transformers/issues

.. |GitHub Forks| image:: https://img.shields.io/github/forks/emfomy/ckip-transformers.svg?style=social&label=Fork&maxAge=3600

.. |GitHub Stars| image:: https://img.shields.io/github/stars/emfomy/ckip-transformers.svg?style=social&label=Star&maxAge=3600

.. |GitHub Watchers| image:: https://img.shields.io/github/watchers/emfomy/ckip-transformers.svg?style=social&label=Watch&maxAge=3600

PyPI
^^^^

https://pypi.org/project/ckip-transformers

|PyPI Version| |PyPI License| |PyPI Downloads| |PyPI Python| |PyPI Implementation| |PyPI Status|

.. |PyPI Version| image:: https://img.shields.io/pypi/v/ckip-transformers.svg?maxAge=3600
:target: https://pypi.org/project/ckip-transformers

.. |PyPI License| image:: https://img.shields.io/pypi/l/ckip-transformers.svg?maxAge=3600
:target: https://github.com/emfomy/ckip-transformers/blob/master/LICENSE

.. |PyPI Downloads| image:: https://img.shields.io/pypi/dm/ckip-transformers.svg?maxAge=3600
:target: https://pypi.org/project/ckip-transformers#files

.. |PyPI Python| image:: https://img.shields.io/pypi/pyversions/ckip-transformers.svg?maxAge=3600

.. |PyPI Implementation| image:: https://img.shields.io/pypi/implementation/ckip-transformers.svg?maxAge=3600

.. |PyPI Format| image:: https://img.shields.io/pypi/format/ckip-transformers.svg?maxAge=3600

.. |PyPI Status| image:: https://img.shields.io/pypi/status/ckip-transformers.svg?maxAge=3600

Documentation
^^^^^^^^^^^^^

https://ckip-transformers.readthedocs.io/

|ReadTheDocs Home|

.. |ReadTheDocs Home| image:: https://img.shields.io/website/https/ckip-transformers.readthedocs.io.svg?maxAge=3600&up_message=online&down_message=offline
:target: https://ckip-transformers.readthedocs.io

Relative Demos / Packages
^^^^^^^^^^^^^^^^^^^^^^^^^

- `CkipTagger <https://github.com/ckiplab/ckiptagger>`_: An alternative Chinese NLP library with using BiLSTM.
- `CKIP CoreNLP Toolkit <https://github.com/ckiplab/ckipnlp>`_: A Chinese NLP library with more NLP tasks and utilities.

Contributers
^^^^^^^^^^^^

* `Mu Yang <https://muyang.pro>`__ at `CKIP <https://ckip.iis.sinica.edu.tw>`__ (Author & Maintainer)
* `Wei-Yun Ma <https://www.iis.sinica.edu.tw/pages/ma/>`__ at `CKIP <https://ckip.iis.sinica.edu.tw>`__ (Maintainer)

Installation
------------

``pip install -U ckip-transformers``

Requirements:

* `Python <https://www.python.org>`__ 3.6+
* `PyTorch <https://pytorch.org>`__ 1.1+
* `HuggingFace Transformers <https://huggingface.co/transformers/>`__ 3.5+

Installation via Pip
^^^^^^^^^^^^^^^^^^^^

``pip install -U ckip-transformers``

Usage
-----

See https://ckip-transformers.readthedocs.io/en/latest/_api/ckip_transformers.html for API details.

The complete script of this example is https://github.com/ckiplab/ckip-transformers/blob/master/example/example.py.

1. Import module
^^^^^^^^^^^^^^^^

.. code-block:: python
from ckip_transformers.nlp import CkipWordSegmenter, CkipPosTagger, CkipNerChunker
2. Load models
^^^^^^^^^^^^^^

.. code-block:: python
# Initialize drivers
ws_driver = CkipWordSegmenter()
pos_driver = CkipPosTagger()
ner_driver = CkipNerChunker()
3. Run pipeline
^^^^^^^^^^^^^^^

- The input for word segmentation and named-entity recognition must be a list of sentences.
- The input for part-of-speech tagging must be a list of list of words (the output of word segmentation).

.. code-block:: python
# Input text
text = [
'傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。',
'美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會,預料她將會很順利通過參議院支持,成為該國有史以來第一位的華裔女性內閣成員。',
]
# Run pipeline
ws = ws_driver(text)
pos = pos_driver(ws)
ner = ner_driver(text)
4. Show results
^^^^^^^^^^^^^^^

.. code-block:: python
# Pack word segmentation and part-of-speech results
def pack_ws_pos_sentece(sentence_ws, sentence_pos):
assert len(sentence_ws) == len(sentence_pos)
res = []
for word_ws, word_pos in zip(sentence_ws, sentence_pos):
res.append(f'{word_ws}({word_pos})')
return '\u3000'.join(res)
# Show results
for sentence, sentence_ws, sentence_pos, sentence_ner in zip(text, ws, pos, ner):
print(sentence)
print(pack_ws_pos_sentece(sentence_ws, sentence_pos))
for entity in sentence_ner:
print(entity)
print()
.. code-block:: text
傅達仁今將執行安樂死,卻突然爆出自己20年前遭緯來體育台封殺,他不懂自己哪裡得罪到電視台。
傅達仁(Nb) 今(Nd) 將(D) 執行(VC) 安樂死(Na) ,(COMMACATEGORY) 卻(D) 突然(D) 爆出(VJ) 自己(Nh) 20(Neu) 年(Nf) 前(Ng) 遭(P) 緯來(Nb) 體育台(Na) 封殺(VC) ,(COMMACATEGORY) 他(Nh) 不(D) 懂(VK) 自己(Nh) 哪裡(Ncd) 得罪到(VC) 電視台(Nc) 。(PERIODCATEGORY)
NerToken(word='傅達仁', ner='PERSON', idx=(0, 3))
NerToken(word='今', ner='DATE', idx=(3, 4))
NerToken(word='20年', ner='DATE', idx=(18, 21))
NerToken(word='緯來體育台', ner='ORG', idx=(23, 28))
美國參議院針對今天總統布什所提名的勞工部長趙小蘭展開認可聽證會,預料她將會很順利通過參議院支持,成為該國有史以來第一位的華裔女性內閣成員。
美國(Nc) 參議院(Nc) 針對(P) 今天(Nd) 總統(Na) 布什(Nb) 所(D) 提名(VC) 的(DE) 勞工部長(Na) 趙小蘭(Nb) 展開(VC) 認可(VC) 聽證會(Na) ,(COMMACATEGORY) 預料(VE) 她(Nh) 將(D) 會(D) 很(Dfa) 順利(VH) 通過(VC) 參議院(Nc) 支持(VC) ,(COMMACATEGORY) 成為(VG) 該(Nes) 國(Nc) 有史以來(D) 第一(Neu) 位(Nf) 的(DE) 華裔(Na) 女性(Na) 內閣(Na) 成員(Na) 。(PERIODCATEGORY)
NerToken(word='美國參議院', ner='ORG', idx=(0, 5))
NerToken(word='今天', ner='LOC', idx=(7, 9))
NerToken(word='布什', ner='PERSON', idx=(11, 13))
NerToken(word='勞工部長', ner='ORG', idx=(17, 21))
NerToken(word='趙小蘭', ner='PERSON', idx=(21, 24))
NerToken(word='認可聽證會', ner='EVENT', idx=(26, 31))
NerToken(word='參議院', ner='ORG', idx=(42, 45))
NerToken(word='第一', ner='ORDINAL', idx=(56, 58))
NerToken(word='華裔', ner='NORP', idx=(60, 62))
Pretrained Models
-----------------

One may also use our pretrained models with HuggingFace transformers library directly: https://huggingface.co/ckiplab/.

Pretrained Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^

* `ALBERT Tiny <https://huggingface.co/ckiplab/albert-tiny-chinese>`_
* `ALBERT Base <https://huggingface.co/ckiplab/albert-base-chinese>`_
* `BERT Base <https://huggingface.co/ckiplab/bert-base-chinese>`_
* `GPT2 Base <https://huggingface.co/ckiplab/gpt2-base-chinese>`_

NLP Task Models
^^^^^^^^^^^^^^^

* `BERT Base — Word Segmentation <https://huggingface.co/ckiplab/bert-base-chinese-ws>`_
* `BERT Base — Part-of-Speech Tagging <https://huggingface.co/ckiplab/bert-base-chinese-pos>`_
* `BERT Base — Named-Entity Recognition <https://huggingface.co/ckiplab/bert-base-chinese-ner>`_

License
-------

|GPL-3.0|

Copyright (c) 2020 `CKIP Lab <https://ckip.iis.sinica.edu.tw>`__ under the `GPL-3.0 License <https://www.gnu.org/licenses/gpl-3.0.html>`__.

.. |GPL-3.0| image:: https://www.gnu.org/graphics/gplv3-with-text-136x68.png
:target: https://www.gnu.org/licenses/gpl-3.0.html
4 changes: 2 additions & 2 deletions ckip_transformers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,9 @@
__copyright__ = '2020 CKIP Lab'

__title__ = 'CKIP Transformers'
__version__ = '0.0.0'
__version__ = '0.1.0'
__description__ = 'CKIP Transformers'
__license__ = 'CC BY-NC-SA 4.0'
__license__ = 'GPL-3.0'

__url__ = 'https://ckip-transformers.readthedocs.io'
__download_url__ = __url__+'/tarball/'+__version__
16 changes: 16 additions & 0 deletions ckip_transformers/nlp/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

"""
This module provides the CKIP Transformers NLP drivers.
"""

__author__ = 'Mu Yang <http://muyang.pro>'
__copyright__ = '2020 CKIP Lab'
__license__ = 'GPL-3.0'

from .driver import (
CkipWordSegmenter,
CkipPosTagger,
CkipNerChunker,
)
Loading

0 comments on commit efdd15d

Please sign in to comment.