Skip to content

Commit

Permalink
update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
taishi-i committed Sep 25, 2022
1 parent e4d3622 commit 75087e9
Showing 1 changed file with 75 additions and 1 deletion.
76 changes: 75 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,76 @@
# nagisa_bert
A BERT model for nagisa

[![Python package](https://github.com/taishi-i/nagisa_bert/actions/workflows/python-package.yml/badge.svg)](https://github.com/taishi-i/nagisa_bert/actions/workflows/python-package.yml)
[![PyPI version](https://badge.fury.io/py/nagisa_bert.svg)](https://badge.fury.io/py/nagisa_bert)

This library provides a tokenizer to use [a Japanese BERT model](https://huggingface.co/taishi-i/nagisa_bert) for [nagisa](https://github.com/taishi-i/nagisa).
The model is available in [Transformers](https://github.com/huggingface/transformers) πŸ€—.

## Install

Python 3.7+ on Linux or macOS is required.
You can install *nagisa_bert* by using the *pip* command.


```bash
$ pip install nagisa_bert
```

## Usage

This model is available in Transformer's pipeline method.

```python
>>> from transformers import pipeline
>>> from nagisa_bert import NagisaBertTokenizer

>>> text = "nagisaで[MASK]できるヒデルです"
>>> tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
>>> fill_mask = pipeline("fill-mask", model='taishi-i/nagisa_bert', tokenizer=tokenizer)
>>> print(fill_mask(text))
[{'score': 0.1385931372642517,
'sequence': 'nagisa で 使用 できる ヒデル です',
'token': 8092,
'token_str': 'δ½Ώ 用'},
{'score': 0.11947669088840485,
'sequence': 'nagisa で εˆ©η”¨ できる ヒデル です',
'token': 8252,
'token_str': '利 用'},
{'score': 0.04910655692219734,
'sequence': 'nagisa で 作成 できる ヒデル です',
'token': 9559,
'token_str': '作 成'},
{'score': 0.03792576864361763,
'sequence': 'nagisa で θ³Όε…₯ できる ヒデル です',
'token': 9430,
'token_str': 'θ³Ό ε…₯'},
{'score': 0.026893319562077522,
'sequence': 'nagisa で ε…₯手 できる ヒデル です',
'token': 11273,
'token_str': 'ε…₯ 手'}]
```

Tokenization and vectorization.

```python
>>> from transformers import BertModel
>>> from nagisa_bert import NagisaBertTokenizer

>>> text = "nagisaで[MASK]できるヒデルです"
>>> tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['na', '##g', '##is', '##a', 'で', '[MASK]', 'できる', 'ヒデル', 'です']

>>> model = BertModel.from_pretrained("taishi-i/nagisa_bert")
>>> h = model(**tokenizer(text, return_tensors="pt")).last_hidden_state
>>> print(h)
tensor([[[-0.2912, -0.6818, -0.4097, ..., 0.0262, -0.3845, 0.5816],
[ 0.2504, 0.2143, 0.5809, ..., -0.5428, 1.1805, 1.8701],
[ 0.1890, -0.5816, -0.5469, ..., -1.2081, -0.2341, 1.0215],
...,
[-0.4360, -0.2546, -0.2824, ..., 0.7420, -0.2904, 0.3070],
[-0.6598, -0.7607, 0.0034, ..., 0.2982, 0.5126, 1.1403],
[-0.2505, -0.6574, -0.0523, ..., 0.9082, 0.5851, 1.2625]]],
grad_fn=<NativeLayerNormBackward0>)
```

0 comments on commit 75087e9

Please sign in to comment.