Commit ab737ee

feat: conll2003 and ontonotes ner configs (#1691)

Authored by LogicZMaksimka, IgnatovFedor, and vaskonov
Co-authored-by: Fedor Ignatov <[email protected]>
Co-authored-by: vasily <[email protected]>

Parent: 9447636

File tree: 8 files changed, +260 −63 lines

README.md

Lines changed: 31 additions & 60 deletions

````diff
@@ -1,62 +1,29 @@
+# DeepPavlov 1.0
+
 [![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
 ![Python 3.6, 3.7, 3.8, 3.9, 3.10, 3.11](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-green.svg)
 [![Downloads](https://pepy.tech/badge/deeppavlov)](https://pepy.tech/project/deeppavlov)
-<img align="right" height="27%" width="27%" src="docs/_static/deeppavlov_logo.png"/>
+[![Static Badge](https://img.shields.io/badge/DeepPavlov%20Community-blue)](https://forum.deeppavlov.ai/)
+[![Static Badge](https://img.shields.io/badge/DeepPavlov%20Demo-blue)](https://demo.deeppavlov.ai/)
 
-DeepPavlov is an open-source conversational AI library built on [PyTorch](https://pytorch.org/).
 
-DeepPavlov is designed for
-* development of production ready chat-bots and complex conversational systems,
-* research in the area of NLP and, particularly, of dialog systems.
+DeepPavlov 1.0 is an open-source NLP framework built on [PyTorch](https://pytorch.org/) and [transformers](https://github.com/huggingface/transformers). DeepPavlov 1.0 is created for modular and configuration-driven development of state-of-the-art NLP models and supports a wide range of NLP model applications. DeepPavlov 1.0 is designed for practitioners with limited knowledge of NLP/ML.
 
 ## Quick Links
 
-* Demo [*demo.deeppavlov.ai*](https://demo.deeppavlov.ai/)
-* Documentation [*docs.deeppavlov.ai*](http://docs.deeppavlov.ai/)
-* Model List [*docs:features/*](http://docs.deeppavlov.ai/en/master/features/overview.html)
-* Contribution Guide [*docs:contribution_guide/*](http://docs.deeppavlov.ai/en/master/devguides/contribution_guide.html)
-* Issues [*github/issues/*](https://github.com/deeppavlov/DeepPavlov/issues)
-* Forum [*forum.deeppavlov.ai*](https://forum.deeppavlov.ai/)
-* Blogs [*medium.com/deeppavlov*](https://medium.com/deeppavlov)
-* [Extended colab tutorials](https://github.com/deeppavlov/dp_tutorials)
-* Docker Hub [*hub.docker.com/u/deeppavlov/*](https://hub.docker.com/u/deeppavlov/)
-* Docker Images Documentation [*docs:docker-images/*](http://docs.deeppavlov.ai/en/master/intro/installation.html#docker-images)
-
-Please leave us [your feedback](https://forms.gle/i64fowQmiVhMMC7f9) on how we can improve the DeepPavlov framework.
-
-**Models**
-
-[Named Entity Recognition](http://docs.deeppavlov.ai/en/master/features/models/NER.html) | [Intent/Sentence Classification](http://docs.deeppavlov.ai/en/master/features/models/classification.html) |
-
-[Question Answering over Text (SQuAD)](http://docs.deeppavlov.ai/en/master/features/models/SQuAD.html) | [Knowledge Base Question Answering](http://docs.deeppavlov.ai/en/master/features/models/KBQA.html)
-
-[Sentence Similarity/Ranking](http://docs.deeppavlov.ai/en/master/features/models/neural_ranking.html) | [TF-IDF Ranking](http://docs.deeppavlov.ai/en/master/features/models/tfidf_ranking.html)
-
-[Syntactic Parsing](http://docs.deeppavlov.ai/en/master/features/models/syntax_parser.html) | [Morphological Tagging](http://docs.deeppavlov.ai/en/master/features/models/morpho_tagger.html)
-
-[Automatic Spelling Correction](http://docs.deeppavlov.ai/en/master/features/models/spelling_correction.html) | [Entity Extraction](http://docs.deeppavlov.ai/en/master/features/models/entity_extraction.html)
-
-[Open Domain Questions Answering](http://docs.deeppavlov.ai/en/master/features/models/ODQA.html) | [Russian SuperGLUE](http://docs.deeppavlov.ai/en/master/features/models/superglue.html)
-
-[Relation Extraction](http://docs.deeppavlov.ai/en/master/features/models/relation_extraction.html)
-
-**Embeddings**
-
-[BERT embeddings for the Russian, Polish, Bulgarian, Czech, and informal English](http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#bert)
+|Name|Description|
+|--|--|
+| ⭐️ [*Demo*](https://demo.deeppavlov.ai/)|Check out our NLP models in the online demo|
+| 📚 [*Documentation*](http://docs.deeppavlov.ai/)|How to use DeepPavlov 1.0 and its features|
+| 🚀 [*Model List*](http://docs.deeppavlov.ai/en/master/features/overview.html)|Find the NLP model you need in the list of available models|
+| 🪐 [*Contribution Guide*](http://docs.deeppavlov.ai/en/master/devguides/contribution_guide.html)|Please read the contribution guidelines before making a contribution|
+| 🎛 [*Issues*](https://github.com/deeppavlov/DeepPavlov/issues)|If you have an issue with DeepPavlov, please let us know|
+| 💬 [*Forum*](https://forum.deeppavlov.ai/)|Ask questions and discuss DeepPavlov on the forum|
+| 📦 [*Blogs*](https://medium.com/deeppavlov)|Read about our current development|
+| 🦙 [Extended colab tutorials](https://github.com/deeppavlov/dp_tutorials)|Check out the code tutorials for our models|
+| 🌌 [*Docker Hub*](https://hub.docker.com/u/deeppavlov/)|Check out the Docker images for rapid deployment|
+| 👩‍🏫 [*Feedback*](https://forms.gle/i64fowQmiVhMMC7f9)|Please leave us your feedback to make DeepPavlov better|
 
-[ELMo embeddings for the Russian language](http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#elmo)
-
-[FastText embeddings for the Russian language](http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#fasttext)
-
-**Auto ML**
-
-[Tuning Models](http://docs.deeppavlov.ai/en/master/features/hypersearch.html)
-
-**Integrations**
-
-[REST API](http://docs.deeppavlov.ai/en/master/integrations/rest_api.html) | [Socket API](http://docs.deeppavlov.ai/en/master/integrations/socket_api.html)
-
-[Amazon AWS](http://docs.deeppavlov.ai/en/master/integrations/aws_ec2.html)
 
 ## Installation
 
@@ -65,11 +32,14 @@ Please leave us [your feedback](https://forms.gle/i64fowQmiVhMMC7f9) on how we c
 
 1. Create and activate a virtual environment:
 * `Linux`
+
 ```
 python -m venv env
 source ./env/bin/activate
 ```
+
 2. Install the package inside the environment:
+
 ```
 pip install deeppavlov
 ```
@@ -122,7 +92,7 @@ Dataset will be downloaded regardless of whether there was `-d` flag or not.
 
 To train on your own data you need to modify dataset reader path in the
 [train config doc](http://docs.deeppavlov.ai/en/master/intro/config_description.html#train-config).
-The data format is specified in the corresponding model doc page.
+The data format is specified in the corresponding model doc page.
 
 There are even more actions you can perform with configs:
 
@@ -131,20 +101,19 @@ python -m deeppavlov <action> <config_path> [-d] [-i]
 ```
 
 * `<action>` can be
-* `install` to install model requirements (same as `-i`),
-* `download` to download model's data (same as `-d`),
-* `train` to train the model on the data specified in the config file,
-* `evaluate` to calculate metrics on the same dataset,
-* `interact` to interact via CLI,
-* `riseapi` to run a REST API server (see
+* `install` to install model requirements (same as `-i`),
+* `download` to download model's data (same as `-d`),
+* `train` to train the model on the data specified in the config file,
+* `evaluate` to calculate metrics on the same dataset,
+* `interact` to interact via CLI,
+* `riseapi` to run a REST API server (see
 [doc](http://docs.deeppavlov.ai/en/master/integrations/rest_api.html)),
-* `predict` to get prediction for samples from *stdin* or from
+* `predict` to get prediction for samples from *stdin* or from
 *<file_path>* if `-f <file_path>` is specified.
 * `<config_path>` specifies path (or name) of model's config file
 * `-d` downloads required data
 * `-i` installs model requirements
 
-
 ### Python
 
 To get predictions from a model interactively through Python, run
@@ -157,7 +126,9 @@ model = build_model(<config_path>, install=True, download=True)
 # get predictions for 'input_text1', 'input_text2'
 model(['input_text1', 'input_text2'])
 ```
+
 where
+
 * `install=True` installs model requirements (optional),
 * `download=True` downloads required data from web - pretrained model files and embeddings (optional),
 * `<config_path>` is model name (e.g. `'ner_ontonotes_bert_mult'`), path to the chosen model's config file (e.g.
@@ -174,7 +145,7 @@ model = train_model(<config_path>, install=True, download=True)
 
 To train on your own data you need to modify dataset reader path in the
 [train config doc](http://docs.deeppavlov.ai/en/master/intro/config_description.html#train-config).
-The data format is specified in the corresponding model doc page.
+The data format is specified in the corresponding model doc page.
 
 You can also calculate metrics on the dataset specified in your config file:
 
````
deeppavlov/_meta.py

Lines changed: 1 addition & 1 deletion

````diff
@@ -1,4 +1,4 @@
-__version__ = '1.6.0'
+__version__ = '1.7.0'
 __author__ = 'Neural Networks and Deep Learning lab, MIPT'
 __description__ = 'An open source library for building end-to-end dialog systems and training chatbots.'
 __keywords__ = ['NLP', 'NER', 'SQUAD', 'Intents', 'Chatbot']
````
New config file: CoNLL-2003 NER with a DeBERTa encoder and a CRF head

Lines changed: 134 additions & 0 deletions

```json
{
  "dataset_reader": {
    "class_name": "conll2003_reader",
    "data_path": "{DOWNLOADS_PATH}/conll2003/",
    "dataset_name": "conll2003",
    "provide_pos": false
  },
  "dataset_iterator": {
    "class_name": "data_learning_iterator"
  },
  "chainer": {
    "in": [
      "x"
    ],
    "in_y": [
      "y"
    ],
    "pipe": [
      {
        "class_name": "torch_transformers_ner_preprocessor",
        "vocab_file": "{TRANSFORMER}",
        "do_lower_case": false,
        "max_seq_length": 512,
        "max_subword_length": 15,
        "token_masking_prob": 0.0,
        "in": [
          "x"
        ],
        "out": [
          "x_tokens",
          "x_subword_tokens",
          "x_subword_tok_ids",
          "startofword_markers",
          "attention_mask",
          "tokens_offsets"
        ]
      },
      {
        "id": "tag_vocab",
        "class_name": "simple_vocab",
        "unk_token": [
          "O"
        ],
        "pad_with_zeros": true,
        "save_path": "{MODEL_PATH}/tag.dict",
        "load_path": "{MODEL_PATH}/tag.dict",
        "fit_on": [
          "y"
        ],
        "in": [
          "y"
        ],
        "out": [
          "y_ind"
        ]
      },
      {
        "class_name": "torch_transformers_sequence_tagger",
        "n_tags": "#tag_vocab.len",
        "pretrained_bert": "{TRANSFORMER}",
        "attention_probs_keep_prob": 0.5,
        "use_crf": true,
        "encoder_layer_ids": [
          -1
        ],
        "save_path": "{MODEL_PATH}/model",
        "load_path": "{MODEL_PATH}/model",
        "in": [
          "x_subword_tok_ids",
          "attention_mask",
          "startofword_markers"
        ],
        "in_y": [
          "y_ind"
        ],
        "out": [
          "y_pred_ind",
          "probas"
        ]
      },
      {
        "ref": "tag_vocab",
        "in": [
          "y_pred_ind"
        ],
        "out": [
          "y_pred"
        ]
      }
    ],
    "out": [
      "x_tokens",
      "y_pred"
    ]
  },
  "train": {
    "metrics": [
      {
        "name": "ner_f1",
        "inputs": [
          "y",
          "y_pred"
        ]
      },
      {
        "name": "ner_token_f1",
        "inputs": [
          "y",
          "y_pred"
        ]
      }
    ],
    "evaluation_targets": [
      "valid",
      "test"
    ],
    "class_name": "torch_trainer"
  },
  "metadata": {
    "variables": {
      "ROOT_PATH": "~/.deeppavlov",
      "DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
      "MODELS_PATH": "{ROOT_PATH}/models",
      "TRANSFORMER": "microsoft/deberta-v3-base",
      "MODEL_PATH": "{MODELS_PATH}/ner_conll2003_deberta_crf"
    },
    "download": [
      {
        "url": "http://files.deeppavlov.ai/v1/ner/ner_conll2003_deberta_crf.tar.gz",
        "subdir": "{MODEL_PATH}"
      }
    ]
  }
}
```
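The `simple_vocab` component in the config above fits on the training tags (`"fit_on": ["y"]`) and maps unseen tags to the `"O"` unknown token. A minimal sketch of that behavior, with class and method names invented here for illustration (not DeepPavlov's implementation):

```python
# Sketch of a tag vocabulary like the config's `simple_vocab` step:
# fit on gold tag sequences, then convert tags to indices, sending
# any unknown tag to the "O" unk_token. Names are illustrative only.
from collections import Counter

class SimpleTagVocab:
    def __init__(self, unk_token="O"):
        self.unk_token = unk_token
        self.t2i = {}   # tag -> index
        self.i2t = []   # index -> tag

    def fit(self, tagged_sentences):
        counts = Counter(t for sent in tagged_sentences for t in sent)
        # unk token first so unknown/padding tags share a stable index
        for tag in [self.unk_token] + sorted(t for t in counts if t != self.unk_token):
            self.t2i[tag] = len(self.i2t)
            self.i2t.append(tag)

    def __call__(self, tags):
        return [self.t2i.get(t, self.t2i[self.unk_token]) for t in tags]

vocab = SimpleTagVocab()
vocab.fit([["B-PER", "I-PER", "O"], ["B-LOC", "O"]])
print(vocab(["B-PER", "O", "B-MISC"]))  # B-MISC was never seen -> maps to "O"
```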
New config file: OntoNotes NER with a DeBERTa encoder and a CRF head

Lines changed: 86 additions & 0 deletions

```json
{
  "dataset_reader": {
    "class_name": "conll2003_reader",
    "data_path": "{DOWNLOADS_PATH}/ontonotes/",
    "dataset_name": "ontonotes",
    "provide_pos": false
  },
  "dataset_iterator": {
    "class_name": "data_learning_iterator"
  },
  "chainer": {
    "in": ["x"],
    "in_y": ["y"],
    "pipe": [
      {
        "class_name": "torch_transformers_ner_preprocessor",
        "vocab_file": "{TRANSFORMER}",
        "do_lower_case": false,
        "max_seq_length": 512,
        "max_subword_length": 15,
        "token_masking_prob": 0.0,
        "in": ["x"],
        "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask", "tokens_offsets"]
      },
      {
        "id": "tag_vocab",
        "class_name": "simple_vocab",
        "unk_token": ["O"],
        "pad_with_zeros": true,
        "save_path": "{MODEL_PATH}/tag.dict",
        "load_path": "{MODEL_PATH}/tag.dict",
        "fit_on": ["y"],
        "in": ["y"],
        "out": ["y_ind"]
      },
      {
        "class_name": "torch_transformers_sequence_tagger",
        "n_tags": "#tag_vocab.len",
        "pretrained_bert": "{TRANSFORMER}",
        "attention_probs_keep_prob": 0.5,
        "use_crf": true,
        "encoder_layer_ids": [-1],
        "save_path": "{MODEL_PATH}/model",
        "load_path": "{MODEL_PATH}/model",
        "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"],
        "in_y": ["y_ind"],
        "out": ["y_pred_ind", "probas"]
      },
      {
        "ref": "tag_vocab",
        "in": ["y_pred_ind"],
        "out": ["y_pred"]
      }
    ],
    "out": ["x_tokens", "y_pred"]
  },
  "train": {
    "metrics": [
      {
        "name": "ner_f1",
        "inputs": ["y", "y_pred"]
      },
      {
        "name": "ner_token_f1",
        "inputs": ["y", "y_pred"]
      }
    ],
    "evaluation_targets": ["valid", "test"],
    "class_name": "torch_trainer"
  },
  "metadata": {
    "variables": {
      "ROOT_PATH": "~/.deeppavlov",
      "DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
      "MODELS_PATH": "{ROOT_PATH}/models",
      "TRANSFORMER": "microsoft/deberta-v3-base",
      "MODEL_PATH": "{MODELS_PATH}/ner_ontonotes_deberta_crf"
    },
    "download": [
      {
        "url": "http://files.deeppavlov.ai/v1/ner/ner_ontonotes_deberta_crf.tar.gz",
        "subdir": "{MODEL_PATH}"
      }
    ]
  }
}
```
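The `metadata.variables` block in these configs chains placeholders: `MODEL_PATH` depends on `MODELS_PATH`, which depends on `ROOT_PATH`. A sketch of resolving them in declaration order (this is an assumption about the mechanics for illustration, not DeepPavlov's actual resolver code):

```python
# Sketch: resolve {VARIABLE} placeholders in a config's metadata.variables
# section, in declaration order, as the chained definitions require.
variables = {
    "ROOT_PATH": "~/.deeppavlov",
    "DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
    "MODELS_PATH": "{ROOT_PATH}/models",
    "TRANSFORMER": "microsoft/deberta-v3-base",
    "MODEL_PATH": "{MODELS_PATH}/ner_ontonotes_deberta_crf",
}

resolved = {}
for name, value in variables.items():
    # substitute every already-resolved variable into the current value
    for key, val in resolved.items():
        value = value.replace("{" + key + "}", val)
    resolved[name] = value

print(resolved["MODEL_PATH"])  # → ~/.deeppavlov/models/ner_ontonotes_deberta_crf
```

Because dicts preserve insertion order, each definition only ever references variables declared before it.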

deeppavlov/core/common/requirements_registry.json

Lines changed: 3 additions & 1 deletion

````diff
@@ -148,7 +148,9 @@
     ],
     "torch_transformers_ner_preprocessor": [
         "{DEEPPAVLOV_PATH}/requirements/pytorch.txt",
-        "{DEEPPAVLOV_PATH}/requirements/transformers.txt"
+        "{DEEPPAVLOV_PATH}/requirements/transformers.txt",
+        "{DEEPPAVLOV_PATH}/requirements/sentencepiece.txt",
+        "{DEEPPAVLOV_PATH}/requirements/protobuf.txt"
     ],
     "torch_transformers_nll_ranker": [
         "{DEEPPAVLOV_PATH}/requirements/pytorch.txt",
````
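The registry maps a component class name to the requirement files it needs, with `{DEEPPAVLOV_PATH}` expanded at lookup time. A sketch of that lookup (the function and the example path are assumptions for illustration; only the registry entry itself comes from this commit):

```python
# Sketch: look up a component's requirement files in a registry like
# requirements_registry.json and expand the {DEEPPAVLOV_PATH} placeholder.
# The entry below is the one modified by this commit.
import json

registry_json = """
{
  "torch_transformers_ner_preprocessor": [
    "{DEEPPAVLOV_PATH}/requirements/pytorch.txt",
    "{DEEPPAVLOV_PATH}/requirements/transformers.txt",
    "{DEEPPAVLOV_PATH}/requirements/sentencepiece.txt",
    "{DEEPPAVLOV_PATH}/requirements/protobuf.txt"
  ]
}
"""
registry = json.loads(registry_json)

def requirements_for(component, deeppavlov_path="/opt/deeppavlov"):
    # unknown components simply have no extra requirements
    return [p.replace("{DEEPPAVLOV_PATH}", deeppavlov_path)
            for p in registry.get(component, [])]

paths = requirements_for("torch_transformers_ner_preprocessor")
print(paths[-1])  # → /opt/deeppavlov/requirements/protobuf.txt
```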

deeppavlov/requirements/protobuf.txt

Lines changed: 1 addition & 0 deletions

````diff
@@ -0,0 +1 @@
+protobuf<=3.20
````
deeppavlov/requirements/sentencepiece.txt (new file; the path is inferred from the registry entry that references it)

Lines changed: 1 addition & 0 deletions

````diff
@@ -0,0 +1 @@
+sentencepiece==0.2.0
````
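With several single-line requirement files in play, an installer (as invoked by the `-i` flag) needs to combine them into one deduplicated list. A minimal sketch under stated assumptions: only the `sentencepiece` and `protobuf` pins come from this commit, the other file contents are hypothetical placeholders, and the merge logic is illustrative rather than DeepPavlov's actual code:

```python
# Sketch: merge requirement lines from several per-component files,
# keeping first occurrence and dropping duplicates. Only the
# sentencepiece/protobuf pins are real; other entries are hypothetical.
files = {
    "pytorch.txt": ["torch"],                       # hypothetical content
    "transformers.txt": ["transformers"],           # hypothetical content
    "sentencepiece.txt": ["sentencepiece==0.2.0"],  # added in this commit
    "protobuf.txt": ["protobuf<=3.20"],             # added in this commit
    "extra.txt": ["transformers"],                  # duplicate, dropped below
}

merged, seen = [], set()
for name in files:                 # dicts preserve insertion order
    for line in files[name]:
        if line not in seen:
            seen.add(line)
            merged.append(line)

print(merged)
```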
