Commit 1c72188

Release 0.0.4
2 parents 6aca37b + 9c45a4e commit 1c72188

145 files changed

+10466
-2717
lines changed

.travis.yml

Lines changed: 0 additions & 17 deletions
This file was deleted.

Jenkinsfile

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+node('gpu') {
+    try {
+        stage('Clean') {
+            sh "rm -rf .[^.] .??* *"
+        }
+        stage('Checkout') {
+            sh "cp -r ${pwd()}@script/* ."
+        }
+        stage('Setup') {
+            env.CUDA_VISIBLE_DEVICES=0
+            sh """
+                virtualenv --python=python3 ".venv-$BUILD_NUMBER"
+                . .venv-$BUILD_NUMBER/bin/activate
+                sed -ri 's/^ *tensorflow *(=|<|>|\$)/tensorflow-gpu\\1/g' requirements.txt
+                sed -i "s/stream=True/stream=False/g" deeppavlov/core/data/utils.py
+                python setup.py develop
+                pip install http://lnsigo.mipt.ru/export/en_core_web_sm-2.0.0.tar.gz
+                python -m spacy link en_core_web_sm en --force
+                pip install -r requirements-dev.txt
+            """
+        }
+        stage('Tests') {
+            sh """
+                . .venv-$BUILD_NUMBER/bin/activate
+                pytest -v
+            """
+        }
+    } catch (e) {
+        emailext to: '${DEFAULT_RECIPIENTS}',
+            subject: '${PROJECT_NAME} - Build # ${BUILD_NUMBER} - FAILED!',
+            body: '${BRANCH_NAME} - ${BUILD_URL}',
+            attachLog: true
+        throw e
+    }
+    emailext to: '${DEFAULT_RECIPIENTS}',
+        subject: '${PROJECT_NAME} - Build # ${BUILD_NUMBER} - ${BUILD_STATUS}!',
+        body: '${BRANCH_NAME} - ${BUILD_URL}',
+        attachLog: true
+}
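
The first `sed` call in the Setup stage swaps the plain `tensorflow` requirement for `tensorflow-gpu`. Inside the Groovy triple-quoted string, `\$` becomes `$` and `\\1` becomes `\1`, so the shell sees the expression `s/^ *tensorflow *(=|<|>|$)/tensorflow-gpu\1/g`. A rough Python equivalent is sketched below; the function name and the sample requirement lines are illustrative assumptions, not part of the commit.

```python
import re

# Mirrors: sed -r 's/^ *tensorflow *(=|<|>|$)/tensorflow-gpu\1/g'
# The trailing group keeps the version operator (or end of line) intact,
# so packages that merely start with "tensorflow" are left alone.
_TF_LINE = re.compile(r"^ *tensorflow *(=|<|>|$)", flags=re.MULTILINE)


def use_gpu_tensorflow(requirements_text: str) -> str:
    """Swap the plain tensorflow requirement for tensorflow-gpu."""
    return _TF_LINE.sub(r"tensorflow-gpu\1", requirements_text)


# Hypothetical requirements content, for illustration only.
sample = "numpy==1.14.0\ntensorflow==1.4.0\ntensorflow-hub\n"
print(use_gpu_tensorflow(sample))
```

Only the line that names `tensorflow` itself is rewritten; a line such as `tensorflow-hub` does not match because `-` is neither a version operator nor the end of the line.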

MANIFEST.in

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+include README.MD
+include LICENSE
+include requirements.txt
+recursive-include deeppavlov/configs *.json
+recursive-include utils *.json

README.md

Lines changed: 70 additions & 21 deletions
@@ -15,10 +15,10 @@ Our goal is to enable AI-application developers and researchers with:
1515
* a framework for implementing and testing their own dialog models
1616
* tools for application integration with adjacent infrastructure (messengers, helpdesk software etc.)
1717
* benchmarking environment for conversational models and uniform access to relevant datasets
18-
18+
1919
## Demo
2020

21-
Demo of selected features is available at [demo.ipavlov.ai](http://demo.ipavlov.ai/)
21+
Demo of selected features is available at [demo.ipavlov.ai](https://demo.ipavlov.ai/)
2222

2323
## Features
2424

@@ -33,6 +33,7 @@ Demo of selected features is available at [demo.ipavlov.ai](http://demo.ipavlov.
3333
| **Skills** | |
3434
| [Goal-oriented bot](deeppavlov/skills/go_bot/README.md) | Based on Hybrid Code Networks (HCNs) architecture from [Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning – 2017](https://arxiv.org/abs/1702.03274). It predicts responses in goal-oriented dialog. The model is customizable: embeddings, slot filler and intent classifier can be switched on and off on demand. |
3535
| [Seq2seq goal-oriented bot](deeppavlov/skills/seq2seq_go_bot/README.md) | Dialogue agent predicts responses in a goal-oriented dialog and is able to handle multiple domains (pretrained bot allows calendar scheduling, weather information retrieval, and point-of-interest navigation). The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. |
36+
|[ODQA](deeppavlov/skills/odqa/README.md) | An open domain question answering skill. The skill accepts free-form questions about the world and outputs an answer based on its Wikipedia knowledge.|
3637
| **Embeddings** | |
3738
| [Pre-trained embeddings for the Russian language](pretrained-vectors.md) | Word vectors for the Russian language trained on joint [Russian Wikipedia](https://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0) and [Lenta.ru](https://lenta.ru/) corpora. |
3839

@@ -44,27 +45,31 @@ View video demo of deployment of a goal-oriented bot and a slot-filling model wi
4445

4546
* Run goal-oriented bot with Telegram interface:
4647
```
47-
python -m deeppavlov.deep interactbot deeppavlov/configs/go_bot/gobot_dstc2.json -t <TELEGRAM_TOKEN>
48+
python -m deeppavlov.deep interactbot deeppavlov/configs/go_bot/gobot_dstc2.json -d -t <TELEGRAM_TOKEN>
4849
```
4950
* Run goal-oriented bot with console interface:
5051
```
51-
python -m deeppavlov.deep interact deeppavlov/configs/go_bot/gobot_dstc2.json
52+
python -m deeppavlov.deep interact deeppavlov/configs/go_bot/gobot_dstc2.json -d
5253
```
5354
* Run goal-oriented bot with REST API:
5455
```
55-
python -m deeppavlov.deep riseapi deeppavlov/configs/go_bot/gobot_dstc2.json
56+
python -m deeppavlov.deep riseapi deeppavlov/configs/go_bot/gobot_dstc2.json -d
5657
```
5758
* Run slot-filling model with Telegram interface:
5859
```
59-
python -m deeppavlov.deep interactbot deeppavlov/configs/ner/slotfill_dstc2.json -t <TELEGRAM_TOKEN>
60+
python -m deeppavlov.deep interactbot deeppavlov/configs/ner/slotfill_dstc2.json -d -t <TELEGRAM_TOKEN>
6061
```
6162
* Run slot-filling model with console interface:
6263
```
63-
python -m deeppavlov.deep interact deeppavlov/configs/ner/slotfill_dstc2.json
64+
python -m deeppavlov.deep interact deeppavlov/configs/ner/slotfill_dstc2.json -d
6465
```
6566
* Run slot-filling model with REST API:
6667
```
67-
python -m deeppavlov.deep riseapi deeppavlov/configs/ner/slotfill_dstc2.json
68+
python -m deeppavlov.deep riseapi deeppavlov/configs/ner/slotfill_dstc2.json -d
69+
```
70+
* Predict intents on every line in a file:
71+
```
72+
python -m deeppavlov.deep predict deeppavlov/configs/intents/intents_snips.json -d --batch-size 15 < /data/in.txt > /data/out.txt
6873
```
6974
## Conceptual overview
7075

@@ -142,37 +147,47 @@ DeepPavlov is built on top of machine learning frameworks [TensorFlow](https://w
142147
143148
To use our pre-trained models, you should first download them:
144149
```
145-
python -m deeppavlov.download [-all]
150+
python -m deeppavlov.deep download <path_to_config>
146151
```
147-
* running this command without options will download basic examples, `[-all]` option will download **all** our pre-trained models.
148-
* Warning! `[-all]` requires about 10 GB of free space on disk.
149-
152+
Alternatively, add the `-d` key to any command such as `interact` or `riseapi` to automatically download all required models and data.
153+
150154
Then you can interact with the models or train them with the following command:
151155
152156
```
153-
python -m deeppavlov.deep <mode> <path_to_config>
157+
python -m deeppavlov.deep <mode> <path_to_config> [-d]
154158
```
155159
156-
* `<mode>` can be 'train', 'interact', 'interactbot' or 'riseapi'
157-
* `<path_to_config>` should be a path to an NLP pipeline json config
160+
* `<mode>` can be 'train', 'predict', 'interact', 'interactbot' or 'riseapi'
161+
* `<path_to_config>` should be a path to an NLP pipeline json config (e.g. `deeppavlov/configs/ner/slotfill_dstc2.json`)
162+
or a name without the `.json` extension of one of the config files [provided](deeppavlov/configs) in this repository (e.g. `slotfill_dstc2`)
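
Accepting either a path or a bare config name could be handled along the following lines. This is only a sketch under stated assumptions: `resolve_config` and the recursive search under a configs root are hypothetical and are not taken from the library's code.

```python
from pathlib import Path


def resolve_config(arg: str, configs_root: Path) -> Path:
    """Return a config path for an explicit path or a bare config name.

    Hypothetical helper: the lookup order is an assumption for illustration.
    """
    candidate = Path(arg)
    if candidate.is_file():
        # The argument already points at an existing JSON config.
        return candidate
    # Otherwise treat it as a file name without the .json extension.
    matches = sorted(configs_root.rglob(f"{arg}.json"))
    if not matches:
        raise FileNotFoundError(f"no config named {arg!r} under {configs_root}")
    return matches[0]
```

With the repository layout described here, `resolve_config("slotfill_dstc2", Path("deeppavlov/configs"))` would be expected to find `deeppavlov/configs/ner/slotfill_dstc2.json`.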
158163
159-
For 'interactbot' mode you should specify Telegram bot token in `-t` parameter or in `TELEGRAM_TOKEN` environment variable.
164+
For the 'interactbot' mode you should specify the Telegram bot token in the `-t` parameter or in the `TELEGRAM_TOKEN` environment variable. If you also want custom `/start` and `/help` Telegram messages for the running model, you should:
165+
* Add a section to `utils/telegram_utils/model_info.json` with your custom Telegram messages
166+
* In the model config file, set the `metadata.labels.telegram_utils` parameter to the name of the section added to `utils/telegram_utils/model_info.json`
160167
161168
For the 'riseapi' mode you should specify API settings (host, port, etc.) in the [*utils/server_utils/server_config.json*](utils/server_utils/server_config.json) configuration file. If provided, values from the *model_defaults* section override values for the same parameters from the *common_defaults* section. Model names in the *model_defaults* section should be similar to the class name of the model's main component.
162169
170+
For the 'predict' mode you can specify the path to an input file with the `-f` or `--input-file` parameter; otherwise, data is read
171+
from stdin.
172+
Every line of input text is used as one pipeline input parameter, so one example consists of as many lines
173+
as your pipeline expects input parameters.
174+
You can also specify batch size with `-b` or `--batch-size` parameter.
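
The line-per-input convention for 'predict' can be illustrated with a small sketch. The helper names below are hypothetical, and dropping an incomplete trailing example is an assumption made for this illustration rather than documented behavior.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_examples(lines: Iterable[str], n_inputs: int) -> Iterator[Tuple[str, ...]]:
    """Group consecutive lines into examples of n_inputs lines each."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        example = tuple(islice(stripped, n_inputs))
        if len(example) < n_inputs:
            return  # assumption: an incomplete trailing example is dropped
        yield example


def batches(examples: Iterable[Tuple[str, ...]], batch_size: int) -> Iterator[List[Tuple[str, ...]]]:
    """Collect examples into lists of at most batch_size items."""
    iterator = iter(examples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A two-input pipeline (for instance context + question) reads two lines per example.
lines = ["context 1\n", "question 1\n", "context 2\n", "question 2\n", "context 3\n", "question 3\n"]
for batch in batches(read_examples(lines, n_inputs=2), batch_size=2):
    print(batch)
```

For a one-input pipeline such as intent classification, `n_inputs=1` makes every line its own example, which matches the `predict ... < /data/in.txt` usage earlier in the README.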
175+
163176
Available model configs are:
164177
165178
- ```deeppavlov/configs/go_bot/*.json```
166179
167180
- ```deeppavlov/configs/seq2seq_go_bot/*.json```
168181
182+
- ```deeppavlov/configs/odqa/*.json```
183+
169184
- ```deeppavlov/configs/squad/*.json```
170185
171186
- ```deeppavlov/configs/intents/*.json```
172187
173188
- ```deeppavlov/configs/ner/*.json```
174189
175-
- ```deeppavlov/configs/rankinf/*.json```
190+
- ```deeppavlov/configs/ranking/*.json```
176191
177192
- ```deeppavlov/configs/error_model/*.json```
178193
@@ -251,7 +266,7 @@ Chainer is a core concept of DeepPavlov library: chainer builds a pipeline from
251266
its inputs and outputs as arrays of names, for example: `"in": ["tokens", "features"]` and `"out": ["token_embeddings", "features_embeddings"]` and you can chain outputs of one components with inputs of other components:
252267
```json
253268
{
254-
"name": "str_lower",
269+
"class": "deeppavlov.models.preprocessors.str_lower:StrLower",
255270
"in": ["x"],
256271
"out": ["x_lower"]
257272
},
@@ -261,8 +276,10 @@ its inputs and outputs as arrays of names, for example: `"in": ["tokens", "featu
261276
"out": ["x_tokens"]
262277
},
263278
```
264-
Each [Component](deeppavlov/core/models/component.py) in the pipeline must implement method `__call__` and has `name` parameter, which is its registered codename. It can also have any other parameters which repeat its `__init__()` method arguments.
265-
Default values of `__init__()` arguments will be overridden with the config values during the initialization of a class instance.
279+
Each [Component](deeppavlov/core/models/component.py) in the pipeline must implement the `__call__` method and have either a `name` parameter, which is its registered codename,
280+
or a `class` parameter in the form `module_name:ClassName`.
281+
It can also have any other parameters that repeat its `__init__()` method arguments.
282+
Default values of `__init__()` arguments will be overridden with the config values during the initialization of a class instance.
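
One way a `module_name:ClassName` string can be turned into an instance is sketched below with `importlib`. The helper names are hypothetical, and a standard-library class stands in for a DeepPavlov component so the sketch runs on its own; the library's actual loading code may differ.

```python
import importlib
from fractions import Fraction

# Keys that describe pipeline wiring rather than constructor arguments
# (an assumption for this sketch).
_RESERVED_KEYS = ("class", "in", "out")


def load_class(spec: str):
    """Import and return the class named by a 'module_name:ClassName' string."""
    module_name, _, class_name = spec.partition(":")
    if not module_name or not class_name:
        raise ValueError(f"expected 'module_name:ClassName', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_component(config: dict):
    """Instantiate a component, passing the remaining config keys to __init__."""
    params = {key: value for key, value in config.items() if key not in _RESERVED_KEYS}
    return load_class(config["class"])(**params)


# Config values override the constructor defaults (numerator=0, denominator=None).
component = build_component(
    {"class": "fractions:Fraction", "numerator": 3, "denominator": 4, "in": ["x"], "out": ["y"]}
)
print(component == Fraction(3, 4))
```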
266283

267284
You can reuse components in the pipeline to process different parts of data with the help of `id` and `ref` parameters:
268285
```json
@@ -278,7 +295,7 @@ You can reuse components in the pipeline to process different parts of data with
278295
"out": ["y_tokens"]
279296
},
280297
```
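
The reuse behind `id` and `ref` can be pictured as building each component once and looking it up on later references. This is a minimal sketch under that assumption: `build_pipeline`, the `factory` argument, and the sample config entries are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


def build_pipeline(pipe: List[dict], factory: Callable[[dict], Any]) -> List[Tuple[list, list, Any]]:
    """Build components once and share instances between `id` and `ref` entries."""
    by_id: Dict[str, Any] = {}
    components = []
    for config in pipe:
        if "ref" in config:
            # Reuse the instance registered earlier under this id.
            component = by_id[config["ref"]]
        else:
            component = factory(config)
            if "id" in config:
                by_id[config["id"]] = component
        components.append((config["in"], config["out"], component))
    return components


# Illustrative entries: one tokenizer instance processes both x and y.
pipe = [
    {"name": "some_tokenizer", "id": "tokenizer", "in": ["x"], "out": ["x_tokens"]},
    {"ref": "tokenizer", "in": ["y"], "out": ["y_tokens"]},
]
built = build_pipeline(pipe, factory=lambda config: object())
print(built[0][2] is built[1][2])
```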
281-
298+
282299
### Training
283300

284301
There are two abstract classes for trainable components: **Estimator** and **NNModel**.
@@ -383,6 +400,38 @@ A particular format of returned data should be defined in `__call__()`.
383400

384401
Inference is triggered by the `deeppavlov.core.commands.infer.interact_model()` function. There is no need for a separate JSON config for inference.
385402

403+
### Rest API
404+
405+
Each library component or skill can be easily made available for inference as a REST web service. The general method is:
406+
407+
`python -m deeppavlov.deep riseapi <config_path> [-d]`
408+
409+
(optional `-d` key is for dependencies download before service start)
410+
411+
Web service properties (host, port, model endpoint, GET request arguments) are provided in `utils/server_utils/server_config.json`.
412+
Properties from the `common_defaults` section are used by default unless they are overridden by component-specific properties provided in the `model_defaults` section of `server_config.json`.
413+
Component-specific properties are bound to the component by the `server_utils` label in the `metadata/labels` section of the component config. The value of the `server_utils` label should match a properties key in the `model_defaults` section of `server_config.json`.
414+
415+
For example, the `metadata/labels/server_utils` tag in `go_bot/gobot_dstc2.json` refers to the *GoalOrientedBot* section of `server_config.json`. Therefore, the `model_endpoint` parameter in `common_defaults` will be overridden by the same parameter from `model_defaults/GoalOrientedBot`.
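
The override rule can be sketched as a plain dictionary merge. The function name and every value in the sample config below are illustrative assumptions; only the section names (`common_defaults`, `model_defaults`, `GoalOrientedBot`) and the `model_endpoint` key come from the text above.

```python
def effective_server_params(server_config: dict, label: str) -> dict:
    """Start from common_defaults and overlay the model-specific section, if any."""
    params = dict(server_config["common_defaults"])
    params.update(server_config.get("model_defaults", {}).get(label, {}))
    return params


# Hypothetical contents of server_config.json.
server_config = {
    "common_defaults": {"host": "0.0.0.0", "port": 5000, "model_endpoint": "/model"},
    "model_defaults": {"GoalOrientedBot": {"model_endpoint": "/go_bot"}},
}

print(effective_server_params(server_config, "GoalOrientedBot"))
```

A label with no matching section in `model_defaults` simply falls back to the common defaults in this sketch.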
416+
417+
Model argument names are provided as a list in the `model_args_names` parameter, where the argument order corresponds to the component API.
418+
When running inference via the REST API, JSON payload keys should match the component argument names from `model_args_names`.
419+
The default argument name for one-argument components is *"context"*.
420+
Here are example POST requests for some of the library components:
421+
422+
| Component | POST request JSON payload example |
423+
| --------- | -------------------- |
424+
| **One argument components** |
425+
| NER component | {"context":"Elon Musk launched his cherry Tesla roadster to the Mars orbit"} |
426+
| Intent classification component | {"context":"I would like to go to a restaurant with Asian cuisine this evening"} |
427+
| Automatic spelling correction component | {"context":"errror"} |
428+
| Ranking component | {"context":"What is the average cost of life insurance services?"} |
429+
| (Seq2seq) Goal-oriented bot | {"context":"Hello, can you help me to find and book a restaurant this evening?"} |
430+
| **Two arguments components** |
431+
| Question Answering component | {"context":"After 1765, growing philosophical and political differences strained the relationship between Great Britain and its colonies.", "question":"What strained the relationship between Great Britain and its colonies?"} |
432+
433+
Flasgger UI for API testing is provided on `<host>:<port>/apidocs` when running a component in `riseapi` mode.
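
A request for the two-argument Question Answering row could be assembled with the standard library as sketched below. The URL (host, port and endpoint) is a placeholder assumption; the real values come from `server_config.json`. The network call is left commented out so the sketch runs without a live service.

```python
import json
from urllib import request


def build_inference_request(url: str, payload: dict) -> request.Request:
    """Build a JSON POST request whose keys follow model_args_names."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: substitute the host, port and model endpoint of your service.
req = build_inference_request(
    "http://localhost:5000/model",
    {
        "context": "After 1765, growing philosophical and political differences "
                   "strained the relationship between Great Britain and its colonies.",
        "question": "What strained the relationship between Great Britain and its colonies?",
    },
)
print(req.get_method(), req.full_url)

# With a service running in riseapi mode, the request could be sent like this:
# with request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```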
434+
386435
## License
387436

388437
DeepPavlov is Apache 2.0 - licensed.

deeppavlov/__init__.py

Lines changed: 41 additions & 5 deletions
@@ -1,11 +1,27 @@
1+
"""
2+
Copyright 2017 Neural Networks and Deep Learning lab, MIPT
3+
4+
Licensed under the Apache License, Version 2.0 (the "License");
5+
you may not use this file except in compliance with the License.
6+
You may obtain a copy of the License at
7+
8+
http://www.apache.org/licenses/LICENSE-2.0
9+
10+
Unless required by applicable law or agreed to in writing, software
11+
distributed under the License is distributed on an "AS IS" BASIS,
12+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
See the License for the specific language governing permissions and
14+
limitations under the License.
15+
"""
16+
117
# check version
218
import sys
319
assert sys.hexversion >= 0x3060000, 'Does not work in python3.5 or lower'
420

5-
621
import deeppavlov.core.models.keras_model
7-
import deeppavlov.core.data.dataset_iterator
822
import deeppavlov.core.data.vocab
23+
import deeppavlov.core.data.simple_vocab
24+
import deeppavlov.core.data.sqlite_database
925
import deeppavlov.dataset_readers.babi_reader
1026
import deeppavlov.dataset_readers.dstc2_reader
1127
import deeppavlov.dataset_readers.kvret_reader
@@ -20,37 +36,57 @@
2036
import deeppavlov.dataset_iterators.typos_iterator
2137
import deeppavlov.dataset_iterators.basic_classification_iterator
2238
import deeppavlov.dataset_iterators.squad_iterator
39+
import deeppavlov.dataset_iterators.sqlite_iterator
2340
import deeppavlov.models.classifiers.intents.intent_model
2441
import deeppavlov.models.commutators.random_commutator
2542
import deeppavlov.models.embedders.fasttext_embedder
2643
import deeppavlov.models.embedders.dict_embedder
2744
import deeppavlov.models.embedders.glove_embedder
28-
import deeppavlov.models.encoders.bow
29-
import deeppavlov.models.ner.slotfill
45+
import deeppavlov.models.embedders.bow_embedder
46+
import deeppavlov.models.ner.ner_ontonotes
3047
import deeppavlov.models.spellers.error_model.error_model
3148
import deeppavlov.models.trackers.hcn_at
3249
import deeppavlov.models.trackers.hcn_et
3350
import deeppavlov.models.preprocessors.str_lower
3451
import deeppavlov.models.preprocessors.squad_preprocessor
35-
import deeppavlov.models.ner.ner
3652
import deeppavlov.models.tokenizers.spacy_tokenizer
3753
import deeppavlov.models.tokenizers.split_tokenizer
54+
import deeppavlov.models.tokenizers.ru_tokenizer
3855
import deeppavlov.models.squad.squad
3956
import deeppavlov.skills.go_bot.bot
4057
import deeppavlov.skills.go_bot.network
4158
import deeppavlov.skills.go_bot.tracker
4259
import deeppavlov.skills.seq2seq_go_bot.bot
4360
import deeppavlov.skills.seq2seq_go_bot.network
4461
import deeppavlov.skills.seq2seq_go_bot.kb
62+
import deeppavlov.skills.odqa.ranker
4563
import deeppavlov.vocabs.typos
64+
import deeppavlov.vocabs.wiki_sqlite
4665
import deeppavlov.dataset_readers.insurance_reader
4766
import deeppavlov.dataset_iterators.ranking_iterator
67+
import deeppavlov.models.ner.network
4868
import deeppavlov.models.ranking.ranking_model
4969
import deeppavlov.models.ranking.metrics
70+
import deeppavlov.models.preprocessors.char_splitter
71+
import deeppavlov.models.preprocessors.mask
72+
import deeppavlov.models.preprocessors.assemble_embeddins_matrix
73+
import deeppavlov.models.preprocessors.capitalization
74+
import deeppavlov.models.preprocessors.field_getter
75+
import deeppavlov.models.preprocessors.sanitizer
76+
import deeppavlov.models.preprocessors.lazy_tokenizer
77+
import deeppavlov.models.slotfill.slotfill_raw
78+
import deeppavlov.models.slotfill.slotfill
79+
import deeppavlov.models.preprocessors.one_hotter
80+
import deeppavlov.dataset_readers.ontonotes_reader
81+
5082

5183
import deeppavlov.metrics.accuracy
5284
import deeppavlov.metrics.fmeasure
5385
import deeppavlov.metrics.bleu
5486
import deeppavlov.metrics.squad_metrics
87+
import deeppavlov.metrics.roc_auc_score
88+
import deeppavlov.metrics.fmeasure_classification
5589

5690
import deeppavlov.core.common.log
91+
92+
import deeppavlov.download

deeppavlov/configs/error_model/brillmoore_kartaslov_ru.json

Lines changed: 15 additions & 1 deletion
@@ -37,13 +37,27 @@
3737
"name": "russian_words_vocab"
3838
},
3939
"save_path": "error_model/error_model_ru.tsv",
40-
"load_path": "error_model/error_model_ru.tsv"
40+
"load_path": "error_model/error_model_ru.tsv",
41+
"lm_file": "language_models/ru_wiyalen_no_punkt.arpa.binary"
4142
}
4243
],
4344
"out": ["y_predicted"]
4445
},
4546
"train": {
4647
"validate_best": false,
4748
"test_best": true
49+
},
50+
"metadata": {
51+
"labels": {
52+
"telegram_utils": "ErrorModel",
53+
"server_utils": "ErrorModel"
54+
},
55+
"download": [
56+
"http://lnsigo.mipt.ru/export/deeppavlov_data/error_model.tar.gz",
57+
{
58+
"url": "http://lnsigo.mipt.ru/export/lang_models/ru_wiyalen_no_punkt.arpa.binary.gz",
59+
"subdir": "language_models"
60+
}
61+
]
4862
}
4963
}

deeppavlov/configs/error_model/brillmoore_kartaslov_ru_custom_vocab.json

Lines changed: 14 additions & 1 deletion
@@ -40,13 +40,26 @@
4040
},
4141
"save_path": "error_model/error_model_ru.tsv",
4242
"load_path": "error_model/error_model_ru.tsv",
43-
"lm_file": "wiyalen_no_punkt.arpa.binary"
43+
"lm_file": "language_models/ru_wiyalen_no_punkt.arpa.binary"
4444
}
4545
],
4646
"out": ["y_predicted"]
4747
},
4848
"train": {
4949
"validate_best": false,
5050
"test_best": true
51+
},
52+
"metadata": {
53+
"labels": {
54+
"telegram_utils": "ErrorModel",
55+
"server_utils": "ErrorModel"
56+
},
57+
"download": [
58+
"http://lnsigo.mipt.ru/export/deeppavlov_data/error_model.tar.gz",
59+
{
60+
"url": "http://lnsigo.mipt.ru/export/lang_models/ru_wiyalen_no_punkt.arpa.binary.gz",
61+
"subdir": "language_models"
62+
}
63+
]
5164
}
5265
}
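
The `metadata.download` lists added to both error-model configs mix two entry shapes: a bare URL string and an object with `url` and `subdir`. A consumer might normalize them as in the sketch below; the function name is hypothetical, and treating a missing `subdir` as "no subdirectory" is an assumption.

```python
from typing import List, Optional, Tuple, Union


def normalize_downloads(entries: List[Union[str, dict]]) -> List[Tuple[str, Optional[str]]]:
    """Convert mixed download entries into (url, subdir) pairs."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare URL: no subdirectory specified.
            normalized.append((entry, None))
        else:
            normalized.append((entry["url"], entry.get("subdir")))
    return normalized


downloads = [
    "http://lnsigo.mipt.ru/export/deeppavlov_data/error_model.tar.gz",
    {
        "url": "http://lnsigo.mipt.ru/export/lang_models/ru_wiyalen_no_punkt.arpa.binary.gz",
        "subdir": "language_models",
    },
]
for url, subdir in normalize_downloads(downloads):
    print(subdir, url)
```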
