Machine Reading Comprehension (MRC) is a crucial and challenging task in natural language processing (NLP). In recent years, with the development of deep learning and the increasing availability of large-scale datasets, MRC has achieved remarkable advancements. Although a number of MRC models obtain human-parity performance on several datasets, we find that these models are still far from robust in real-world applications.
To comprehensively evaluate the robustness of MRC models, we create a Chinese dataset, namely DuReaderrobust. It is designed to challenge MRC models on three aspects: (1) over-sensitivity, (2) over-stability and (3) generalization. In addition, DuReaderrobust has another advantage over previous datasets: its questions and documents come from Baidu Search, so it better reflects the robustness issues that arise when MRC models are applied in real-world settings.
For more details about the dataset, please refer to this paper.
Note that we release two equivalent versions of the baseline for the DuReaderrobust dataset. One is the DuReaderrobust Baseline System and the other is the PaddleHub Baseline. The two baseline systems implement the same ERNIE 1.0-based MRC model; the main differences are:
- The DuReaderrobust Baseline System is more flexible and better suited to secondary development.
- The PaddleHub Baseline is easier to get started with, as PaddleHub provides a simplified fine-tune API packaged for ERNIE.
In this repository, we release a baseline system for the DuReaderrobust dataset. The baseline system is based on ERNIE 1.0 and implemented with the PaddlePaddle framework. To run the baseline system, please follow the instructions below.
The baseline system has been tested on
- CentOS 6.3
- PaddlePaddle 1.6.1
- Python 2.7.13 & Python 3.7.3
- CUDA 9.0
- cuDNN 7.0
- NCCL 2.2.13
To install PaddlePaddle, please see the PaddlePaddle Homepage for more information.
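After installing, you can sanity-check the setup with PaddlePaddle's built-in installation check (a minimal sketch against the 1.x API used by this baseline):

```python
import paddle
import paddle.fluid as fluid

# Run PaddlePaddle's built-in installation check (available in the 1.x API).
fluid.install_check.run_check()

# The baseline was tested against PaddlePaddle 1.6.1.
print(paddle.__version__)
```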
Before running the baseline system, please download the DuReaderrobust dataset and the pretrained model parameters (ERNIE 1.0 base):
sh download.sh
The dataset will be saved into `data/`, and the pretrained and fine-tuned model parameters will be saved into `pretrained_model/` and `finetuned_model/`, respectively. Note that the fine-tuned model was obtained by fine-tuning the pretrained model (ERNIE 1.0) on the training data. The description of the data structure can be found in `data/README.md`.
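To get a quick feel for the data, the following sketch peeks at one example, assuming the common SQuAD-style JSON layout; `data/README.md` remains the authoritative description of the schema:

```python
# -*- coding: utf-8 -*-
import io
import json

# Peek at the first example, assuming a SQuAD-style layout:
# {"data": [{"paragraphs": [{"context": ..., "qas": [...]}]}]}
with io.open("data/dev.json", encoding="utf-8") as f:
    dataset = json.load(f)

paragraph = dataset["data"][0]["paragraphs"][0]
qa = paragraph["qas"][0]
print(qa["question"])
print(qa["id"])
print(qa["answers"])
```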
To fine-tune a model (on the demo dataset), please run the following command:
sh train.sh --train_file data/demo/demo_train.json --predict_file data/demo/demo_dev.json
This will start the training process on the demo dataset. At the end of training, model parameters and predictions will be saved into `output/`.
To train the model with user-specified arguments (e.g. multiple GPUs, more epochs, or a larger batch size), please run the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 sh train.sh --epoch 5 --batch_size 8 --train_file data/demo/demo_train.json --predict_file data/demo/demo_dev.json
You can also directly modify the arguments in `train.sh`. More arguments can be found in `src/run_mrc.py`.
To predict with fine-tuned parameters (e.g. on the development set), please run the following command:
CKPT=<your_model_dir> sh predict.sh --predict_file data/dev.json
The model parameters under `your_model_dir` (e.g. the `finetuned_model` we have provided) will be loaded for prediction. The predicted answers will be saved into `output/`.
F1-score and exact match (EM) are used as evaluation metrics. Here we provide a script, `evaluate.py`, for evaluation.
To evaluate, run
python evaluate.py <reference_file> <prediction_file>
where `<reference_file>` is the dataset file (e.g. `data/dev.json`), and `<prediction_file>` is the model prediction (e.g. `output/dev_predictions.json`), which should be a valid JSON file of (qid, answer) pairs, for example:
{
"question_id_1" : "answer for question 1",
...
"question_id_N": "answer for question N"
}
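Such a file can be produced from a plain Python dict, for example (a minimal sketch; the question IDs and answers below are placeholders):

```python
# -*- coding: utf-8 -*-
import io
import json

# Hypothetical predictions: question id -> answer string.
predictions = {
    "question_id_1": u"answer for question 1",
    "question_id_N": u"answer for question N",
}

# ensure_ascii=False keeps Chinese answers human-readable in the file.
with io.open("output/dev_predictions.json", "w", encoding="utf-8") as f:
    f.write(json.dumps(predictions, ensure_ascii=False))
```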
After running the evaluation script, you will get the evaluation results in the following format:
{"F1": "80.842", "EM": "69.019", "TOTAL": 1417, "SKIP": 0}
The performance of our baseline model (i.e. the fine-tuned model provided above) is shown below:
| Dataset | F1 | EM |
| --- | --- | --- |
| basic dev | 80.84 | 69.02 |
To help you get started with the competition quickly, we also release a baseline based on PaddleHub (https://github.com/PaddlePaddle/PaddleHub), an application toolkit for PaddlePaddle pre-trained models.
All you need to do is install the latest PaddleHub toolkit with the following command:
pip install --upgrade paddlehub
Then, run
cd paddlehub_baseline
sh paddlehub_reading_comprehension.sh
This will fine-tune the pretrained ERNIE model on the competition dataset and produce the final prediction results.
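For reference, the simplified fine-tune API that the script builds on looks roughly like this (a sketch of the PaddleHub 1.x-era interface; `paddlehub_reading_comprehension.sh` wires up the full reading-comprehension task, and names may differ across PaddleHub versions):

```python
import paddlehub as hub

# Load the pretrained ERNIE module; PaddleHub downloads it on first use.
module = hub.Module(name="ernie")

# Obtain the model's inputs, outputs, and program for fine-tuning
# (PaddleHub 1.x-era interface).
inputs, outputs, program = module.context(trainable=True, max_seq_len=512)
```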
Now, you can submit the results to the competition.
For any questions about PaddleHub, please raise an issue at https://github.com/PaddlePaddle/PaddleHub/issues. We will reply as soon as possible.
The same code is also released in the PaddleHub demo.
Copyright 2020 Baidu.com, Inc. All Rights Reserved
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.