Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading (EMNLP 2020)
This repository contains the implementation of our EMNLP 2020 paper Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading.
Discern achieves new state-of-the-art results on the ShARC conversational machine reading benchmark (May 2020).
If you have any questions, please open an issue or contact [email protected].
If you find our code useful, please cite our papers as follows:
@article{gao-etal-2020-discern,
title={Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading},
author={Yifan Gao and Chien-Sheng Wu and Jingjing Li and Shafiq Joty and Steven C.H. Hoi and Caiming Xiong and Irwin King and Michael R. Lyu},
journal={EMNLP},
year={2020},
}
@article{gao-etal-2020-explicit,
title={Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading},
author={Yifan Gao and Chien-Sheng Wu and Shafiq R. Joty and Caiming Xiong and Richard Socher and Irwin King and Michael R. Lyu and Steven C. H. Hoi},
journal={ACL},
year={2020},
}
Discourse segmentation environment (PYT_SEGBOT)
conda create -n segbot python=3.6
conda install pytorch==0.4.1 -c pytorch
conda install nltk==3.4.5 numpy==1.18.1 pycparser==2.20 six==1.14.0 tqdm==4.44.1
Main environment (PYT_DISCERN)
conda create -n discern python=3.6
conda install pytorch==1.0.1 cudatoolkit=10.0 -c pytorch
conda install spacy==2.0.16 scikit-learn
python -m spacy download en_core_web_lg && python -m spacy download en_core_web_md
pip install editdistance==0.5.2 transformers==2.8.0
UniLM question generation environment (PYT_QG)
# create conda environment
conda create -n qg python=3.6
conda install pytorch==1.1 cudatoolkit=10.0 -c pytorch
conda install spacy==2.0.16 scikit-learn
python -m spacy download en_core_web_lg && python -m spacy download en_core_web_md
pip install editdistance==0.5.2
# install apex
git clone -q https://github.com/NVIDIA/apex.git
cd apex
git reset --hard 1603407bf49c7fc3da74fceb6a6c7b47fece2ef8
python setup.py install --cuda_ext --cpp_ext
cd ..
# setup unilm
cd qg
pip install --editable .
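Throughout this README, PYT_SEGBOT, PYT_DISCERN, and PYT_QG are placeholders for the python binary of the corresponding conda environment. For example (the paths below assume a default Anaconda install location; adjust them to your setup):

```bash
PYT_SEGBOT=~/anaconda3/envs/segbot/bin/python
PYT_DISCERN=~/anaconda3/envs/discern/bin/python
PYT_QG=~/anaconda3/envs/qg/bin/python
```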
Download ShARC data
mkdir data
cd data
wget https://sharc-data.github.io/data/sharc1-official.zip -O sharc_raw.zip
unzip sharc_raw.zip
mv sharc1-official/ sharc_raw
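To sanity-check the download, you can print one example (field names follow the official ShARC JSON release; run from the repository root):

```python
import json

# Peek at the first dev example from the official ShARC release.
with open('data/sharc_raw/json/sharc_dev.json') as f:
    dev = json.load(f)

ex = dev[0]
print(ex['snippet'])   # rule text, later segmented into discourse units
print(ex['question'])  # initial user question
print(ex['scenario'])  # user scenario (may be empty)
print(ex['history'])   # follow-up question/answer turns so far
print(ex['answer'])    # Yes / No / Irrelevant, or a follow-up question
```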
Download RoBERTa and UniLM
mkdir pretrained_models
# RoBERTa
mkdir pretrained_models/roberta_base
wget --quiet https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-config.json -O pretrained_models/roberta_base/config.json
wget --quiet https://cdn.huggingface.co/roberta-base-merges.txt -O pretrained_models/roberta_base/merges.txt
wget --quiet https://cdn.huggingface.co/roberta-base-pytorch_model.bin -O pretrained_models/roberta_base/pytorch_model.bin
wget --quiet https://cdn.huggingface.co/roberta-base-vocab.json -O pretrained_models/roberta_base/vocab.json
# UniLM & BERT
mkdir pretrained_models/unilm
wget --quiet https://unilm.blob.core.windows.net/ckpt/unilm1-large-cased.bin -O pretrained_models/unilm/unilmv1-large-cased.bin
wget --quiet https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt -O pretrained_models/unilm/bert-large-cased-vocab.txt
wget --quiet https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz -O pretrained_models/unilm/bert-large-cased.tar.gz
cd pretrained_models/unilm
tar -zxvf bert-large-cased.tar.gz
rm bert-large-cased.tar.gz
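To verify the RoBERTa files are intact, you can load them with the pinned transformers==2.8.0 (a quick check run from the repository root, not part of the pipeline):

```python
from transformers import RobertaModel, RobertaTokenizer

# Both calls read from the local directory filled by the wget commands above
# and fail loudly if config.json, the vocab files, or the weights are missing.
tokenizer = RobertaTokenizer.from_pretrained('pretrained_models/roberta_base')
model = RobertaModel.from_pretrained('pretrained_models/roberta_base')
print(model.config.hidden_size)  # 768 for roberta-base
```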
You can also download our pretrained models and our dev set predictions:
- Decision Making Model: decision.pt
- Span Extraction Model: span.pt
- Question Generation Model: unilmqg.bin
Your directories should now be laid out like this:
.
├── model
│   └── ...
├── segedu
│   └── ...
├── unilmqg
│   └── ...
├── README.md
├── data
│   └── ...
└── pretrained_models
    ├── unilm
    │   ├── ...
    │   └── unilmqg.bin
    ├── roberta_base
    │   └── ...
    ├── decision.pt
    └── span.pt
We use SegBot and its released implementation to segment the rules in the ShARC regulation snippets.
cd segedu
PYT_SEGBOT preprocess_discourse_segment.py
PYT_SEGBOT sharc_discourse_segmentation.py
data/train_snippet_parsed.json and data/dev_snippet_parsed.json contain the parsed rules.
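You can print one parsed entry to see what the segmentation produces (a minimal sketch; the exact JSON schema is an assumption, so inspect the output):

```python
import json

# Print the first entry of the segmented dev rules; how the discourse
# units are stored per snippet is assumed here -- check the actual output.
with open('data/dev_snippet_parsed.json') as f:
    parsed = json.load(f)

first = parsed[0] if isinstance(parsed, list) else next(iter(parsed.items()))
print(first)
```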
We find that in some cases there are extra or missing spaces in ShARC questions. We fix them here by merging these near-duplicate questions:
PYT_DISCERN fix_questions.py
preprocess: prepare inputs for RoBERTa and generate labels for entailment supervision
PYT_DISCERN preprocess_decision.py
training
PYT_DISCERN -u train_sharc.py \
--train_batch=16 \
--gradient_accumulation_steps=2 \
--epoch=5 \
--seed=323 \
--learning_rate=5e-5 \
--loss_entail_weight=3.0 \
--dsave="out/{}" \
--model=decision \
--early_stop=dev_0a_combined \
--data=./data/ \
--data_type=decision_roberta_base \
--prefix=train_decision \
--trans_layer=2 \
--eval_every_steps=300 # 516
inference
Here we can directly run inference using our trained model decision.pt. You can also replace it with your own model by setting --resume=/path/to/your/trained/models.
PYT_DISCERN train_sharc.py \
--dsave="./out/{}" \
--model=decision \
--data=./data/ \
--data_type=decision_roberta_base \
--prefix=inference_decision \
--resume=./pretrained_models/decision.pt \
--trans_layer=2 \
--test
The prediction file is saved at './out/inference_decision/dev.preds.json'.
Our model achieves the following performance on the development set using our pre-trained model decision.pt:

| Micro Acc. | Macro Acc. |
| --- | --- |
| 74.85 | 79.79 |
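For reference, micro accuracy is plain accuracy over the four decision classes (Yes, No, Irrelevant, and asking a follow-up question), while macro accuracy averages the per-class accuracies. A minimal sketch of the two metrics with toy labels (our reading of the ShARC evaluation, not the official scorer):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def micro_macro_accuracy(gold, pred):
    """Micro = overall accuracy; macro = unweighted mean of per-class accuracy."""
    micro = accuracy_score(gold, pred)
    cm = confusion_matrix(gold, pred)  # rows = gold labels, columns = predictions
    macro = np.mean(cm.diagonal() / cm.sum(axis=1))
    return micro, macro

gold = ['yes', 'no', 'more', 'irrelevant', 'yes', 'more']
pred = ['yes', 'no', 'more', 'no', 'yes', 'irrelevant']
print(micro_macro_accuracy(gold, pred))  # (0.666..., 0.625)
```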
For the follow-up question generation task, we first use a span-extraction model to extract the underspecified span within the rule text, and then use UniLM to rephrase the span into a well-formed question.
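As a toy illustration of how the two stages compose (both functions below are hypothetical stand-ins; the real extractor is the RoBERTa span model span.pt and the real rephraser is UniLM unilmqg.bin):

```python
# Toy sketch of the two-stage follow-up question pipeline.
def extract_underspecified_span(rule_text: str) -> str:
    # stand-in: the real model selects the rule clause not yet
    # resolved by the conversation history
    return "18 or older"

def rephrase_to_question(span: str) -> str:
    # stand-in: the real UniLM model conditions on the span (and rule)
    # to generate a fluent follow-up question
    return f"Are you {span}?"

rule = "You qualify for the discount if you are 18 or older."
print(rephrase_to_question(extract_underspecified_span(rule)))
# -> Are you 18 or older?
```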
preprocess span extraction
PYT_DISCERN preprocess_span.py
training
PYT_DISCERN -u train_sharc.py \
--train_batch=16 \
--gradient_accumulation_steps=2 \
--epoch=5 \
--seed=115 \
--learning_rate=5e-5 \
--dsave="out/{}" \
--model=span \
--early_stop=dev_0_combined \
--data=./data/ \
--data_type=span_roberta_base \
--prefix=train_span \
--eval_every_steps=100
inference
PYT_DISCERN -u train_sharc.py \
--dsave="out/{}" \
--model=span \
--data=./data/ \
--data_type=span_roberta_base \
--prefix=inference_span \
--resume=./pretrained_models/span.pt \
--test
Our trained model 'span.pt' achieves the following intermediate results:
| BLEU 1 | BLEU 4 | Span_F1 |
| --- | --- | --- |
| 50.89 | 44.0 | 62.59 |
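Span_F1 is token-level overlap F1 between the extracted span and the gold span, in the style of SQuAD answer-span evaluation (a sketch of the metric, not the repo's exact scorer):

```python
from collections import Counter

def span_f1(pred_tokens, gold_tokens):
    """Token-overlap F1 between a predicted and a gold span."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(span_f1('have a valid passport'.split(),
              'hold a valid passport'.split()))  # 0.75
```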
We follow the Explicit Memory Tracker for question generation. Here we take its trained model and run inference only; please refer to the Explicit Memory Tracker repo for training details.
The UniLM question generation model reads the predicted span from the span extraction model and rephrases it into a well-formed question.
# --fpred points to the directory of span predictions from the previous step
PYT_QG -u qg.py \
--fin=./data/sharc_raw/json/sharc_dev.json \
--fpred=./out/inference_span \
--model_recover_path=/absolute/path/to/pretrained_models/unilmqg.bin \
--cache_path=/absolute/path/to/pretrained_models/unilm/
Oracle question generation evaluation results of our released model unilmqg.bin (dev. set):
| BLEU 1 | BLEU 2 | BLEU 3 | BLEU 4 |
| --- | --- | --- | --- |
| 65.73 | 59.43 | 55.43 | 52.43 |
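The BLEU-n columns are n-gram overlap scores against the gold follow-up questions. A toy sentence-level illustration with nltk (the official ShARC evaluator scores BLEU over the whole dev set, so numbers are not directly comparable):

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = 'do you live in the uk ?'.split()
hypothesis = 'are you living in the uk ?'.split()

smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple([1.0 / n] * n)  # uniform weights over 1..n-grams
    score = sentence_bleu([reference], hypothesis, weights=weights,
                          smoothing_function=smooth)
    print(f'BLEU-{n}: {score:.4f}')
```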