We put together the script, data, and trained models used in our paper. In a nutshell, TANDA is a technique for fine-tuning pre-trained Transformer models in two sequential steps:
- first, transfer the pre-trained model to a general task by fine-tuning it on a large, high-quality dataset;
- then, perform a second fine-tuning step to adapt the transferred model to the target domain.
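The two steps above can be sketched with a toy stand-in for the Transformer: a one-feature logistic regression whose weights play the role of the pre-trained parameters. Everything here (the data, the learning rates, and the model itself) is illustrative only; the actual training is run with `run_glue.py` as described below.

```python
import math
import random

def finetune(weights, data, lr, epochs):
    """One fine-tuning stage: plain SGD on a 1-feature logistic regression."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad = p - y                # d(log loss)/d(logit)
            w -= lr * grad * x
            b -= lr * grad
    return w, b

random.seed(0)
# Stage 0: "pre-trained" weights (stand-in for the pre-trained Transformer).
weights = (0.0, 0.0)

# Step 1 (transfer): a large, general dataset -- label 1 iff x > 0.
general = [(x, 1 if x > 0 else 0) for x in (random.uniform(-2, 2) for _ in range(2000))]
weights = finetune(weights, general, lr=0.1, epochs=2)

# Step 2 (adapt): a small target dataset with a shifted decision boundary
# (x > 0.5), fine-tuned with a much smaller learning rate -- mirroring the
# transfer-then-adapt schedule (e.g. 2e-5 then 1e-6 in the commands below).
target = [(x, 1 if x > 0.5 else 0) for x in (random.uniform(-2, 2) for _ in range(100))]
weights = finetune(weights, target, lr=0.01, epochs=2)
print(weights)
```

The key design point is the asymmetry between the two stages: the transfer step does the heavy lifting on plentiful general data, while the adapt step makes a gentle, low-learning-rate correction toward the target domain.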
Our implementation is based on the transformers package. We use the following commands to enable the sequential fine-tuning option for the package:

```shell
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout f3386 -b tanda-sequential-finetuning
git apply tanda-sequential-finetuning-with-asnq.diff
```
`f3386` is the latest commit as of Sun Nov 17 18:08:51 2019 +0900, and `tanda-sequential-finetuning-with-asnq.diff` is the diff that enables the option.
For example, to transfer with ASNQ and adapt with a target dataset:
- download the ASNQ dataset and the target dataset (e.g. Wiki-QA, formatted similarly to ASNQ), and
- run the following commands:
```shell
python run_glue.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --task_name ASNQ \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir [PATH-TO-ASNQ] \
  --per_gpu_train_batch_size 150 \
  --learning_rate 2e-5 \
  --num_train_epochs 2.0 \
  --output_dir [PATH-TO-TRANSFER-FOLDER]
```
```shell
python run_glue.py \
  --model_type bert \
  --model_name_or_path [PATH-TO-TRANSFER-FOLDER] \
  --task_name ASNQ \
  --do_train \
  --do_eval \
  --sequential \
  --do_lower_case \
  --data_dir [PATH-TO-WIKI-QA] \
  --per_gpu_train_batch_size 150 \
  --learning_rate 1e-6 \
  --num_train_epochs 2.0 \
  --output_dir [PATH-TO-OUTPUT-FOLDER]
```
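The fine-tuned classifier is used to rank each question's candidate sentences by its positive-class score, and the resulting ranking is typically evaluated with metrics such as mean reciprocal rank (MRR). A minimal sketch of that evaluation, with made-up scores standing in for model outputs:

```python
def mean_reciprocal_rank(questions):
    """MRR over a list of questions.

    Each question is a (scores, labels) pair: scores[i] is the model's
    positive-class score for candidate sentence i (made-up numbers here),
    and labels[i] == 1 iff that sentence correctly answers the question.
    """
    total = 0.0
    for scores, labels in questions:
        # Rank candidates by descending model score.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            if labels[i] == 1:
                total += 1.0 / rank  # reciprocal rank of the first correct answer
                break
    return total / len(questions)

questions = [
    ([0.9, 0.1, 0.3], [1, 0, 0]),  # correct sentence ranked 1st -> RR 1
    ([0.2, 0.8, 0.5], [1, 0, 0]),  # correct sentence ranked 3rd -> RR 1/3
]
print(mean_reciprocal_rank(questions))  # -> 0.666...
```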
We use the following datasets in the paper:
- ASNQ is a dataset for answer sentence selection derived from the Google Natural Questions (NQ) dataset (Kwiatkowski et al. 2019). The dataset details can be found in our paper.
- ASNQ is used to transfer the pre-trained models in the paper, and can be downloaded here.
- Wiki-QA: we used the Wiki-QA dataset from here and removed all questions that have no correct answer.
- TREC-QA: we used the `*-filtered.jsonl` version of this dataset from here.
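Preparing a target dataset means putting it in the same layout as ASNQ and, as with Wiki-QA above, dropping questions with no correct answer. The sketch below assumes a simple tab-separated `question<TAB>sentence<TAB>label` layout with binary labels; that layout is an illustrative assumption, so match it to the columns and label scheme of the ASNQ files you actually download.

```python
import csv
import io
from collections import defaultdict

def filter_and_format(rows):
    """Drop questions with no correct answer, then emit TSV rows.

    `rows` are (question, sentence, label) triples with label 1 for a
    correct answer and 0 otherwise. The question<TAB>sentence<TAB>label
    output layout is an assumption for illustration -- check the real
    ASNQ files for the exact columns and labels.
    """
    by_question = defaultdict(list)
    for q, s, y in rows:
        by_question[q].append((s, y))

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for q, cands in by_question.items():
        if not any(y == 1 for _, y in cands):
            continue  # question with no correct answer: dropped entirely
        for s, y in cands:
            writer.writerow([q, s, y])
    return out.getvalue()

rows = [
    ("who proposed tanda", "Garg, Vu and Moschitti proposed TANDA.", 1),
    ("who proposed tanda", "The sky is blue.", 0),
    ("what is the airspeed of a swallow", "An unanswered question.", 0),
]
print(filter_and_format(rows))  # second question is dropped: no correct answer
```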
We release the following fine-tuned models:
- TANDA: BERT-Base ASNQ → Wiki-QA
- TANDA: BERT-Large ASNQ → Wiki-QA
- TANDA: RoBERTa-Large ASNQ → Wiki-QA
- TANDA: BERT-Base ASNQ → TREC-QA
- TANDA: BERT-Large ASNQ → TREC-QA
- TANDA: RoBERTa-Large ASNQ → TREC-QA
The paper is to appear in the AAAI 2020 proceedings. For now, please cite the arXiv version:
```
@article{garg2019tanda,
      title={TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection},
      author={Siddhant Garg and Thuy Vu and Alessandro Moschitti},
      year={2019},
      eprint={1911.04118},
      archivePrefix={arXiv}
}
```
The documentation, including the shared data and models, is made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. See the LICENSE file.
The sample script within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.
For help or issues, please submit a GitHub issue.
For direct communication, please contact Siddhant Garg (sgarg33 is at wisc dot edu, https://github.com/sid7954), Thuy Vu (thuyvu is at amazon dot com), or Alessandro Moschitti (amosch is at amazon dot com).