New datasets for Answer Sentence Selection.
Common datasets for Answer Sentence Selection (AS2) like WikiQA and TREC-QA are very small (a few thousand QA pairs) and are no longer challenging: some systems achieve MAP > 92% on both datasets.
A recent large-scale dataset (ASNQ) shows that more data are needed to reach SOTA performance. Inspired by how ASNQ was built from Google's NQ, we release four large-scale datasets for AS2 derived from NewsQA, TriviaQA, SearchQA and HotpotQA.
We named these new datasets NewsAS2, TriviaAS2, SearchAS2 and HotpotAS2.
NOTICE: in all datasets, the original validation set has been split into a dev set and a test set, so that labels are available (non-hidden) for both.
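The exact splitting procedure is not described here. As an illustration only, the sketch below shows one way a validation set could be divided into dev and test at the question level, so that all candidates of a question end up in the same split. The function name `split_by_question`, the 50/50 ratio and the `seed` parameter are assumptions, not the procedure used to build the released data.

```python
import random
from collections import defaultdict


def split_by_question(pairs, seed=0):
    """Split (question, candidate, label) triples into dev and test.

    Hypothetical helper: questions are shuffled with a fixed seed and
    divided in half, so no question appears in both splits.
    """
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair[0]].append(pair)

    # Sort before shuffling so the result depends only on the seed.
    questions = sorted(grouped)
    random.Random(seed).shuffle(questions)

    half = (len(questions) + 1) // 2
    dev = [p for q in questions[:half] for p in grouped[q]]
    test = [p for q in questions[half:] for p in grouped[q]]
    return dev, test


# Toy data, invented for illustration only.
toy_pairs = [
    ("q1", "a", 1), ("q1", "b", 0),
    ("q2", "c", 0), ("q2", "d", 1),
    ("q3", "e", 1),
    ("q4", "f", 0), ("q4", "g", 1),
]
dev_pairs, test_pairs = split_by_question(toy_pairs)
```

Splitting by question rather than by QA pair avoids leaking a question's candidates across dev and test.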
The datasets are available from the Hugging Face datasets repository.
First, install the `datasets` library with `pip install datasets --upgrade`.
Then, download the datasets with:
```python
from datasets import load_dataset

news_as2 = load_dataset('lucadiliello/news_as2')
trivia_as2 = load_dataset('lucadiliello/trivia_as2')
search_as2 = load_dataset('lucadiliello/search_as2')
hotpot_as2 = load_dataset('lucadiliello/hotpot_as2')
```
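After loading, it can be useful to check which splits and columns a dataset exposes before writing any training code. The sketch below is a minimal example: `split_sizes` and `load_and_summarize` are hypothetical helper names, and the column names are printed rather than assumed, since they are not listed here.

```python
def split_sizes(dataset_dict):
    """Return {split_name: number_of_rows} for a DatasetDict-like mapping."""
    return {name: len(split) for name, split in dataset_dict.items()}


def load_and_summarize(name="lucadiliello/news_as2"):
    """Download one dataset and print its splits, sizes and columns."""
    # Imported lazily so split_sizes can be used without the library.
    from datasets import load_dataset

    dataset = load_dataset(name)
    for split_name, split in dataset.items():
        print(split_name, len(split), split.column_names)
    return split_sizes(dataset)


# Usage (requires network access and the `datasets` library):
# sizes = load_and_summarize("lucadiliello/news_as2")
```

If each row corresponds to one QA pair, the row counts should match the "# QA pairs" columns of the statistics table.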
Dataset | Training # Q | Training # QA pairs | Validation # Q | Validation # QA pairs | Test # Q | Test # QA pairs |
---|---|---|---|---|---|---|
NewsAS2 | 71561 | 1840533 | 2102 | 51844 | 2083 | 51472 |
TriviaAS2 | 61688 | 1843349 | 3933 | 117012 | 3852 | 114853 |
SearchAS2 | 117220 | 3281909 | 8509 | 236360 | 8470 | 236792 |
HotpotAS2 | 72921 | 489238 | 2989 | 25295 | 2912 | 24846 |
- The best checkpoint is selected by MAP on the development set.
- Each experiment is repeated 5 times with different random seeds.
- The standard deviation of the results is reported in parentheses.
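For reference, the three ranking metrics reported in the tables (MAP, MRR and P@1) can be sketched in plain Python as follows. This is an illustrative implementation, not the evaluation code behind the reported numbers. The `evaluate` function and its input format of `(question_id, score, label)` triples are assumptions, as is the choice to skip questions without any correct candidate (conventions differ on this point).

```python
def average_precision(labels):
    """AP for one question; labels are ordered by decreasing model score."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first correct candidate, or 0.0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def precision_at_1(labels):
    """1.0 if the top-ranked candidate is correct, else 0.0."""
    return 1.0 if labels and labels[0] else 0.0


def evaluate(triples):
    """Compute MAP, MRR and P@1 from (question_id, score, label) triples."""
    by_question = {}
    for qid, score, label in triples:
        by_question.setdefault(qid, []).append((score, label))

    ap, rr, p1 = [], [], []
    for candidates in by_question.values():
        ranked = [lab for _, lab in
                  sorted(candidates, key=lambda c: c[0], reverse=True)]
        if not any(ranked):
            continue  # assumption: questions with no correct answer are skipped
        ap.append(average_precision(ranked))
        rr.append(reciprocal_rank(ranked))
        p1.append(precision_at_1(ranked))

    if not ap:
        raise ValueError("no question with at least one correct candidate")
    n = len(ap)
    return {"MAP": sum(ap) / n, "MRR": sum(rr) / n, "P@1": sum(p1) / n}


# Toy predictions, invented for illustration only.
toy = [
    ("q1", 0.9, 0), ("q1", 0.8, 1), ("q1", 0.1, 1),
    ("q2", 0.7, 1), ("q2", 0.2, 0),
]
metrics = evaluate(toy)
```

The functions return fractions in [0, 1]; multiply by 100 to compare with the percentages in the tables.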
Model | MAP | MRR | P@1 |
---|---|---|---|
RoBERTa Base | 82.4 (0.2) | 85.2 (0.3) | 76.4 (0.6) |
ELECTRA Base | 82.0 (0.2) | 84.8 (0.2) | 76.0 (0.2) |
Model | MAP | MRR | P@1 |
---|---|---|---|
RoBERTa Base | 76.9 (0.6) | 82.2 (0.5) | 73.1 (0.5) |
ELECTRA Base | 73.3 (0.7) | 79.1 (1.1) | 68.9 (1.3) |
Model | MAP | MRR | P@1 |
---|---|---|---|
RoBERTa Base | 84.1 (0.2) | 88.1 (0.3) | 82.1 (0.5) |
ELECTRA Base | 83.0 (0.1) | 87.3 (0.2) | 80.3 (0.4) |
Model | MAP | MRR | P@1 |
---|---|---|---|
RoBERTa Base | 92.6 (0.2) | 93.5 (0.2) | 90.4 (0.3) |
ELECTRA Base | 92.9 (0.1) | 93.5 (0.1) | 89.5 (0.1) |