lucadiliello/answer-selection

answer-selection

New datasets for Answer Sentence Selection.

Why?

Common datasets for Answer Sentence Selection (AS2), such as WikiQA and TREC-QA, are very small (a few thousand QA pairs) and are no longer challenging: some systems achieve MAP > 92% on both datasets.

A recent large-scale dataset (ASNQ) shows that more data are needed to reach SOTA performance. Inspired by how ASNQ was built from Google's NQ, we release four large-scale datasets for AS2 derived from NewsQA, TriviaQA, SearchQA and HotpotQA.

We named these new datasets NewsAS2, TriviaAS2, SearchAS2 and HotpotAS2.

NOTICE: in all datasets, the original validation set has been split into a dev set and a test set, so that both have non-hidden labels.
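The README does not say how the validation sets were divided, so the following is only a minimal sketch of one reasonable way to do it: split at the question level, so that all candidate answers for a question end up on the same side. The function name `split_by_question`, the 50/50 ratio, the seed, and the `question` column name are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def split_by_question(rows, seed=0, question_key="question"):
    """Split QA pairs into two parts without sharing questions between them.

    `rows` is any iterable of dicts; `question_key` is an assumed column name.
    """
    # Group all candidate rows under their question.
    groups = defaultdict(list)
    for row in rows:
        groups[row[question_key]].append(row)

    # Shuffle the questions (not the rows) with a fixed seed for reproducibility.
    questions = sorted(groups)
    random.Random(seed).shuffle(questions)

    # First half of the questions goes to dev, the rest to test (assumed ratio).
    half = len(questions) // 2
    dev = [row for q in questions[:half] for row in groups[q]]
    test = [row for q in questions[half:] for row in groups[q]]
    return dev, test


# Toy data: 4 questions with 3 candidate answers each.
toy_rows = [
    {"question": f"q{i % 4}", "answer": f"a{i}", "label": i % 2}
    for i in range(12)
]
toy_dev, toy_test = split_by_question(toy_rows)
```

Splitting by question rather than by row keeps ranking metrics meaningful, since every question is evaluated against its full candidate list.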

How to

The datasets are available from the Hugging Face datasets repository.

First, install the datasets library with pip install datasets --upgrade.

Then, download the datasets with:

from datasets import load_dataset

news_as2 = load_dataset('lucadiliello/news_as2')
trivia_as2 = load_dataset('lucadiliello/trivia_as2')
search_as2 = load_dataset('lucadiliello/search_as2')
hotpot_as2 = load_dataset('lucadiliello/hotpot_as2')

Statistics

Dataset    Train # Q  Train # QA pairs  Validation # Q  Validation # QA pairs  Test # Q  Test # QA pairs
NewsAS2    71561      1840533           2102            51844                  2083      51472
TriviaAS2  61688      1843349           3933            117012                 3852      114853
SearchAS2  117220     3281909           8509            236360                 8470      236792
HotpotAS2  72921      489238            2989            25295                  2912      24846
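As a rough illustration of how the two kinds of counts relate, the sketch below computes the number of distinct questions and the number of QA pairs for a collection of rows. The helper name `split_statistics` and the `question` column name are assumptions; before applying it to a split loaded with `load_dataset`, check the split's `column_names` attribute for the actual field names.

```python
def split_statistics(rows, question_key="question"):
    """Count distinct questions and total QA pairs in an iterable of dict rows.

    `question_key` is an assumed column name, not confirmed by the README.
    """
    rows = list(rows)
    return {
        "num_questions": len({row[question_key] for row in rows}),
        "num_qa_pairs": len(rows),
    }


# Toy example: 2 questions, 5 candidate answers in total.
toy_split = [
    {"question": "who wrote hamlet?", "answer": "Hamlet is a tragedy.", "label": 0},
    {"question": "who wrote hamlet?", "answer": "It was written by Shakespeare.", "label": 1},
    {"question": "capital of italy?", "answer": "Rome is the capital of Italy.", "label": 1},
    {"question": "capital of italy?", "answer": "Italy is in Europe.", "label": 0},
    {"question": "capital of italy?", "answer": "Milan is a large city.", "label": 0},
]
toy_stats = split_statistics(toy_split)
```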

Baselines performance

  • The best checkpoint is selected by MAP on the development set.
  • Each result is averaged over 5 runs with different random seeds.
  • Standard deviations are reported in round brackets.
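For readers unfamiliar with the three metrics reported below, here is a minimal sketch of how MAP, MRR and P@1 are typically computed from ranked candidates, and how several runs could be summarized as "mean (std)". It is not the authors' evaluation code. The function names, the input layout (a dict from question to `(score, label)` pairs), the choice to score questions without a positive candidate as 0, and the use of the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev


def average_precision(labels):
    """Average precision for one question; `labels` are 0/1 in ranked order."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0  # assumption: no positives -> 0


def reciprocal_rank(labels):
    """1 / rank of the first correct candidate, or 0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def evaluate(candidates_by_question):
    """Compute MAP, MRR and P@1 (as percentages) over all questions."""
    # Rank each question's candidates by model score, highest first.
    ranked = [
        [label for _, label in sorted(cands, key=lambda c: c[0], reverse=True)]
        for cands in candidates_by_question.values()
    ]
    return {
        "MAP": 100 * mean(average_precision(r) for r in ranked),
        "MRR": 100 * mean(reciprocal_rank(r) for r in ranked),
        "P@1": 100 * mean(1.0 if r and r[0] else 0.0 for r in ranked),
    }


def summarize_runs(values):
    """Format per-seed results as 'mean (sample std)'."""
    return f"{mean(values):.1f} ({stdev(values):.1f})"


# Toy example: (score, label) pairs for two questions.
toy_predictions = {
    "q1": [(0.9, 1), (0.2, 0)],
    "q2": [(0.8, 0), (0.6, 1), (0.1, 1)],
}
toy_metrics = evaluate(toy_predictions)
```

Evaluation conventions differ between papers, for example on whether questions with no correct candidate are dropped, so numbers from this sketch are not guaranteed to be comparable with the tables below.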

NewsAS2

Model MAP MRR P@1
RoBERTa Base 82.4 (0.2) 85.2 (0.3) 76.4 (0.6)
ELECTRA Base 82.0 (0.2) 84.8 (0.2) 76.0 (0.2)

TriviaAS2

Model MAP MRR P@1
RoBERTa Base 76.9 (0.6) 82.2 (0.5) 73.1 (0.5)
ELECTRA Base 73.3 (0.7) 79.1 (1.1) 68.9 (1.3)

SearchAS2

Model MAP MRR P@1
RoBERTa Base 84.1 (0.2) 88.1 (0.3) 82.1 (0.5)
ELECTRA Base 83.0 (0.1) 87.3 (0.2) 80.3 (0.4)

HotpotAS2

Model MAP MRR P@1
RoBERTa Base 92.6 (0.2) 93.5 (0.2) 90.4 (0.3)
ELECTRA Base 92.9 (0.1) 93.5 (0.1) 89.5 (0.1)
