# Model List

Here, we list our main models:

- End2end QAG (Question and Answer Generation): input a paragraph and generate a list of question-answer pairs.
- Multitask QAG: a single model that performs both AE and QG (switch modes by prepending the task prefix `answer extraction:` or `question generation:`).
- QG (Question Generation): input a paragraph with the answer highlighted by `<hl>`, and generate a question.
- AE (Answer Extraction): input a paragraph with a sentence highlighted by `<hl>`, and generate an answer candidate (see the sketch after this list for the `<hl>` format).
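
To make the `<hl>` convention concrete, here is a minimal sketch of how a highlighted input can be built; it is not part of the original card, and the `highlight` helper is hypothetical:

```python
# Hypothetical helper (not from lmqg): wrap the target span in <hl> markers,
# the input convention the QG and AE modes above expect.
def highlight(paragraph: str, span: str) -> str:
    return paragraph.replace(span, f"<hl> {span} <hl>", 1)

paragraph = ("William Turner was an English painter who specialised in "
             "watercolour landscapes.")
print(highlight(paragraph, "William Turner"))
# <hl> William Turner <hl> was an English painter who specialised in watercolour landscapes.
```

The `TransformersQG` wrapper used below inserts these markers for you when you pass contexts and answers separately, so manual highlighting is only needed when calling the models directly.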

## Usage

Open In Colab

- For QAG, you can use either the Multitask QAG or the End2end QAG model, as below.

```python
from pprint import pprint
from lmqg import TransformersQG

# initialize model (End2end QAG); use TransformersQG(model='lmqg/t5-base-squad-qg-ae') for Multitask QAG
model = TransformersQG(model='lmqg/t5-base-squad-qag')
# paragraph from which to generate question-answer pairs
context = "William Turner was an English painter who specialised in watercolour landscapes. He is often known as William Turner of Oxford or just Turner of Oxford to distinguish him from his contemporary, J. M. W. Turner. Many of Turner's paintings depicted the countryside around Oxford. One of his best known pictures is a view of the city of Oxford from Hinksey Hill."
# model prediction
question_answer = model.generate_qa(context)
# the output is a list of (question, answer) tuples
pprint(question_answer)
```
- For Pipeline QAG, you have to specify both the QG and AE models, as below.

```python
from pprint import pprint
from lmqg import TransformersQG

# initialize model with a separate QG model and AE model
model = TransformersQG(model='lmqg/t5-base-squad-qg', model_ae='lmqg/t5-base-squad-ae')
# paragraph from which to generate question-answer pairs
context = "William Turner was an English painter who specialised in watercolour landscapes. He is often known as William Turner of Oxford or just Turner of Oxford to distinguish him from his contemporary, J. M. W. Turner. Many of Turner's paintings depicted the countryside around Oxford. One of his best known pictures is a view of the city of Oxford from Hinksey Hill."
# model prediction
question_answer = model.generate_qa(context)
# the output is a list of (question, answer) tuples
pprint(question_answer)
```
- For QG only, you need to provide the answer along with the paragraph, and you can use any QG model as below (the multitask QAG model can perform QG as well; see the variant after this code block).

```python
from pprint import pprint
from lmqg import TransformersQG

# initialize model
model = TransformersQG(language='en', model='lmqg/t5-base-squad-qg')

# a list of paragraphs
context = [
    "William Turner was an English painter who specialised in watercolour landscapes",
    "William Turner was an English painter who specialised in watercolour landscapes"
]
# a list of answers (same length as the list of paragraphs)
answer = [
    "William Turner",
    "English"
]
# model prediction
question = model.generate_q(list_context=context, list_answer=answer)
pprint(question)
```
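
Since the multitask QAG model can perform QG as well, a minimal variant of the call above (assuming the same `generate_q` interface) is:

```python
from pprint import pprint
from lmqg import TransformersQG

# the multitask QAG model also handles QG when given contexts and answers
model = TransformersQG(language='en', model='lmqg/t5-base-squad-qg-ae')
question = model.generate_q(
    list_context=["William Turner was an English painter who specialised in watercolour landscapes"],
    list_answer=["William Turner"]
)
pprint(question)
```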
- For AE only, you can extract answer candidates as below.

```python
from pprint import pprint
from lmqg import TransformersQG

# initialize model
model = TransformersQG(language='en', model='lmqg/t5-base-squad-ae')
# model prediction
answer = model.generate_a("William Turner was an English painter who specialised in watercolour landscapes")
pprint(answer)
```
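
Putting the pieces together, here is a minimal sketch (using only the `generate_a` and `generate_q` calls shown above) of how Pipeline QAG composes AE and QG by hand:

```python
from pprint import pprint
from lmqg import TransformersQG

paragraph = "William Turner was an English painter who specialised in watercolour landscapes."

# step 1: extract answer candidates with the AE model
ae = TransformersQG(language='en', model='lmqg/t5-base-squad-ae')
answers = ae.generate_a(paragraph)

# step 2: generate one question per extracted answer with the QG model
qg = TransformersQG(language='en', model='lmqg/t5-base-squad-qg')
questions = qg.generate_q(list_context=[paragraph] * len(answers), list_answer=answers)

# pair them up, mirroring the (question, answer) output of generate_qa
pprint(list(zip(questions, answers)))
```

In practice, passing `model` and `model_ae` to a single `TransformersQG` (as in the Pipeline QAG example above) performs this composition for you.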

## Main Models

We list our main models for AE, QG, End2end QAG, and Multitask QAG in 9 languages (en/fr/ja/ko/ru/it/es/de/zh).

### QAG: Question and Answer Generation

| Model | Data | Type | Language Model | Language | QAAlignedF1Score (BERTScore) | QAAlignedF1Score (MoverScore) | QAAlignedPrecision (BERTScore) | QAAlignedPrecision (MoverScore) | QAAlignedRecall (BERTScore) | QAAlignedRecall (MoverScore) |
|---|---|---|---|---|---|---|---|---|---|---|
| lmqg/t5-small-squad-qag | lmqg/qg_squad | End2end QAG | t5-small | English | 92.76 | 64.59 | 92.87 | 65.30 | 92.68 | 63.99 |
| lmqg/t5-small-squad-qg-ae | lmqg/qg_squad | Multitask QAG | t5-small | English | 91.74 | 63.23 | 91.49 | 63.26 | 92.01 | 63.29 |
| lmqg/t5-small-squad-qg, lmqg/t5-small-squad-ae | lmqg/qg_squad | Pipeline QAG | t5-small | English | 92.26 | 63.83 | 92.07 | 63.92 | 92.48 | 63.82 |
| lmqg/t5-base-squad-qag | lmqg/qg_squad | End2end QAG | t5-base | English | 93.34 | 65.78 | 93.18 | 65.96 | 93.51 | 65.68 |
| lmqg/t5-base-squad-qg-ae | lmqg/qg_squad | Multitask QAG | t5-base | English | 92.53 | 64.23 | 92.35 | 64.33 | 92.74 | 64.23 |
| lmqg/t5-base-squad-qg, lmqg/t5-base-squad-ae | lmqg/qg_squad | Pipeline QAG | t5-base | English | 92.75 | 64.36 | 92.59 | 64.45 | 92.93 | 64.35 |
| lmqg/t5-large-squad-qag | lmqg/qg_squad | End2end QAG | t5-large | English | 93.45 | 66.05 | 93.34 | 66.34 | 93.57 | 65.84 |
| lmqg/t5-large-squad-qg-ae | lmqg/qg_squad | Multitask QAG | t5-large | English | 92.87 | 64.67 | 92.72 | 64.82 | 93.04 | 64.63 |
| lmqg/t5-large-squad-qg, lmqg/t5-large-squad-ae | lmqg/qg_squad | Pipeline QAG | t5-large | English | 92.97 | 64.72 | 92.83 | 64.87 | 93.14 | 64.66 |
| lmqg/mt5-small-frquad-qag | lmqg/qg_frquad | End2end QAG | google/mt5-small | French | 77.23 | 52.36 | 76.76 | 52.19 | 77.74 | 52.54 |
| lmqg/mt5-small-frquad-qg-ae | lmqg/qg_frquad | Multitask QAG | google/mt5-small | French | 79.70 | 54.22 | 77.29 | 52.84 | 82.36 | 55.76 |
| lmqg/mt5-small-frquad-qg, lmqg/mt5-small-frquad-ae | lmqg/qg_frquad | Pipeline QAG | google/mt5-small | French | 79.72 | 53.94 | 77.58 | 52.70 | 82.06 | 55.32 |
| lmqg/mt5-base-frquad-qag | lmqg/qg_frquad | End2end QAG | google/mt5-base | French | 78.28 | 51.66 | 78.36 | 51.73 | 78.21 | 51.59 |
| lmqg/mt5-base-frquad-qg-ae | lmqg/qg_frquad | Multitask QAG | google/mt5-base | French | 79.16 | 53.90 | 76.69 | 52.57 | 81.87 | 55.36 |
| lmqg/mt5-base-frquad-qg, lmqg/mt5-base-frquad-ae | lmqg/qg_frquad | Pipeline QAG | google/mt5-base | French | 68.59 | 47.87 | 67.59 | 47.42 | 69.69 | 48.36 |
| lmqg/mt5-small-dequad-qag | lmqg/qg_dequad | End2end QAG | google/mt5-small | German | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| lmqg/mt5-small-dequad-qg-ae | lmqg/qg_dequad | Multitask QAG | google/mt5-small | German | 80.02 | 53.99 | 78.91 | 53.77 | 81.23 | 54.27 |
| lmqg/mt5-small-dequad-qg, lmqg/mt5-small-dequad-ae | lmqg/qg_dequad | Pipeline QAG | google/mt5-small | German | 81.19 | 54.30 | 80.00 | 54.04 | 82.46 | 54.59 |
| lmqg/mt5-base-dequad-qag | lmqg/qg_dequad | End2end QAG | google/mt5-base | German | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 | 0.10 |
| lmqg/mt5-base-dequad-qg-ae | lmqg/qg_dequad | Multitask QAG | google/mt5-base | German | 6.11 | 4.24 | 6.30 | 4.34 | 5.95 | 4.15 |
| lmqg/mt5-base-dequad-qg, lmqg/mt5-base-dequad-ae | lmqg/qg_dequad | Pipeline QAG | google/mt5-base | German | 76.86 | 52.96 | 76.28 | 52.93 | 77.55 | 53.06 |
| lmqg/mt5-small-itquad-qag | lmqg/qg_itquad | End2end QAG | google/mt5-small | Italian | 79.41 | 54.15 | 81.16 | 55.49 | 77.79 | 52.94 |
| lmqg/mt5-small-itquad-qg-ae | lmqg/qg_itquad | Multitask QAG | google/mt5-small | Italian | 81.81 | 56.02 | 81.17 | 55.76 | 82.51 | 56.32 |
| lmqg/mt5-small-itquad-qg, lmqg/mt5-small-itquad-ae | lmqg/qg_itquad | Pipeline QAG | google/mt5-small | Italian | 81.63 | 55.85 | 81.04 | 55.60 | 82.28 | 56.14 |
| lmqg/mt5-base-itquad-qag | lmqg/qg_itquad | End2end QAG | google/mt5-base | Italian | 79.93 | 53.80 | 81.06 | 54.64 | 78.87 | 53.02 |
| lmqg/mt5-base-itquad-qg-ae | lmqg/qg_itquad | Multitask QAG | google/mt5-base | Italian | 81.98 | 56.35 | 81.19 | 56.00 | 82.83 | 56.75 |
| lmqg/mt5-base-itquad-qg, lmqg/mt5-base-itquad-ae | lmqg/qg_itquad | Pipeline QAG | google/mt5-base | Italian | 81.68 | 55.83 | 81.25 | 55.68 | 82.16 | 56.01 |
| lmqg/mt5-small-jaquad-qag | lmqg/qg_jaquad | End2end QAG | google/mt5-small | Japanese | 58.35 | 39.19 | 58.34 | 39.21 | 58.38 | 39.17 |
| lmqg/mt5-small-jaquad-qg-ae | lmqg/qg_jaquad | Multitask QAG | google/mt5-small | Japanese | 80.51 | 56.28 | 80.51 | 56.28 | 80.51 | 56.28 |
| lmqg/mt5-small-jaquad-qg, lmqg/mt5-small-jaquad-ae | lmqg/qg_jaquad | Pipeline QAG | google/mt5-small | Japanese | 79.78 | 55.85 | 76.84 | 53.80 | 83.06 | 58.22 |
| lmqg/mt5-base-jaquad-qag | lmqg/qg_jaquad | End2end QAG | google/mt5-base | Japanese | 74.52 | 52.08 | 74.36 | 52.01 | 74.71 | 52.16 |
| lmqg/mt5-base-jaquad-qg-ae | lmqg/qg_jaquad | Multitask QAG | google/mt5-base | Japanese | 80.35 | 56.23 | 77.28 | 54.02 | 83.79 | 58.81 |
| lmqg/mt5-base-jaquad-qg, lmqg/mt5-base-jaquad-ae | lmqg/qg_jaquad | Pipeline QAG | google/mt5-base | Japanese | 80.31 | 56.36 | 77.14 | 54.00 | 83.89 | 59.12 |
| lmqg/mt5-small-koquad-qag | lmqg/qg_koquad | End2end QAG | google/mt5-small | Korean | 74.23 | 75.06 | 74.29 | 75.14 | 74.20 | 75.04 |
| lmqg/mt5-small-koquad-qg-ae | lmqg/qg_koquad | Multitask QAG | google/mt5-small | Korean | 80.36 | 82.55 | 77.34 | 78.93 | 83.72 | 86.69 |
| lmqg/mt5-small-koquad-qg, lmqg/mt5-small-koquad-ae | lmqg/qg_koquad | Pipeline QAG | google/mt5-small | Korean | 80.52 | 82.95 | 77.56 | 79.39 | 83.80 | 87.02 |
| lmqg/mt5-base-koquad-qag | lmqg/qg_koquad | End2end QAG | google/mt5-base | Korean | 76.88 | 77.95 | 77.10 | 78.29 | 76.69 | 77.66 |
| lmqg/mt5-base-koquad-qg-ae | lmqg/qg_koquad | Multitask QAG | google/mt5-base | Korean | 80.28 | 81.97 | 77.03 | 78.10 | 83.91 | 86.43 |
| lmqg/mt5-base-koquad-qg, lmqg/mt5-base-koquad-ae | lmqg/qg_koquad | Pipeline QAG | google/mt5-base | Korean | 77.26 | 77.51 | 76.37 | 76.26 | 78.25 | 78.95 |
| lmqg/mt5-small-ruquad-qag | lmqg/qg_ruquad | End2end QAG | google/mt5-small | Russian | 52.95 | 38.59 | 52.86 | 38.57 | 53.06 | 38.62 |
| lmqg/mt5-small-ruquad-qg-ae | lmqg/qg_ruquad | Multitask QAG | google/mt5-small | Russian | 79.74 | 56.69 | 76.15 | 54.11 | 83.83 | 59.79 |
| lmqg/mt5-small-ruquad-qg, lmqg/mt5-small-ruquad-ae | lmqg/qg_ruquad | Pipeline QAG | google/mt5-small | Russian | 76.96 | 55.53 | 73.41 | 53.24 | 81.05 | 58.25 |
| lmqg/mt5-base-ruquad-qag | lmqg/qg_ruquad | End2end QAG | google/mt5-base | Russian | 74.63 | 54.24 | 73.97 | 53.91 | 75.38 | 54.65 |
| lmqg/mt5-base-ruquad-qg-ae | lmqg/qg_ruquad | Multitask QAG | google/mt5-base | Russian | 80.21 | 57.17 | 76.48 | 54.40 | 84.49 | 60.55 |
| lmqg/mt5-base-ruquad-qg, lmqg/mt5-base-ruquad-ae | lmqg/qg_ruquad | Pipeline QAG | google/mt5-base | Russian | 77.03 | 55.61 | 73.44 | 53.27 | 81.17 | 58.39 |
| lmqg/mt5-small-esquad-qag | lmqg/qg_esquad | End2end QAG | google/mt5-small | Spanish | 78.12 | 53.92 | 78.00 | 53.93 | 78.27 | 53.93 |
| lmqg/mt5-small-esquad-qg-ae | lmqg/qg_esquad | Multitask QAG | google/mt5-small | Spanish | 79.06 | 54.49 | 76.46 | 52.96 | 81.94 | 56.21 |
| lmqg/mt5-small-esquad-qg, lmqg/mt5-small-esquad-ae | lmqg/qg_esquad | Pipeline QAG | google/mt5-small | Spanish | 79.89 | 54.82 | 77.46 | 53.31 | 82.56 | 56.52 |
| lmqg/mt5-base-esquad-qag | lmqg/qg_esquad | End2end QAG | google/mt5-base | Spanish | 78.96 | 54.30 | 78.66 | 54.21 | 79.31 | 54.42 |
| lmqg/mt5-base-esquad-qg-ae | lmqg/qg_esquad | Multitask QAG | google/mt5-base | Spanish | 79.67 | 54.82 | 77.14 | 53.27 | 82.44 | 56.56 |
| lmqg/mt5-base-esquad-qg, lmqg/mt5-base-esquad-ae | lmqg/qg_esquad | Pipeline QAG | google/mt5-base | Spanish | 80.79 | 55.25 | 78.45 | 53.70 | 83.34 | 56.99 |
| lmqg/mt5-small-zhquad-qag | lmqg/qg_zhquad | End2end QAG | google/mt5-small | Chinese | 75.47 | 52.42 | 75.56 | 52.53 | 75.40 | 52.32 |
| lmqg/mt5-small-zhquad-qg-ae | lmqg/qg_zhquad | Multitask QAG | google/mt5-small | Chinese | 78.55 | 53.47 | 75.41 | 51.50 | 82.09 | 55.73 |
| lmqg/mt5-small-zhquad-qg, lmqg/mt5-small-zhquad-ae | lmqg/qg_zhquad | Pipeline QAG | google/mt5-small | Chinese | 46.78 | 33.59 | 46.38 | 33.44 | 47.23 | 33.76 |
| lmqg/mt5-base-zhquad-qag | lmqg/qg_zhquad | End2end QAG | google/mt5-base | Chinese | 73.57 | 49.76 | 73.07 | 49.62 | 74.12 | 49.92 |
| lmqg/mt5-base-zhquad-qg-ae | lmqg/qg_zhquad | Multitask QAG | google/mt5-base | Chinese | 77.75 | 52.96 | 76.45 | 52.25 | 79.20 | 53.77 |
| lmqg/mt5-base-zhquad-qg, lmqg/mt5-base-zhquad-ae | lmqg/qg_zhquad | Pipeline QAG | google/mt5-base | Chinese | 37.91 | 27.34 | 38.58 | 27.74 | 37.32 | 26.98 |

### QG: Question Generation

| Model | Data | Language Model | Language | BERTScore | METEOR | MoverScore | ROUGE-L |
|---|---|---|---|---|---|---|---|
| lmqg/t5-small-squad-qg | lmqg/qg_squad | t5-small | English | 90.20 | 25.84 | 63.89 | 51.43 |
| lmqg/t5-base-squad-qg | lmqg/qg_squad | t5-base | English | 90.60 | 26.97 | 64.74 | 53.33 |
| lmqg/t5-large-squad-qg | lmqg/qg_squad | t5-large | English | 91.00 | 27.70 | 65.29 | 54.13 |
| lmqg/mt5-small-frquad-qg | lmqg/qg_frquad | google/mt5-small | French | 80.71 | 17.51 | 56.50 | 28.56 |
| lmqg/mt5-base-frquad-qg | lmqg/qg_frquad | google/mt5-base | French | 77.81 | 15.55 | 54.58 | 25.88 |
| lmqg/mt5-small-dequad-qg | lmqg/qg_dequad | google/mt5-small | German | 79.90 | 11.47 | 54.64 | 10.08 |
| lmqg/mt5-base-dequad-qg | lmqg/qg_dequad | google/mt5-base | German | 80.39 | 13.65 | 55.73 | 11.10 |
| lmqg/mt5-small-itquad-qg | lmqg/qg_itquad | google/mt5-small | Italian | 80.80 | 17.57 | 56.79 | 21.93 |
| lmqg/mt5-base-itquad-qg | lmqg/qg_itquad | google/mt5-base | Italian | 81.16 | 18.00 | 57.11 | 22.51 |
| lmqg/mt5-small-jaquad-qg | lmqg/qg_jaquad | google/mt5-small | Japanese | 80.87 | 29.03 | 58.67 | 50.88 |
| lmqg/mt5-base-jaquad-qg | lmqg/qg_jaquad | google/mt5-base | Japanese | 81.77 | 30.58 | 59.68 | 52.67 |
| lmqg/mt5-small-koquad-qg | lmqg/qg_koquad | google/mt5-small | Korean | 82.89 | 27.52 | 82.49 | 25.64 |
| lmqg/mt5-base-koquad-qg | lmqg/qg_koquad | google/mt5-base | Korean | 84.52 | 29.62 | 83.36 | 28.57 |
| lmqg/mt5-small-ruquad-qg | lmqg/qg_ruquad | google/mt5-small | Russian | 84.27 | 26.39 | 62.49 | 31.39 |
| lmqg/mt5-base-ruquad-qg | lmqg/qg_ruquad | google/mt5-base | Russian | 85.82 | 28.48 | 64.56 | 33.02 |
| lmqg/mt5-small-esquad-qg | lmqg/qg_esquad | google/mt5-small | Spanish | 84.07 | 22.71 | 59.06 | 24.62 |
| lmqg/mt5-base-esquad-qg | lmqg/qg_esquad | google/mt5-base | Spanish | 84.47 | 23.43 | 59.62 | 25.45 |
| lmqg/mt5-small-zhquad-qg | lmqg/qg_zhquad | google/mt5-small | Chinese | 84.07 | 22.75 | 56.87 | 32.71 |
| lmqg/mt5-base-zhquad-qg | lmqg/qg_zhquad | google/mt5-base | Chinese | 77.38 | 23.92 | 57.50 | 34.72 |

### AE: Answer Extraction

| Model | Data | Language Model | Language | AnswerExactMatch | AnswerF1Score |
|---|---|---|---|---|---|
| lmqg/t5-small-squad-ae | lmqg/qg_squad | t5-small | English | 56.15 | 68.06 |
| lmqg/t5-base-squad-ae | lmqg/qg_squad | t5-base | English | 59.48 | 70.32 |
| lmqg/t5-large-squad-ae | lmqg/qg_squad | t5-large | English | 59.77 | 70.41 |
| lmqg/mt5-small-frquad-ae | lmqg/qg_frquad | google/mt5-small | French | 39.05 | 59.77 |
| lmqg/mt5-base-frquad-ae | lmqg/qg_frquad | google/mt5-base | French | 3.92 | 19.32 |
| lmqg/mt5-small-dequad-ae | lmqg/qg_dequad | google/mt5-small | German | 8.80 | 36.07 |
| lmqg/mt5-base-dequad-ae | lmqg/qg_dequad | google/mt5-base | German | 5.54 | 30.15 |
| lmqg/mt5-small-itquad-ae | lmqg/qg_itquad | google/mt5-small | Italian | 55.07 | 70.41 |
| lmqg/mt5-base-itquad-ae | lmqg/qg_itquad | google/mt5-base | Italian | 52.15 | 68.09 |
| lmqg/mt5-small-jaquad-ae | lmqg/qg_jaquad | google/mt5-small | Japanese | 23.99 | 24.01 |
| lmqg/mt5-base-jaquad-ae | lmqg/qg_jaquad | google/mt5-base | Japanese | 28.33 | 28.33 |
| lmqg/mt5-small-koquad-ae | lmqg/qg_koquad | google/mt5-small | Korean | 79.41 | 86.39 |
| lmqg/mt5-base-koquad-ae | lmqg/qg_koquad | google/mt5-base | Korean | 69.49 | 77.32 |
| lmqg/mt5-small-ruquad-ae | lmqg/qg_ruquad | google/mt5-small | Russian | 33.00 | 56.62 |
| lmqg/mt5-base-ruquad-ae | lmqg/qg_ruquad | google/mt5-base | Russian | 28.59 | 54.69 |
| lmqg/mt5-small-esquad-ae | lmqg/qg_esquad | google/mt5-small | Spanish | 56.14 | 73.93 |
| lmqg/mt5-base-esquad-ae | lmqg/qg_esquad | google/mt5-base | Spanish | 57.81 | 74.84 |
| lmqg/mt5-small-zhquad-ae | lmqg/qg_zhquad | google/mt5-small | Chinese | 95.01 | 95.10 |
| lmqg/mt5-base-zhquad-ae | lmqg/qg_zhquad | google/mt5-base | Chinese | 92.62 | 92.68 |

## Citation

Please cite the following papers if you use any of these resources, and see the code to reproduce the models if needed.

```bibtex
@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration",
    author = "Ushio, Asahi  and
        Alva-Manchego, Fernando  and
        Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, U.A.E.",
    publisher = "Association for Computational Linguistics",
}
@inproceedings{ushio-etal-2023-an-empirical,
    title = "An Empirical Comparison of LM-based Question and Answer Generation Methods",
    author = "Ushio, Asahi  and
        Alva-Manchego, Fernando  and
        Camacho-Collados, Jose",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
}
@inproceedings{ushio-etal-2023-a-practical-toolkit,
    title = "A Practical Toolkit for Multilingual Question and Answer Generation",
    author = "Ushio, Asahi  and
        Alva-Manchego, Fernando  and
        Camacho-Collados, Jose",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
}
```