
How to run sentence level nqg? Is there any dataset? #26

Open
hi-weiyuan opened this issue Jun 21, 2020 · 5 comments

Comments

@hi-weiyuan

I ran this code at the paragraph level and got good performance. But when I ran it at the sentence level with Zhou's dataset https://github.com/magic282/NQG, it reached only 14 BLEU-4. Is there a problem?

@seanie12
Owner

  1. I think a BLEU-4 score of 14 is reasonable performance, because neural question generation models do not generalize well to other datasets.
  2. You can find a sentence-level QG dataset in this repo.

@hi-weiyuan
Author

  • I think a BLEU-4 score of 14 is reasonable performance, because neural question generation models do not generalize well to other datasets.
  • You can find a sentence-level QG dataset in this repo.

However, the original paper conducted experiments on this dataset and achieved about 16 BLEU. That dataset is a sentence-level dataset, Split 2 in the paper.

@seanie12
Owner

I think you would need to train the model on the sentence-level QG dataset to get a score similar to the one in the paper.

@theDyingofLight

I ran this code at the paragraph level and got good performance. But when I ran it at the sentence level with Zhou's dataset https://github.com/magic282/NQG, it reached only 14 BLEU-4. Is there a problem?

Did you directly use the author's pretrained model checkpoint to get 16.76 BLEU-4? I got 14.63 BLEU-4 when I ran the code with the provided paragraph-level dataset.

@pzxbjx

pzxbjx commented Sep 16, 2020

  • I think a BLEU-4 score of 14 is reasonable performance, because neural question generation models do not generalize well to other datasets.
  • You can find a sentence-level QG dataset in this repo.

However, the original paper conducted experiments on this dataset and achieved about 16 BLEU. That dataset is a sentence-level dataset, Split 2 in the paper.

In my opinion, the original paper conducted its experiments on the paragraph-level dataset of Split 2, which is why it got a result of about 16 BLEU; for a sentence-level dataset, I think 14–15 is okay. By the way, did you change any parameters to get a similar result around 16?
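For reference, part of the gap discussed above can come from the evaluation itself, not the model: BLEU-4 scores depend on tokenization, smoothing, and whether averaging is done at the corpus or sentence level. Below is a minimal, illustrative sketch of corpus-level BLEU-4 in plain Python. It is not this repo's actual evaluation script; differences in these choices alone can easily account for a point or two of BLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu4(references, hypotheses):
    """Corpus-level BLEU-4: one reference token list per hypothesis."""
    clipped = [0] * 4   # clipped n-gram matches, n = 1..4
    totals = [0] * 4    # total hypothesis n-grams, n = 1..4
    hyp_len = ref_len = 0
    for ref, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            clipped[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Crude smoothing: replace zero counts with 1 to avoid log(0).
    # Real scripts use other smoothing methods, which changes the score.
    precisions = [(c or 1) / (t or 1) for c, t in zip(clipped, totals)]
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

refs = [["what", "is", "the", "capital", "of", "france", "?"]]
print(round(100 * corpus_bleu4(refs, refs), 2))  # perfect match -> 100.0
```

Checking which variant a given script implements (and with which smoothing) is usually the first thing to do before comparing numbers against a paper.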
