Code structure follows the paper: Paragraph-level Neural Question Generation with Maxout Pointer and Gated Self-attention Networks
Source code adapted from GitHub: https://github.com/seanie12/neural-question-generation
Dataset: https://tianchi.aliyun.com/competition/entrance/531826/information
Pretrained model MTBERT: https://code.ihub.org.cn/projects/1775
This code is written in Python. Dependencies include:
- python >= 3.6
- pytorch >= 1.4
- nltk
- tqdm
The embedding layer of MTBERT is used as the embedding layer before the LSTM, so embeddings pretrained on domain-specific medical text replace the original GloVe vectors.
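A minimal sketch of this idea, assuming the pretrained word-embedding matrix has already been extracted from MTBERT (the class name and the random stand-in weight below are hypothetical, not part of this repository):

```python
import torch
import torch.nn as nn

class EncoderWithPretrainedEmbedding(nn.Module):
    """Hypothetical sketch: feed a pretrained embedding matrix
    (e.g. BERT's word-embedding weights) into a BiLSTM encoder."""

    def __init__(self, pretrained_weight, hidden_size, freeze=True):
        super().__init__()
        # Reuse the pretrained matrix instead of training from scratch
        self.embedding = nn.Embedding.from_pretrained(pretrained_weight, freeze=freeze)
        self.lstm = nn.LSTM(pretrained_weight.size(1), hidden_size,
                            batch_first=True, bidirectional=True)

    def forward(self, input_ids):
        emb = self.embedding(input_ids)   # (batch, seq_len, emb_dim)
        outputs, _ = self.lstm(emb)       # (batch, seq_len, 2 * hidden_size)
        return outputs

# Toy demo: a random matrix stands in for MTBERT's embedding weights
weight = torch.randn(100, 768)
enc = EncoderWithPretrainedEmbedding(weight, hidden_size=256)
out = enc(torch.randint(0, 100, (2, 7)))
print(out.shape)  # torch.Size([2, 7, 512])
```

Freezing the embedding (`freeze=True`) keeps the domain knowledge from pretraining intact; setting it to `False` would fine-tune the embeddings along with the LSTM.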
Since the answer always appears in the passage, answer recognition was added, reusing the original answer_tag structure.
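The answer tag can be built by matching the answer span inside the passage and marking its tokens. The helper below is a hypothetical illustration (the function name is not from this repository):

```python
def make_answer_tag(passage_tokens, answer_tokens):
    """Return one 0/1 tag per passage token; 1 marks answer tokens.
    Hypothetical helper mirroring the answer_tag feature."""
    tags = [0] * len(passage_tokens)
    n = len(answer_tokens)
    for i in range(len(passage_tokens) - n + 1):
        if passage_tokens[i:i + n] == answer_tokens:
            tags[i:i + n] = [1] * n
            break  # tag only the first occurrence
    return tags

passage = list("阿司匹林可用于解热镇痛")
answer = list("解热镇痛")
print(make_answer_tag(passage, answer))  # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

The resulting tag sequence is typically embedded and concatenated with the word embeddings at each position before the LSTM.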
Because the dataset quality is poor, some manual data cleaning was performed.
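One plausible cleaning pass, sketched here as an assumption (the exact steps taken are not documented in this repository): normalize whitespace and drop examples whose answer cannot be found verbatim in the passage, since the answer-tagging step above relies on the answer appearing in the text.

```python
import re

def clean_example(passage, question, answer):
    """Hypothetical cleaning step: strip whitespace and discard pairs
    whose answer does not occur verbatim in the passage."""
    passage = re.sub(r"\s+", "", passage)
    question = re.sub(r"\s+", "", question)
    answer = re.sub(r"\s+", "", answer)
    if not answer or answer not in passage:
        return None  # unusable for answer tagging
    return passage, question, answer

print(clean_example("阿司匹林 可解热镇痛", "阿司匹林有什么作用?", "解热 镇痛"))
```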
Usage: python -W ignore main.py [--train] [--model_path]
Preliminary round score: 0.5415