Source code for the CIKM 2023 paper "PaperLM: A Pre-trained Model for Hierarchical Examination Paper Representation Learning"
Requirements
numpy
pandas
torch==1.11.0
transformers==4.26.0
edunlp==0.0.8
tqdm
scikit-learn
scipy
Data preprocessing
- Convert paper text into vectors using a pre-trained BERT model
- Build the knowledge table
    cd src
    # preprocess the data
    python data_preprocess.py
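The first preprocessing step, turning paper text into vectors with a pre-trained BERT, might look roughly like the sketch below. It is not taken from `data_preprocess.py`: the checkpoint name `bert-base-uncased`, the mean-pooling strategy, and the helper names `mean_pool` and `encode_texts` are all assumptions made for illustration, and the actual script may use a different checkpoint or pooling method.

```python
import numpy as np


def mean_pool(token_vectors: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions.

    token_vectors: array of shape (batch, seq_len, hidden)
    attention_mask: array of shape (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_vectors.dtype)
    summed = (token_vectors * mask).sum(axis=1)
    # Clip the token count so an all-padding row does not divide by zero.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def encode_texts(texts, model_name="bert-base-uncased", max_length=128):
    """Return one vector per text (hypothetical helper, assumed checkpoint)."""
    # Heavy dependencies are imported lazily so mean_pool stays usable on its own.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

    return mean_pool(hidden.numpy(), batch["attention_mask"].numpy())
```

Calling `encode_texts(["Solve x + 1 = 2.", "Define a prime number."])` would download the assumed checkpoint on first use and return an array with one row per input text.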
Pre-train
    cd src
    # train
    python main.py --mode pretrain
Test
    cd src
    # paper difficulty estimation
    python main.py --mode finetune --downstream_task diff
    # examination paper retrieval
    python main.py --mode finetune --downstream_task similarity
    # paper clustering
    python main.py --mode finetune --downstream_task cluster
For additional command-line arguments, see [src/utils.py].
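The `--mode` and `--downstream_task` flags used in the commands above can be summarized with a small `argparse` sketch. This is an illustration only, not the contents of `src/utils.py`: the function names `build_parser` and `parse_args`, the restriction to exactly these choices, and the rule that fine-tuning requires a downstream task are assumptions inferred from the commands shown in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in this README (hypothetical)."""
    parser = argparse.ArgumentParser(description="PaperLM runner (illustrative sketch)")
    parser.add_argument(
        "--mode",
        choices=["pretrain", "finetune"],
        required=True,
        help="pretrain the model or fine-tune it on a downstream task",
    )
    parser.add_argument(
        "--downstream_task",
        choices=["diff", "similarity", "cluster"],
        default=None,
        help="diff: difficulty estimation, similarity: retrieval, cluster: clustering",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and apply the assumed finetune/task consistency check."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: fine-tuning is only meaningful with a downstream task selected.
    if args.mode == "finetune" and args.downstream_task is None:
        parser.error("--downstream_task is required when --mode is finetune")
    return args
```

For example, `parse_args(["--mode", "finetune", "--downstream_task", "diff"])` corresponds to the paper difficulty estimation command, while `parse_args(["--mode", "pretrain"])` corresponds to pre-training.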