MetaKP

🖋 Di Wu*, Xiaoxian Shen*, and Kai-Wei Chang

We introduce on-demand keyphrase generation, a novel paradigm that requires keyphrase predictions to conform to specific high-level goals or intents.

We release MetaKP, a large-scale benchmark covering four datasets, 7500 documents, and 3760 goals from the news and biomedical text domains.

Setup

Data

The MetaKP dataset is officially released in the data/ folder. Download the data and uncompress it in the data/ folder; the get_data.sh script uncompresses all the files into their corresponding folders.

cd data
bash get_data.sh
cd ..

We cover data from KPTimes, DUC2001, KPBiomed, and Pubmed. The release includes the following files:

humanvalid_processed_release.json: the human-validated data created using our pipeline.
rejection_augmented_release.json: the rejection-augmented version, in which each document's negative goals are appended after its positive goals.

Each file contains the following fields:

id: the unique id of each goal-keyphrase pair.
title: title of the document.
document: document body.
goal: the intended goal.
keyphrases: a list of keyphrases that could be generated using the goal.
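
As a rough illustration of the fields above, the sketch below loads one release file and groups its goals by document. It makes several assumptions that the README does not confirm: that a file is either a single JSON array or JSON Lines (one record per line), and that the title is a usable key for grouping records of the same document. The function names load_records and goals_by_document are hypothetical and not part of the repository.

```python
import json
from collections import defaultdict


def load_records(path):
    """Load records from a JSON array file or a JSON Lines file (assumed formats)."""
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    try:
        data = json.loads(text)
        # A single JSON object is treated as one record.
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def goals_by_document(records):
    """Group (goal, keyphrases) pairs by document title, preserving file order."""
    grouped = defaultdict(list)
    for rec in records:
        # Assumption: records of the same document share the same title.
        grouped[rec["title"]].append((rec["goal"], rec["keyphrases"]))
    return dict(grouped)
```

For example, goals_by_document(load_records("path/to/humanvalid_processed_release.json")) would return a dictionary mapping each title to its list of (goal, keyphrases) pairs; the path here is a placeholder.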

Environment

We recommend using a conda environment for the project. You may follow the steps below to set it up.

conda create --name metakp python==3.8
conda activate metakp
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
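
After installation, it can be useful to confirm that the pinned PyTorch packages were actually picked up. The snippet below is an optional sanity check, not part of the repository: it reads installed versions with the standard-library importlib.metadata and compares them against the pins from the pip command above, ignoring local suffixes such as +cu117. The name check_pins is hypothetical.

```python
from importlib import metadata

# Pins copied from the pip command above; local suffixes like "+cu117" are ignored.
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}


def check_pins(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        print(f"{name}: installed={installed} matches_pin={ok}")
```

A package that is not installed is reported as (None, False) rather than raising an error.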

Fine-tuning

Pre-processing

Prepare the data by running cd data ; python process_seq2seq.py . The script preprocesses the specified datasets.

Train

cd fine-tuning
Modify the parameters in run_train_bart_multitask.sh or run_train_flan_t5_with_rejection.sh.
Then run bash run_train_bart_multitask.sh or bash run_train_flan_t5_with_rejection.sh.

Inference and Evaluation

bash run_test.sh

Prompting

Inference and Evaluation

Run the zero-shot goal rejection experiment:

bash run_query_reject_zero_shot.sh

Run the zero-shot on-demand keyphrase generation experiment:

bash run_query_kp_zero_shot.sh

Run the self-consistency prompting experiment for on-demand keyphrase generation:

bash run_query_kp_zero_shot_sample.sh

By default, the threshold for scoring keyphrases in self-consistency prompting evaluation is 0.3.
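
To make the role of the threshold concrete, here is a minimal sketch of one common way to aggregate sampled outputs in self-consistency prompting. It is an assumption, not the repository's implementation: a keyphrase's score is taken to be the fraction of sampled outputs that contain it, and keyphrases scoring at or above the threshold are kept. The function name aggregate_samples and the lowercase normalization are illustrative choices.

```python
from collections import Counter


def aggregate_samples(samples, threshold=0.3):
    """Keep keyphrases that appear in at least `threshold` of the samples.

    `samples` is a list of keyphrase lists, one list per sampled model output.
    Returns (keyphrase, score) pairs sorted by descending score, then name.
    """
    if not samples:
        return []
    counts = Counter()
    for sample in samples:
        # Count each phrase at most once per sample, after light normalization.
        counts.update({kp.strip().lower() for kp in sample if kp.strip()})
    n = len(samples)
    scored = [(kp, c / n) for kp, c in counts.items()]
    kept = [(kp, s) for kp, s in scored if s >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

With four samples, a keyphrase that appears in only one of them scores 0.25 and is dropped at the default threshold of 0.3, while one that appears in two scores 0.5 and is kept.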

If you find this work helpful, please consider citing:

@article{wu2024metakp,
      title={MetaKP: On-Demand Keyphrase Generation}, 
      author={Di Wu and Xiaoxian Shen and Kai-Wei Chang},
      year={2024},
      eprint={2407.00191},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.00191}, 
}
