SummPip

This repository contains the code for the SIGIR 2020 paper "SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression".

Python version: this code was written for Python 3.6.

Dataset

Source data, with minimal text pre-processing

Target data (for evaluation)

Test SummPip

Step1: Place the downloaded dataset in the folder ./dataset/multi_news/.

Step2: Download the pre-trained word2vec model and place it in the folder ./word_vec/multi_news.

  • To run SummPip on your own dataset, you first need to pre-train a word2vec (W2V) model yourself with gensim.
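For a custom dataset, pre-training could look roughly like the sketch below. It is not taken from this repository: the function names, the one-sentence-per-line corpus format, the whitespace tokenisation, the embedding size, and the saved-model format are all assumptions, so check what run_main.py actually loads before relying on it.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def tokenize_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one lower-cased, whitespace-tokenised sentence per non-empty line."""
    for line in lines:
        tokens = line.lower().split()
        if tokens:
            yield tokens


def train_word2vec(corpus_path: str, output_path: str, dim: int = 100):
    """Train a word2vec model on a plain-text corpus and save it (hypothetical helper)."""
    # Imported lazily so the tokeniser above works without gensim installed.
    import gensim
    from gensim.models import Word2Vec

    with open(corpus_path, encoding="utf-8") as handle:
        sentences = list(tokenize_lines(handle))

    # gensim 4.x renamed the dimensionality argument from `size` to `vector_size`.
    major_version = int(gensim.__version__.split(".")[0])
    dim_arg = "vector_size" if major_version >= 4 else "size"

    # min_count=1 keeps rare words; dim=100 is an assumed, untuned value.
    model = Word2Vec(sentences, min_count=1, workers=4, **{dim_arg: dim})

    # Create the target folder if needed, then save in gensim's native format.
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    model.save(output_path)
    return model
```

A call such as `train_word2vec("my_corpus.txt", "./word_vec/my_dataset/w2v.model")` would train on a hypothetical corpus file and write the model under ./word_vec/; both paths are placeholders.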

Step3: Unsupervised Extractive Summarisation

python run_main.py
  • When applying SummPip to your own dataset, you may want to change -nb_clusters and -nb_words to control the length of the output summary.
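To try several length settings, the run command can be assembled programmatically. The sketch below assumes that -nb_clusters and -nb_words each take a single integer value and that run_main.py is in the current directory; the helper names are hypothetical and any values passed in are placeholders, not recommended settings.

```python
import subprocess
import sys
from typing import List


def build_command(nb_clusters: int, nb_words: int) -> List[str]:
    """Build the argument list for one SummPip run (assumes integer-valued flags)."""
    return [
        sys.executable,            # the Python interpreter currently in use
        "run_main.py",
        "-nb_clusters", str(nb_clusters),
        "-nb_words", str(nb_words),
    ]


def run_summpip(nb_clusters: int, nb_words: int) -> None:
    """Run SummPip once; raises CalledProcessError if the script exits with an error."""
    subprocess.run(build_command(nb_clusters, nb_words), check=True)
```

Calling `run_summpip` in a loop over a few candidate values is one way to compare output lengths on your own data.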

Citation

Please cite our paper if you use this code in production or publications:

@inproceedings{zhao2020summpip,
  title={SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression},
  author={Zhao, Jinming and Liu, Ming and Gao, Longxiang and Jin, Yuan and Du, Lan and Zhao, He and Zhang, He and Haffari, Gholamreza},
  booktitle={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={1949--1952},
  year={2020}
}