Code for "NeuS: Neutral Multi-News Summarization for Mitigating Framing Bias", NAACL 2022 [PDF]
If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX entry is listed below:
@inproceedings{lee2022neus,
  title={NeuS: Neutral Multi-News Summarization for Mitigating Framing Bias},
  author={Lee, Nayeon and Bang, Yejin and Yu, Tiezheng and Madotto, Andrea and Fung, Pascale},
  journal={Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  year={2022}
}
Run pip install -r requirement.txt
Acknowledgement: Our code is based on Hugging Face's Transformers codebase.
Run data_preprocessing.ipynb.
- To replicate all our data preprocessing steps, run all cells in the notebook.
- To prepare only a test set of your own to run with our pretrained model, run the create_source_target_with_processing_probe_format(article_test, 'test') function with your own article_test list.
Datasets are inside the data folder:
- raw_crawled/: contains crawled data from Allsides.com until 2021-10-19 (with all the metadata).
- naacl2022_filtered_allsides_article.json: filtered & preprocessed version of raw_crawled.
- naacl2022_lrc_roundup_random_order_probe: contains the final train/val/test files used in our NeuS-Title model.
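Before preprocessing, it can help to peek at the filtered file. The sketch below is not part of the released code: summarize_json_file is a hypothetical helper, and it assumes only that naacl2022_filtered_allsides_article.json is a single valid JSON document. It makes no assumption about the fields inside.

```python
import json
from pathlib import Path


def summarize_json_file(path):
    """Describe the top-level structure of a JSON file.

    Hypothetical helper: nothing about the schema is assumed. Only the
    top-level type, its size, and up to five keys are reported.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)
    if isinstance(data, dict):
        # Top-level object: show a few of its keys.
        summary["sample_keys"] = list(data)[:5]
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        # List of records: show a few keys of the first record.
        summary["sample_keys"] = list(data[0])[:5]
    return summary


# Example call (adjust the path to your checkout):
# print(summarize_json_file("data/naacl2022_filtered_allsides_article.json"))
```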
Full article version (smaller subset): Not directly used in our paper, but releasing to help the community 😊
- BASIL (cite) extended (neutral): Refer to https://github.com/nayeon7lee/framing_bias_datasets
- AllSides articles: Refer to https://github.com/nayeon7lee/framing_bias_datasets
Run bash finetune_neus_script.sh with the desired hyperparameters and correct configurations.
- Download our checkpoint from here into the save/ directory.
- Run bash generate_script.sh with correct configurations (i.e. PROJECT_DIR, OUTPUT_DIR, DATA_DIR).
If you want to generate using your own model checkpoint, set OUTPUT_DIR to the directory containing your model checkpoint.
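A quick pre-flight check can catch a mistyped path before the script starts. This is a minimal sketch, not part of the repository: missing_dirs is a hypothetical helper, and it assumes that PROJECT_DIR, OUTPUT_DIR and DATA_DIR are all expected to be existing directories when generate_script.sh runs.

```python
from pathlib import Path


def missing_dirs(**named_paths):
    """Return the names whose path is not an existing directory.

    Hypothetical helper: pass each configuration variable as a keyword
    argument, e.g. missing_dirs(OUTPUT_DIR="save").
    """
    return [name for name, p in named_paths.items() if not Path(p).is_dir()]


# Example call (placeholder paths; replace with the values you put in the script):
# missing = missing_dirs(
#     PROJECT_DIR="/path/to/project",
#     OUTPUT_DIR="/path/to/project/save",
#     DATA_DIR="/path/to/project/data",
# )
# if missing:
#     raise SystemExit(f"Not existing directories: {missing}")
```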
All the lexicons are inside data/lexicons:
- NRC-VAD-Lexicon-Aug2018Release/: VAD lexicon used in our paper. Refer to the README inside the folder for citation details.
- bias-lexicon/: other lexicons that the community could find useful (not used in our work). Refer to the README inside the folder for citation details.
- bias_related_lexicons/: other lexicons that the community could find useful (not used in our work). Refer to the README inside the folder for citation details.
Run the code in metric_code.ipynb.
- Make sure you have data/lexicons/NRC-VAD-Lexicon-Aug2018Release/.
- Also, make sure you correctly load the generated test texts and the target texts into the gens and test_tgts variables.
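One way to fill gens and test_tgts is sketched below. It is an assumption, not the notebook's own loader: it supposes that generations and targets are stored as plain-text files with one example per line, and read_lines and load_pairs are hypothetical helper names. Adapt it if your files use another format.

```python
from pathlib import Path


def read_lines(path):
    """Read a UTF-8 text file into a list of strings, one per line."""
    with Path(path).open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_pairs(gen_path, tgt_path):
    """Load generations and targets and check that they line up.

    Assumes one example per line in both files (an assumption, not
    something the README specifies).
    """
    gens = read_lines(gen_path)
    test_tgts = read_lines(tgt_path)
    if len(gens) != len(test_tgts):
        raise ValueError(
            f"{len(gens)} generations but {len(test_tgts)} targets; "
            "the two files should have the same number of lines."
        )
    return gens, test_tgts


# Example call (placeholder file names):
# gens, test_tgts = load_pairs("path/to/generations.txt", "path/to/targets.txt")
```

The length check is there because a metric computed over misaligned pairs would still run but give meaningless numbers.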