Replicating OpenI results #1

Open
jantrienes opened this issue May 20, 2022 · 2 comments

Comments

@jantrienes

Hi @jinpeng01, I'd like to replicate the WGSum results on my hardware and have a question about the ROUGE score I get.

What I did so far:

  1. Installed the dependencies (see the conda environment below). Because I have a newer GPU, I had to upgrade PyTorch to version 1.7.1.
  2. Downloaded model_trans/openi.pt from here: https://drive.google.com/drive/folders/1okrhVfsfTqZ4mnsmABiZG0sWBqrHPOg2?usp=sharing
  3. Adapted the paths in src/WGSum_test.sh and ran inference on the data in bert_openI/radiology/ (see the full log below).

I get the following ROUGE scores:

>> ROUGE-F(1/2/l): 59.61/49.60/59.28

Questions:

  1. Which run in the paper should I compare this to? Is it Table 3, result OpenI WGSUM (TRANS+GAT), with R1/R2/RL of 61.63/50.98/61.73?
  2. Compared to that run, do you know why my ROUGE scores might be lower?

Thank you!
Jan


Conda environment:

name: wgsum
channels:
  - pytorch
  - defaults
dependencies:
  - python=3.8.13
  - pip
  - pytorch=1.7.1
  - cudatoolkit=11.0
  - pip:
    - multiprocess==0.70.9
    - pyrouge==0.1.3
    - pytorch-transformers==1.2.0
    - tensorboardX==2.4
    - tensorboard==2.6.0
    - git+https://github.com/tagucci/pythonrouge.git
    - torch-scatter==2.0.7 -f https://data.pyg.org/whl/torch-1.7.1+cu110.html
    - torch-sparse==0.6.9 -f https://data.pyg.org/whl/torch-1.7.1+cu110.html
    - "torch-geometric<2"

Full execution log:

/local/work/WGSum/src$ sh WGSUM_test.sh
../models/openi
[2022-05-20 07:48:29,438 INFO] Loading checkpoint from ../models/openi/openi.pt
Namespace(accum_count=1, alpha=0.95, batch_size=3000, beam_size=5, bert_data_path='../bert_openI/radiology/radiology', beta1=0.9, beta2=0.999, block_trigram=True, dec_dropout=0.2, dec_ff_size=2048, dec_heads=8, dec_hidden_size=512, dec_layers=6, enc_dropout=0.2, enc_ff_size=2048, enc_hidden_size=512, enc_layers=6, encoder='baseline', ext_dropout=0.2, ext_ff_size=2048, ext_heads=8, ext_hidden_size=768, ext_layers=2, finetune_bert=True, generator_shard_size=32, gpu_ranks=[0], label_smoothing=0.1, large=False, load_from_extractive='', log_file='../models/openi/testlog', lr=1, lr_bert=0.002, lr_dec=0.002, max_grad_norm=0, max_length=50, max_pos=512, max_tgt_len=140, min_length=7, mode='test', model_path='../models/openi', optim='adam', param_init=0, param_init_glorot=True, recall_eval=False, report_every=1, report_rouge=True, result_path='../models/openi/result', save_checkpoint_steps=5, seed=666, sep_optim=True, share_emb=False, task='abs', temp_dir='../temp', test_all=False, test_batch_size=500, test_from='../models/openi/openi.pt', test_start_from=-1, train_from='', train_steps=1000, use_bert_emb=False, use_interval=True, visible_gpus='0', warmup_steps=8000, warmup_steps_bert=8000, warmup_steps_dec=8000, world_size=1)
[2022-05-20 07:48:30,268 INFO] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json from cache at ../temp/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517
[2022-05-20 07:48:30,269 INFO] Model config {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_labels": 2,
  "output_attentions": false,
  "output_hidden_states": false,
  "pad_token_id": 0,
  "pruned_heads": {},
  "torchscript": false,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

[2022-05-20 07:48:30,740 INFO] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at ../temp/aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
[2022-05-20 07:48:36,152 INFO] Loading test dataset from ../bert_openI/radiology/radiology.test.0.bert.pt, number of examples: 576
[2022-05-20 07:48:36,612 INFO] loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at ../temp/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
/local/work/WGSum/src/models/predictor.py:359: UserWarning: This overload of nonzero is deprecated:
	nonzero()
Consider using one of the following signatures instead:
	nonzero(*, bool as_tuple) (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370172916/work/torch/csrc/utils/python_arg_parser.cpp:882.)
  finished_hyp = is_finished[i].nonzero().view(-1)
[2022-05-20 07:50:03,498 INFO] Calculating Rouge
Calculating ROUGE...
ROUGE done.
test set results:

Metric	Score	95% CI
ROUGE-1	59.61	(-2.74,2.93)
ROUGE-2	49.60	(-3.21,3.29)
ROUGE-L	59.28	(-2.79,2.92)
[2022-05-20 07:50:05,417 INFO] Rouges at step 0
>> ROUGE-F(1/2/l): 59.61/49.60/59.28
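
One way to put the gap into perspective is to check whether the Table 3 values fall inside the 95% intervals printed in the log above. The sketch below assumes that the two numbers in the "95% CI" column are offsets relative to the score (an interpretation of the log format, not something confirmed in this thread). The helper name `within_interval` is made up for illustration. Falling inside the interval is only a rough sanity check, not a significance test.

```python
# Scores and 95% CI offsets copied from the execution log above.
# Assumption: the CI column holds (lower, upper) offsets around the score.
observed = {
    "ROUGE-1": (59.61, -2.74, 2.93),
    "ROUGE-2": (49.60, -3.21, 3.29),
    "ROUGE-L": (59.28, -2.79, 2.92),
}

# Table 3 values for OpenI WGSUM (TRANS+GAT), as quoted in the question.
reported = {"ROUGE-1": 61.63, "ROUGE-2": 50.98, "ROUGE-L": 61.73}


def within_interval(score, lo_offset, hi_offset, target):
    """Return True if target lies in [score + lo_offset, score + hi_offset]."""
    return score + lo_offset <= target <= score + hi_offset


results = {
    name: within_interval(*observed[name], reported[name]) for name in observed
}

for name, ok in results.items():
    score = observed[name][0]
    gap = reported[name] - score
    print(f"{name}: observed {score:.2f}, reported {reported[name]:.2f}, "
          f"gap {gap:+.2f}, inside interval: {ok}")
```

Under that reading of the log, all three reported values lie inside the corresponding intervals, so the difference is small relative to the spread on this 576-example test set.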
@jantrienes
Author

I noticed that I had accidentally changed min_length from 6 to 7. Reverting that change gives the following results, which are very close to the ones reported in the paper.

>> ROUGE-F(1/2/l): 62.23/50.13/61.87
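
An accidental argument change like this can be caught by comparing the `Namespace(...)` line that the script logs against the values one expects. The following is a minimal sketch, not part of the WGSum code: `parse_namespace` and `diff_args` are hypothetical helpers, the logged line is a shortened copy of the one in the execution log, and the only expected value used is `min_length=6` from this comment. Any further expected values would have to be taken from the unmodified test script.

```python
import ast


def parse_namespace(line):
    """Parse a logged 'Namespace(key=value, ...)' string into a dict."""
    call = ast.parse(line.strip(), mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a Namespace(...) expression")
    # Every value in the log is a Python literal, so literal_eval is enough.
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


def diff_args(actual, expected):
    """Return {name: (expected, actual)} for every mismatching argument."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Shortened copy of the Namespace line from the first run's log.
logged = ("Namespace(alpha=0.95, beam_size=5, block_trigram=True, "
          "gpu_ranks=[0], max_length=50, min_length=7, "
          "test_from='../models/openi/openi.pt')")

# Only min_length=6 is known from this thread; extend as needed.
expected = {"min_length": 6}

mismatches = diff_args(parse_namespace(logged), expected)
print(mismatches)  # {'min_length': (6, 7)}
```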

I also noticed that the ROUGE scores change slightly each time I rerun inference, although they stay relatively high. Do you know how to keep the results stable?
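
The cause of the run-to-run variation is not established in this thread. As a generic starting point, the sketch below fixes the random seeds and switches cuDNN to deterministic mode before inference. It is an assumption that this helps here: some GPU operations can remain non-deterministic regardless of these settings, and the script already receives `seed=666` according to the log. `seed_everything` is a hypothetical helper, not a function from the WGSum repository.

```python
import random


def seed_everything(seed):
    """Seed the available RNGs; return True if PyTorch was configured too."""
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch.

    try:
        import torch
    except ImportError:
        return False  # Only the Python-level RNGs were seeded.

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # Safe to call without a GPU.
    # Ask cuDNN for deterministic kernels and disable autotuning,
    # which may otherwise pick different algorithms between runs.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return True


# Same seed value as in the logged Namespace.
seed_everything(666)
first = random.random()
seed_everything(666)
second = random.random()
print(first == second)  # True
```

If the scores still vary after this, it is also worth checking that the model is in evaluation mode during decoding, so that dropout is disabled.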

@njan-creative

Could you help me with preprocessing the original OpenI data?

@jantrienes @jinpeng01
