Question when reproducing experiment #10

gcalabria · 2023-05-10T08:07:28Z

First of all, I would like to thank you for making the code for your work available. I also really liked your paper; it is very interesting.

I am currently writing my master's thesis, and I would like to use part of your code to build my own mathematical IR system. The first thing I did was try to run your scripts and check whether my results match yours.

I ran the get_data.sh script and then the scripts for steps 2 and 3 of the README. The model is still training, but I am skeptical about the results I have gotten so far.

Could you please confirm whether the graphs below are consistent with your results?

Thank you for your attention :)


ArvinZhuang · 2023-05-10T09:11:39Z

Hi @g-lopes, thank you for your kind words! I'm glad I can help :)

It looks like your plots are converging much more slowly than mine. May I ask exactly which setting you are running, e.g. the MS MARCO or XORQA dataset? Are you running DSI-QG or the original DSI? How many GPUs and what batch size are you using?

gcalabria · 2023-05-10T11:07:19Z

That was quick :)

First, I generated the queries using the following command:

python3 -m torch.distributed.launch --nproc_per_node=8 run.py \
        --task generation \
        --model_name castorini/doc2query-t5-large-msmarco \
        --per_device_eval_batch_size 32 \
        --run_name docTquery-XORQA-generation \
        --max_length 256 \
        --valid_file data/xorqa_data/100k/xorqa_corpus.tsv \
        --output_dir temp \
        --dataloader_num_workers 10 \
        --report_to wandb \
        --logging_steps 100 \
        --num_return_sequences 10

After that, I started training the model by running:

python3 -m torch.distributed.launch --nproc_per_node=8 run.py \
        --task "DSI" \
        --model_name "google/mt5-base" \
        --run_name "XORQA-100k-mt5-base-DSI-QG" \
        --max_length 32 \
        --train_file data/xorqa_data/100k/xorqa_corpus.tsv.q10.docTquery \
        --valid_file data/xorqa_data/100k/xorqa_DSI_dev_data.json \
        --output_dir "models/XORQA-100k-mt5-base-DSI-QG" \
        --learning_rate 0.0005 \
        --warmup_steps 100000 \
        --per_device_train_batch_size 32 \
        --per_device_eval_batch_size 32 \
        --evaluation_strategy steps \
        --eval_steps 1000 \
        --max_steps 500000 \
        --save_strategy steps \
        --dataloader_num_workers 10 \
        --save_steps 1000 \
        --save_total_limit 2 \
        --load_best_model_at_end \
        --gradient_accumulation_steps 1 \
        --report_to wandb \
        --logging_steps 100 \
        --dataloader_drop_last False \
        --metric_for_best_model Hits@10 \
        --greater_is_better True \
        --remove_prompt True

I am running DSI-QG on 8 NVIDIA A100 GPUs with a batch size of 32 per device.
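As a rough sanity check on this configuration, the flags above determine the effective batch size and approximately how many passes the run makes over the generated data. The following is a sketch under stated assumptions, not a description of run.py. It assumes the 100k corpus holds exactly 100,000 documents (inferred only from the directory name), that each document received 10 generated queries (`--num_return_sequences 10`), and that each generated query yields one training example.

```python
"""Back-of-the-envelope arithmetic for the DSI-QG training command above.

Assumptions (not confirmed by the repository): 100,000 documents in the
corpus and exactly one training example per generated query.
"""


def effective_batch_size(num_gpus: int, per_device_batch: int, grad_accum: int) -> int:
    # Sequences consumed per optimizer step across all data-parallel workers.
    return num_gpus * per_device_batch * grad_accum


def epochs_covered(max_steps: int, batch: int, num_examples: int) -> float:
    # Total sequences seen during training divided by the dataset size.
    return max_steps * batch / num_examples


NUM_DOCS = 100_000        # assumed from "100k" in the data path
QUERIES_PER_DOC = 10      # --num_return_sequences 10 in the generation step
num_examples = NUM_DOCS * QUERIES_PER_DOC

batch = effective_batch_size(num_gpus=8, per_device_batch=32, grad_accum=1)
epochs = epochs_covered(max_steps=500_000, batch=batch, num_examples=num_examples)
warmup_fraction = 100_000 / 500_000  # --warmup_steps / --max_steps

print(f"effective batch size: {batch}")                  # 256
print(f"training examples:    {num_examples}")           # 1000000
print(f"approx. epochs:       {epochs:.1f}")             # 128.0
print(f"warmup fraction:      {warmup_fraction:.0%}")    # 20%
```

With the same per-device batch size, running on fewer GPUs shrinks the effective batch. This is presumably one reason the GPU count and batch size matter when comparing convergence curves across setups.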

ArvinZhuang · 2023-05-10T11:21:04Z

Hi @g-lopes, judging from your scripts, I think the mistake is in step 2: you used castorini/doc2query-t5-large-msmarco, which is an English-only QG model, to generate cross-lingual queries for the XORQA dataset.
You need to run step 1 to train a cross-lingual QG model for use in step 2. Otherwise, you may want to test with the MS MARCO dataset.

Alternatively, you can try our trained cross-lingual QG model, which I recently uploaded to Hugging Face: https://huggingface.co/ielabgroup/xor-tydi-docTquery-mt5-large
However, I trained this model with a slightly different prompt (see the model card on the page above). If you want to use this model directly in step 2, you need to change the prompt in this line to match the one in the model card.
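The essential point is that at generation time, every document must be wrapped in the same prompt the QG model saw during training. Below is a minimal sketch of that idea using the Hugging Face transformers API. `PROMPT_TEMPLATE` is a hypothetical placeholder, not the real prompt; the actual string has to be copied from the model card (or from run.py). The helper names `build_inputs` and `generate_queries` and the top-k sampling settings are assumptions for illustration and may differ from what run.py does.

```python
"""Sketch: prompt-consistent query generation with a seq2seq QG model.

PROMPT_TEMPLATE is a HYPOTHETICAL placeholder. Replace it with the exact
prompt from the model card before generating queries.
"""
from typing import List

# Hypothetical placeholder, NOT the prompt used by any released model.
PROMPT_TEMPLATE = "<prompt from the model card>: {doc}"


def build_inputs(docs: List[str], template: str = PROMPT_TEMPLATE) -> List[str]:
    # Wrap each document in the same prompt the QG model was trained with.
    if "{doc}" not in template:
        raise ValueError("template must contain a {doc} placeholder")
    return [template.format(doc=d) for d in docs]


def generate_queries(docs: List[str], model_name: str,
                     template: str = PROMPT_TEMPLATE,
                     num_return_sequences: int = 10,
                     max_length: int = 64) -> List[List[str]]:
    # Imported lazily so build_inputs works without transformers/torch installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    batch = tokenizer(build_inputs(docs, template), return_tensors="pt",
                      padding=True, truncation=True, max_length=256)
    # Top-k sampling is an assumption; run.py may decode differently.
    outputs = model.generate(**batch, max_length=max_length, do_sample=True,
                             top_k=10, num_return_sequences=num_return_sequences)
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # generate() returns num_return_sequences consecutive rows per input document.
    return [decoded[i * num_return_sequences:(i + 1) * num_return_sequences]
            for i in range(len(docs))]
```

A call such as `generate_queries(docs, "ielabgroup/xor-tydi-docTquery-mt5-large", template=...)` would then produce 10 queries per document. Only the `template` argument needs to change when switching between QG models trained with different prompts.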

gcalabria · 2023-05-10T11:40:51Z

😃 thank you very much for your help.
I will try to follow your instructions :) cheers!

gcalabria · 2023-05-23T09:38:22Z

@ArvinZhuang I double-checked the scripts I was using, and I've achieved much better results.

As you can see, I've achieved a Hits@10 score of 0.8, and the training is not finished yet 😄
The results above were obtained on the MSMARCO dataset with 100k data points; 10 queries per document were generated with the castorini/doc2query-t5-large-msmarco model.

I am trying to convert my Hits@10 scores to absolute values so that I can compare them with the results in your paper. Could you please explain how the Hits scores in Table 1 were computed?

I am multiplying my score by the number of documents in the evaluation dataset. In my case, the evaluation dataset has 6980 documents and the score is about 0.8, which gives 5504, a number that makes no sense to me.

Thanks for your help one more time :)

ArvinZhuang · 2023-05-23T10:05:13Z

@g-lopes Hi, the code for computing the Hits scores is in this function:

DSI-QG/run.py, line 43 (commit 479d8d7):

def compute_metrics(eval_preds):

If you want to run inference with saved model checkpoints, you can try something like this: #1 (comment)

gcalabria · 2023-05-23T10:49:31Z


I saw this function. My idea of multiplying by the number of documents in the evaluation dataset came from line 61:

DSI-QG/run.py, line 61 (commit 479d8d7):

    return {"Hits@1": hit_at_1 / len(eval_preds.predictions), "Hits@10": hit_at_10 / len(eval_preds.predictions)}
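The return statement on line 61 divides each hit count by the number of evaluation examples, so every metric is a fraction between 0 and 1. Below is a minimal re-implementation of that normalization. It is not the repository's code. It assumes each prediction is already a ranked list of docid strings and each label is a single gold docid. The real compute_metrics works on `eval_preds` and decodes model outputs first, which this sketch omits.

```python
from typing import Dict, Sequence, Tuple


def hits_at_k(ranked_preds: Sequence[Sequence[str]], labels: Sequence[str],
              ks: Tuple[int, ...] = (1, 10)) -> Dict[str, float]:
    """Fraction of queries whose gold docid appears among the top-k predictions."""
    if len(ranked_preds) != len(labels):
        raise ValueError("need exactly one ranked list per label")
    if not labels:
        raise ValueError("evaluation set is empty")
    hits = {k: 0 for k in ks}
    for preds, gold in zip(ranked_preds, labels):
        for k in ks:
            if gold in preds[:k]:
                hits[k] += 1
    # Same normalization as line 61: divide hit counts by the number of examples.
    n = len(labels)
    return {f"Hits@{k}": hits[k] / n for k in ks}


preds = [["d3", "d7", "d1"], ["d2", "d5", "d9"], ["d4", "d8", "d6"]]
gold = ["d3", "d9", "d0"]
print(hits_at_k(preds, gold, ks=(1, 3)))
# {'Hits@1': 0.3333333333333333, 'Hits@3': 0.6666666666666666}
```

Because of the division, the result does not depend on the evaluation-set size. Multiplying it back by that size only recovers the raw number of queries that scored a hit.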

Did the results in Table 1 of your paper come directly from this function, or did you also multiply by the size of the evaluation set?

Which test dataset did you use to generate these results?

ArvinZhuang · 2023-05-23T10:53:40Z

Ah, I see what you mean. Yes, my results come directly from this function. The numbers in the table are simply percentages, so you just need to multiply the values returned by the function by 100. @g-lopes
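Concretely, compute_metrics returns a fraction of evaluation queries. Table 1 reports that fraction multiplied by 100. Multiplying by the evaluation-set size instead yields a raw count of queries with a hit, which is a different quantity. The helper names below are illustrative only and do not come from the repository.

```python
def to_table_percentage(fraction: float) -> float:
    # Table 1 reports Hits@k as a percentage of evaluation queries.
    return round(fraction * 100, 2)


def to_hit_count(fraction: float, num_eval_examples: int) -> int:
    # Multiplying by the evaluation-set size gives the number of queries
    # that scored a hit, not the number shown in the table.
    return round(fraction * num_eval_examples)


print(to_table_percentage(0.8))   # 80.0
print(to_hit_count(0.8, 1000))    # 800
```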
So I think you have successfully reproduced my results :)

gcalabria · 2023-05-23T11:03:01Z

Aha! Great! Thank you very much :D

ArvinZhuang closed this as completed May 30, 2023

hi-i-m-GTooth mentioned this issue on Mar 3, 2024 in Reproducibility Problem #13 (closed).
