
the effectiveness of the trained sledge-med.p model #45

Open
fangguo1 opened this issue Feb 22, 2022 · 7 comments

Comments

@fangguo1

Dear Sean MacAvaney,

I loaded the trained model sledge-med.p following the instructions in https://colab.research.google.com/drive/1t5UdW2Jebue1php888ldDll6yG5jQXQQ?usp=sharing and tried to reproduce the results on the TREC-COVID round 1 dataset. However, the output does not seem to outperform classic BM25's results. Could you please verify whether the uploaded sledge-med.p is effective and whether the instructions in the shared Colab notebook are correct? Thank you so much!

@fangguo1
Author

In your example ( https://colab.research.google.com/drive/1t5UdW2Jebue1php888ldDll6yG5jQXQQ?usp=sharing ), why do you ignore the pooler layer? Also, in your toy example, the logits seem to come from a single token rather than from the whole sentence. Thanks, and I really appreciate your reply.

@seanmacavaney
Contributor

> However, the output seems not over-perform the classic BM25's result. Could you please verify if the uploaded sledge-med.p is effective or the instructions in the shared google doc are correct

Thanks for reporting, I'm looking into this and will get back to you.

> why do you ignore the pooler layer

That was just a relatively arbitrary design decision: essentially, whether or not to initialize the ranking score from the results of the NSP (next sentence prediction) task. I'm not sure anybody has studied whether one way or the other is more effective.

> And it seems in your toy example, the logits are from the token instead of from the whole sentence.

I'm not sure what you mean here. It takes the representation from the first [CLS] token, which is the conventional way to represent the whole sequence. See Figure 3(a) here.
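The distinction can be sketched in plain PyTorch. This is a toy stand-in, not the actual SLEDGE weights: the shapes and the linear head below are illustrative only.

```python
import torch

# Toy stand-in for BERT's last-layer output: batch of 2 query-document
# pairs, 6 tokens each, hidden size 8 (real BERT uses 768).
hidden = torch.randn(2, 6, 8)

# The pooler route (skipped in the Colab example) would apply
# tanh(W @ hidden[:, 0] + b) with weights trained for the NSP task.
# The route described above instead takes the raw first-token ([CLS])
# representation and feeds it directly to a ranking head.
cls_rep = hidden[:, 0]                     # shape (2, 8): one [CLS] vector per pair
ranker_head = torch.nn.Linear(8, 1)        # stand-in for the trained scoring layer
scores = ranker_head(cls_rep).squeeze(-1)  # shape (2,): one score per pair
```

So the logits do come from a single token position, but that [CLS] vector is contextualized over the whole sequence by self-attention, which is why it conventionally represents the full query-document pair.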

@fangguo1
Author

Thanks for getting back so quickly. I tried some arbitrary text segments and calculated their relevance scores against the query 'Is Hydroxychloroquine effective?', and the results are weird. For example, the relevance score of 'dog cat' is -1.3087, even higher than the scores of the two sentences in the toy example.

@seanmacavaney
Contributor

Weird, maybe there's some problem with the Colab example I tried putting together. But I also suspect that the model isn't so robust to adversarial text like "dog cat" -- it's only trained on in-domain text.

The easiest way to reproduce the results is to run the following pipeline in OpenNIR:

bash scripts/pipeline.sh config/sledge/ pipeline.test=True

It takes about an hour (probably a bit longer if data needs to be downloaded), but I get:

SLEDGE:
judged@5=0.9800 ndcg@10=0.6917 p@5=0.7867 p_rel-2@5=0.6400
BM25:
judged@5=0.9200 ndcg@10=0.5156 p@5=0.6133 p_rel-2@5=0.4667

(Actually a bit better in terms of nDCG@10 than what we reported here.)

So that's the easiest way to start with reproduction.

I'm not totally sure why the transformers demo isn't working. If you like, I could try to provide an example using the PyTerrier integration, which would let you both use the model within Notebooks/Colab and use the OpenNIR internals directly. Let me know.

@fangguo1
Author

Dear Sean, thanks for your reply and your patience. Yes, I would like an example using the PyTerrier integration.

@fangguo1
Author

fangguo1 commented Feb 23, 2022

Actually, I am not studying typical information retrieval, but I was inspired by your SLEDGE-Z paper and plan to apply a similar zero-shot learning idea to data in my own (medical-related) domain. So I tried to see whether sledge-med.p works well on other medical-related data.

@seanmacavaney
Contributor

This should do the trick then! Here's a colab link: https://colab.research.google.com/drive/12EdgWMKbMJxmR8XrLUr74PbASsfI8g6N?usp=sharing

And the code:

import pandas as pd
import pyterrier as pt
if not pt.started():
  pt.init()
import onir_pt

sledgez = onir_pt.reranker.from_checkpoint('https://macavaney.us/files/pt-sledgez.tar.gz')

# Pass in the query/text pairs like so:
sledgez(pd.DataFrame([
  {'qid': '0', 'query': 'covid symptoms', 'text': 'SARC-COV2 symptoms include a b and c'},
  {'qid': '0', 'query': 'covid symptoms', 'text': 'dog cat'}
]))
# qid           query                                  text     score
#   0  covid symptoms  SARC-COV2 symptoms include a b and c  2.172534
#   0  covid symptoms                               dog cat -3.007984

Let me know if this works for you.
