
the effectiveness of the trained sledge-med.p model #45

Open
fangguo1 opened this issue Feb 22, 2022 · 7 comments

Comments

@fangguo1

Dear Sean MacAvaney,

I loaded the trained model sledge-med.p following the instructions in https://colab.research.google.com/drive/1t5UdW2Jebue1php888ldDll6yG5jQXQQ?usp=sharing and tried to reproduce the results on the TREC-COVID round 1 dataset. However, the output does not seem to outperform classic BM25's results. Could you please verify whether the uploaded sledge-med.p is effective and whether the instructions in the shared Colab notebook are correct? Thank you so much!

@fangguo1
Author

In your example ( https://colab.research.google.com/drive/1t5UdW2Jebue1php888ldDll6yG5jQXQQ?usp=sharing ), why do you ignore the pooler layer? Also, in your toy example, the logits seem to come from a single token rather than from the whole sentence. Thanks, and I really appreciate your reply.

@seanmacavaney
Contributor

> However, the output seems not over-perform the classic BM25's result. Could you please verify if the uploaded sledge-med.p is effective or the instructions in the shared google doc are correct

Thanks for reporting, I'm looking into this and will get back to you.

> why do you ignore the pooler layer

That was just a relatively arbitrary design decision: essentially, whether or not to initialize the ranking score from the results of the NSP (next sentence prediction) task. I'm not sure anybody has studied whether one way or the other is more effective.

> And it seems in your toy example, the logits are from the token instead of from the whole sentence.

I'm not sure what you mean here. It takes the representation from the first [CLS] token, which is the conventional way to represent the whole sequence. See Figure 3(a) here.
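The distinction can be sketched in plain PyTorch. This is a toy stand-in, not the actual SLEDGE weights: the shapes and the linear head below are illustrative only.

```python
import torch

# Toy stand-in for BERT's last-layer output: batch of 2 query-document
# pairs, 6 tokens each, hidden size 8 (real BERT uses 768).
hidden = torch.randn(2, 6, 8)

# The pooler route (skipped in the Colab example) would apply
# tanh(W @ hidden[:, 0] + b) with weights trained for the NSP task.
# The route described above instead takes the raw first-token ([CLS])
# representation and feeds it directly to a ranking head.
cls_rep = hidden[:, 0]                     # shape (2, 8): one [CLS] vector per pair
ranker_head = torch.nn.Linear(8, 1)        # stand-in for the trained scoring layer
scores = ranker_head(cls_rep).squeeze(-1)  # shape (2,): one score per pair
```

So the logits do come from a single token position, but that [CLS] vector is contextualized over the whole sequence by self-attention, which is why it conventionally represents the full query-document pair.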

@fangguo1
Author

Thanks for getting back so quickly. I tried some arbitrary text segments and calculated their relevance scores against the query 'Is Hydroxychloroquine effective?', and the results are weird. For example, the relevance score of 'dog cat' is -1.3087, even higher than the scores of the two sentences in the toy example.

@seanmacavaney
Contributor

Weird, maybe there's some problem with the Colab example I tried putting together. But I also suspect that the model isn't so robust to adversarial text like "dog cat" -- it's only trained on in-domain text.

The easiest way to reproduce the results is to run the following pipeline in OpenNIR:

bash scripts/pipeline.sh config/sledge/ pipeline.test=True

It takes about an hour (probably a bit longer if data needs to be downloaded), but I get:

SLEDGE:
judged@5=0.9800 ndcg@10=0.6917 p@5=0.7867 p_rel-2@5=0.6400
BM25:
judged@5=0.9200 ndcg@10=0.5156 p@5=0.6133 p_rel-2@5=0.4667

(Actually a bit better in terms of nDCG@10 than what we reported here.)

So that's the easiest way to start with reproduction.

I'm not totally sure why the transformers demo isn't working. If you like, I could try to provide an example using the PyTerrier integration, which would let you both use the model within Notebooks/Colab and use the OpenNIR internals directly. Let me know.

@fangguo1
Author

Dear Sean, thanks for your reply and your patience. Yes, I would like an example using the PyTerrier integration.

@fangguo1
Author

fangguo1 commented Feb 23, 2022

Actually, I am not studying typical information retrieval, but I was inspired by your SLEDGE-Z paper and plan to apply a similar zero-shot learning idea to data in my own (medical-related) domain. So I tried to see whether sledge-med.p works well on other medical-related data.

@seanmacavaney
Contributor

This should do the trick then! Here's a colab link: https://colab.research.google.com/drive/12EdgWMKbMJxmR8XrLUr74PbASsfI8g6N?usp=sharing

And the code:

import pandas as pd
import pyterrier as pt
if not pt.started():
  pt.init()
import onir_pt

sledgez = onir_pt.reranker.from_checkpoint('https://macavaney.us/files/pt-sledgez.tar.gz')

# Pass in the query/text pairs like so:
sledgez(pd.DataFrame([
  {'qid': '0', 'query': 'covid symptoms', 'text': 'SARC-COV2 symptoms include a b and c'},
  {'qid': '0', 'query': 'covid symptoms', 'text': 'dog cat'}
]))
# qid           query                                  text     score
#   0  covid symptoms  SARC-COV2 symptoms include a b and c  2.172534
#   0  covid symptoms                               dog cat -3.007984

Let me know if this works for you.
