The effectiveness of the trained sledge-med.p model #45

Dear Sean MacAvaney,

I loaded the trained model sledge-med.p following the instructions in https://colab.research.google.com/drive/1t5UdW2Jebue1php888ldDll6yG5jQXQQ?usp=sharing and tried to reproduce the results on the TREC-COVID round 1 dataset. However, the output does not seem to outperform the classic BM25 results. Could you please verify whether the uploaded sledge-med.p is effective and whether the instructions in the shared Colab notebook are correct? Thank you so much!
In your example (https://colab.research.google.com/drive/1t5UdW2Jebue1php888ldDll6yG5jQXQQ?usp=sharing), why do you ignore the pooler layer? And it seems that in your toy example, the logits come from the [CLS] token instead of from the whole sentence. Thanks, and I really appreciate your reply.
Thanks for reporting, I'm looking into this and will get back to you.
That was just a relatively arbitrary design decision. Essentially, it comes down to whether or not to initialize the ranking score based on the results of the NSP task. I'm not sure if anybody has studied whether one way or the other is more effective.
I'm not sure what you mean here. It takes the representation from the first token ([CLS]).
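To make that concrete, here's a rough sketch of the two choices using Hugging Face transformers. This isn't the actual OpenNIR code, and `bert-base-uncased` is just a stand-in for the real checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# bert-base-uncased is a stand-in; the real model is fine-tuned for ranking.
tok = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

# Encode a query/document pair as a single BERT input.
enc = tok('is hydroxychloroquine effective?', 'dog cat', return_tensors='pt')
out = model(**enc)

# Option A (what the example does): take the raw representation of the
# first token ([CLS]) and feed it into a ranking layer.
cls_repr = out.last_hidden_state[:, 0]
rank_layer = torch.nn.Linear(model.config.hidden_size, 1)  # randomly initialized in this sketch
score_from_cls = rank_layer(cls_repr)

# Option B (using the pooler): score from pooler_output, which passes [CLS]
# through an extra dense layer pre-trained for next-sentence prediction (NSP).
score_from_pooler = rank_layer(out.pooler_output)
```

Since the pooler's weights were trained for NSP, going through it would effectively initialize the ranking score from the NSP task; skipping the pooler starts from the raw [CLS] representation instead.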
Thanks for getting back so quickly. I tried some arbitrary text segments and calculated their relevance scores against the query 'Is Hydroxychloroquine effective?', and the results are weird. For example, the relevance score of 'dog cat' is -1.3087, even higher than those of the two sentences in the toy example.
Weird, maybe there's some problem with the Colab example I tried putting together. But I also suspect that the model isn't very robust to adversarial text like "dog cat" -- it's only trained on in-domain text. The easiest way to reproduce the results is to run the following pipeline in OpenNIR:

```bash
bash scripts/pipeline.sh config/sledge/ pipeline.test=True
```

It takes about an hour (probably a bit longer if the data needs to be downloaded), but I get:

[results table omitted]
(Actually a bit better in terms of nDCG@10 than what we reported here.) So that's the easiest way to start with reproduction. I'm not totally sure why the transformers demo isn't working. If you like, I could try to provide an example using the PyTerrier integration, which would let you both use the model within Notebooks/Colab and use the OpenNIR internals directly. Let me know.
Dear Sean, thanks for your reply and your patience. Yes, I would like to see the example using the PyTerrier integration.
Actually, I am not studying typical information retrieval, but I was inspired by your SLEDGE-Z paper and plan to use a similar zero-shot learning idea on the data in my domain (medical-related). So I tried to see whether sledge-med.p works well on other medical-related data.
This should do the trick then! Here's a Colab link: https://colab.research.google.com/drive/12EdgWMKbMJxmR8XrLUr74PbASsfI8g6N?usp=sharing

And the code:

```python
import pandas as pd
import pyterrier as pt
if not pt.started():
    pt.init()
import onir_pt

sledgez = onir_pt.reranker.from_checkpoint('https://macavaney.us/files/pt-sledgez.tar.gz')

# Pass in the query/text pairs like so:
sledgez(pd.DataFrame([
    {'qid': '0', 'query': 'covid symptoms', 'text': 'SARC-COV2 symptoms include a b and c'},
    {'qid': '0', 'query': 'covid symptoms', 'text': 'dog cat'}
]))
#  qid           query                                  text     score
#    0  covid symptoms  SARC-COV2 symptoms include a b and c  2.172534
#    0  covid symptoms                               dog cat -3.007984
```

Let me know if this works for you.