Returning passage ID in addition to passage index #326

Open: wants to merge 2 commits into base: main


jenhsia commented Mar 10, 2024

In the original repo, the index corpus TSV file requires the pid to be an integer, but there may be cases where we want to use a passage ID (string) instead of a passage index (int). These commits allow the pid to be a non-integer and make the passage IDs easy to access after passage ranking.
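As a rough illustration of the idea (not the code in these commits), the sketch below reads a `pid<TAB>passage` TSV without calling `int()` on the first column, keeping the texts in file order alongside a `pid_list` of the original string IDs. The function name `load_collection_with_ids` is hypothetical, and the sketch assumes the passage text is the second tab-separated column; any further columns are ignored.

```python
import io
from typing import List, TextIO, Tuple


def load_collection_with_ids(f: TextIO) -> Tuple[List[str], List[str]]:
    """Read 'pid<TAB>passage' lines without requiring pid to be an integer.

    Hypothetical helper. Returns (passages, pid_list), where passages[i] is
    the text at passage index i and pid_list[i] is its original string ID.
    """
    passages: List[str] = []
    pid_list: List[str] = []
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        pid, passage = parts[0], parts[1]  # extra columns (if any) are ignored
        pid_list.append(pid)               # keep the ID as a string, no int()
        passages.append(passage)           # list position = passage index
    return passages, pid_list


# Usage with an in-memory TSV whose IDs are not integers.
data = "doc-a\tFirst passage.\ndoc-b\tSecond passage.\n"
passages, pid_list = load_collection_with_ids(io.StringIO(data))
print(pid_list)
```

Because the passage index is simply the position in the list, the index that ranking returns can be turned back into the original ID with a plain list lookup.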

If we save the passage-index-to-passage-ID list (pid_list) in searcher.collection, we can use it to look up the passage ID after ranking, as follows:

for query_id in ranking.data:
    for (passage_index, rank, score) in ranking.data[query_id]:
        # pid_list[i] holds the original (string) passage ID for passage index i
        passage_id = searcher.collection.pid_list[passage_index]
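To show the lookup without building an index, the sketch below substitutes plain Python objects for `ranking.data` and `searcher.collection.pid_list`. It assumes the same layout as the loop above (query ID mapped to a list of `(passage_index, rank, score)` tuples); the helper name `ranking_to_passage_ids` and all values are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[int, int, float]  # (passage_index, rank, score), as in the loop above


def ranking_to_passage_ids(
    ranking_data: Dict[str, List[Hit]],
    pid_list: Sequence[str],
) -> Dict[str, List[Tuple[str, int, float]]]:
    """Hypothetical helper: swap each passage index for its original passage ID."""
    return {
        query_id: [(pid_list[passage_index], rank, score)
                   for passage_index, rank, score in hits]
        for query_id, hits in ranking_data.items()
    }


# Toy stand-ins for searcher.collection.pid_list and ranking.data (illustrative only).
pid_list = ["doc-a", "doc-b", "doc-c"]
ranking_data = {"q1": [(2, 1, 17.5), (0, 2, 12.25)]}
print(ranking_to_passage_ids(ranking_data, pid_list))
```

Rank and score pass through unchanged; only the integer index is replaced.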


timbmg commented Apr 19, 2024

Thanks @jenhsia! This would also be very helpful to me. It would be great if one of the maintainers could take a look. 😇 @santhnm2 @okhat


timbmg commented Apr 23, 2024

BTW, it would also be good to remove the requirement that qids be integers. @jenhsia, maybe you could amend your PR to also comment out this line in evaluation/loaders.py:

qid = int(qid)
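A minimal sketch of what dropping that conversion could look like, assuming a `qid<TAB>query` file format. The function `load_queries_keep_str_qids` is hypothetical and does not reproduce the project's actual `evaluation/loaders.py`; it only shows qids being kept as strings instead of passed through `int(qid)`.

```python
import io
from typing import Dict, TextIO


def load_queries_keep_str_qids(f: TextIO) -> Dict[str, str]:
    """Hypothetical loader: read 'qid<TAB>query' lines, keeping qid as a string."""
    queries: Dict[str, str] = {}
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        qid, query = line.split("\t", 1)
        # No int(qid) here, so IDs like 'q-001' are accepted as-is.
        if qid in queries:
            raise ValueError(f"duplicate qid: {qid}")
        queries[qid] = query
    return queries


qs = load_queries_keep_str_qids(io.StringIO("q-001\twhat is late interaction\n007\tsecond query\n"))
print(qs)
```

One side effect of keeping qids as strings is that numeric-looking IDs such as `007` keep their leading zeros, so they still match qids stored the same way elsewhere (for example, in qrels files).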

2 participants