The main differences from the original trec_eval:
- Add a `-p` flag to match query variations. Query `301-1-1` can match `301` in qrels.
- Use a hashmap to improve performance.
- Accept stdin (specify `-`) as run file input.
- Support Anserini 3-field format run files.
- Support `recip_rank_cut`.
- Support large run files (MSMARCO train run, ~8.9G, 500m lines; finished in 12.6m).
$ wc -l qrels.train.tsv queries.train.depth100.trec_run t5.train.queries.prf.ans_run
484177 qrels.train.tsv
48413347 queries.train.depth100.trec_run
106917898 t5.train.queries.prf.ans_run
$ trec_eval --version
trec_eval version 9.0.7
- 49m37s
$ time trec_eval -m recip_rank qrels.train.tsv queries.train.depth100.trec_run > /dev/null
2970.13user 2.61system 49:37.80elapsed 99%CPU (0avgtext+0avgdata 4588976maxresident)k
219368inputs+0outputs (17major+1372861minor)pagefaults 0swaps
$ trec_eval --version
trec_eval version 9.0.7-lbs-20210321
- 1m5s
$ time trec_eval -m recip_rank qrels.train.tsv queries.train.depth100.trec_run > /dev/null
62.88user 2.49system 1:05.82elapsed 99%CPU (0avgtext+0avgdata 4797656maxresident)k
4576208inputs+0outputs (6major+1485574minor)pagefaults 0swaps
- 2m26s
$ time trec_eval -p -m recip_rank qrels.train.tsv t5.train.queries.prf.ans_run > /dev/null
130.42user 15.74system 2:26.20elapsed 99%CPU (0avgtext+0avgdata 8319388maxresident)k
0inputs+0outputs (0major+8079924minor)pagefaults 0swaps
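
To illustrate the query-variation matching and the cutoff variant of reciprocal rank, here is a minimal Python sketch. It is not the code used in this fork. The helper names `base_query_id` and `recip_rank_at` are hypothetical, the rule that a variation ID maps to the text before its first hyphen is an assumption drawn from the `301-1-1` example, and the cutoff semantics (score 0 when the first relevant document is ranked below `k`) are also assumed.

```python
from collections import defaultdict


def base_query_id(qid: str) -> str:
    """Assumed rule: a variation ID such as '301-1-1' maps to '301'."""
    return qid.split("-", 1)[0]


def recip_rank_at(ranked_docs, relevant, k):
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for rank, doc in enumerate(ranked_docs[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Qrels held in a dict keyed by query ID, so each lookup is a hash lookup
# rather than a scan over all judged queries.
qrels = defaultdict(set)
for qid, doc in [("301", "d2"), ("302", "d9")]:
    qrels[qid].add(doc)

# Toy run keyed by query-variation ID, documents listed in rank order.
run = {
    "301-1-1": ["d1", "d2", "d3"],
    "302-1-1": ["d4", "d5", "d9"],
}

scores = {
    qid: recip_rank_at(docs, qrels.get(base_query_id(qid), set()), k=2)
    for qid, docs in run.items()
}
print(scores)  # {'301-1-1': 0.5, '302-1-1': 0.0}
```

With `k=2`, the second query scores 0 because its only relevant document sits at rank 3, outside the cutoff.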