1. I couldn't find how to run an exhaustive pair construction with `lambdarank_pair_method='mean'` or `'topk'`. This is my ultimate goal. Note that the number of documents per query varies.
2. What is the default `k` (`lambdarank_num_pair_per_sample`) for the `topk` and `mean` methods? When I left it at the default, model.json shows `lambdarank_num_pair_per_sample` as a full 32-bit number (screenshot). Is it a bug?
3. I assume that choosing `topk` and setting `k` via `lambdarank_num_pair_per_sample` to a very large number (e.g., -1 or 1000) can help me achieve the goal in question 1, but I am not sure how it behaves if `lambdarank_num_pair_per_sample` is higher than the number of documents in every query.
4. The example with the `mean` method is a bit confusing to me: with 3 documents there are at most C(3,2) = 3 distinct pairs, but the example says we can generate `lambdarank_num_pair_per_sample` * #documents = 2*3 = 6 pairs.
   a. Does that mean there are duplicate pairs in this case? If I set the method to `mean` and `lambdarank_num_pair_per_sample` is very large, do those duplicates significantly affect training time?
   b. How should I set it to achieve the goal in question 1?

   Here is the passage quoted from the document:

   > For the mean strategy, XGBoost samples lambdarank_num_pair_per_sample pairs for each document in a query list. For example, given a list of 3 documents and lambdarank_num_pair_per_sample is set to 2, XGBoost will randomly sample 6 pairs, assuming the labels for these documents are different. On the other hand, if the pair method is set to topk, XGBoost constructs about k × |query| number of pairs with |query| pairs for each sample at the top k = lambdarank_num_pair_per_sample position. The number of pairs counted here is an approximation since we skip pairs that have the same label.
5. Suppose I select the `topk` method with `lambdarank_num_pair_per_sample=2` and my query has 4 documents, say ranked d1-d4.
   a. Which pairs will be constructed? (d1-d2), (d1-d3), (d1-d4), (d2-d3), (d2-d4)?
   b. The document says it will construct k*|query| pairs, so it should be 2*4 = 8. How will they be constructed?
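To make question 1 and the last question (4 documents, k = 2) concrete, here is a minimal sketch in plain Python. It does not use XGBoost and is not XGBoost's implementation: the helper names `exhaustive_pairs` and `guessed_topk_pairs` are hypothetical, the labels are made up, and the top-k enumeration is only the guess written in the question, not confirmed behaviour.

```python
from itertools import combinations
from math import comb


def exhaustive_pairs(labels):
    """All unordered document pairs in one query whose labels differ."""
    return [(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] != labels[j]]


def guessed_topk_pairs(n_docs, k):
    """Guess from the question: pair each of the top-k ranked documents
    with every document ranked below it (not confirmed XGBoost behaviour)."""
    return [(i, j) for i in range(min(k, n_docs)) for j in range(i + 1, n_docs)]


# Queries with varying numbers of documents (labels are made up).
queries = {"q1": [2, 1, 0], "q2": [3, 2, 1, 0], "q3": [1, 1, 0]}
for qid, labels in queries.items():
    pairs = exhaustive_pairs(labels)
    print(qid, len(pairs), "of", comb(len(labels), 2), "possible pairs")

# 4 documents d1-d4 (0-based indices here), k = 2.
print(guessed_topk_pairs(4, 2))
```

Under this guess, 4 documents with k = 2 yield 5 pairs rather than 2*4 = 8, which is the gap the last question asks about. Likewise, 3 documents with distinct labels have only 3 distinct pairs, so sampling 6 would have to repeat some.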
Hi team,
My questions above are about the learning-to-rank tutorial: https://xgboost.readthedocs.io/en/stable/tutorials/learning_to_rank.html
Here is one of my GBM settings and my environment:
Thank you very much.