Document is not clear on 'topk' and 'mean' for lambdarank_pair_method for lambda rank pair construction #10991

nsh-bay · 2024-11-08T19:20:44Z

Hi team,
I have a few questions about the learning-to-rank tutorial: https://xgboost.readthedocs.io/en/stable/tutorials/learning_to_rank.html

1. I couldn't find how I can run an exhaustive pair construction with lambdarank_pair_method='mean' or 'topk'. This is my ultimate goal. Note that the number of documents per query varies.
2. What is the default k (lambdarank_num_pair_per_sample) for the topk and mean methods?
3. When I leave it at the default, model.json shows lambdarank_num_pair_per_sample as a full 32-bit number (screenshot). Is it a bug?
4. I assume that choosing topk and setting k via lambdarank_num_pair_per_sample to a very large number (e.g., -1 or 1000) would achieve the goal in question 1, but I am not sure how it behaves when lambdarank_num_pair_per_sample is larger than the number of documents in every query.
5. The example with the mean method is a bit tricky to me: with 3 documents we typically need at most C(3,2) = 3 pairs, but the example shows that we can generate lambdarank_num_pair_per_sample * #documents = 2 * 3 = 6.
   - a. Does that mean there are duplicate pairs in this case? If I set the method to mean and lambdarank_num_pair_per_sample is very large, does that significantly affect training time because of the duplicates?
   - b. How should I set it to achieve the goal in question 1 above?

Here is the relevant quote from the document:
For the mean strategy, XGBoost samples lambdarank_num_pair_per_sample pairs for each document in a query list. For example, given a list of 3 documents and lambdarank_num_pair_per_sample is set to 2, XGBoost will randomly sample 6 pairs, assuming the labels for these documents are different. On the other hand, if the pair method is set to topk, XGBoost constructs about k * |query| number of pairs with |query| pairs for each sample at the top k = lambdarank_num_pair position. The number of pairs counted here is an approximation since we skip pairs that have the same label.

If I select the `topk` method with `lambdarank_num_pair_per_sample=2` and my query has 4 documents, say ranked d1-d4:
- a. What pairs will be constructed? (d1-d2), (d1-d3), (d1-d4), (d2-d3), (d2-d4)?
- b. The document says it will construct k*|query| pairs, so it should be 2*4=8. How will they be constructed?

Here is one of my GBM settings and my environment:

xgb.version: 2.1.2 (CPU only)
Labels are floating-point values

    'ndcg_exp_gain': False,
    'objective': 'rank:ndcg',
    'lambdarank_pair_method': 'topk',
    'lambdarank_num_pair_per_sample': 10000,
    'verbosity': 1,
    'grow_policy': 'lossguide',
    'learning_rate': 0.3,
    'max_depth': 6,
    'min_child_weight': 0.0,
    'subsample': 0.5,
    'tree_method': 'approx',
    'max_bin': 256,
    'gamma': 0,
    'reg_lambda': 1.0,
    'reg_alpha': 0.0,
    'max_leaves': 32,
    'random_state': 999,
    'n_jobs': -1

Thank you very much.

trivialfis · 2024-11-10T22:10:59Z

> I couldn't find how I can run an exhaustive pair construction

For now, set it to a number larger than the size of any existing group?
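
To make the suggestion above concrete, here is a minimal Python sketch of how top-k pair construction might enumerate pairs. It assumes that each of the top k ranked documents is paired with every document ranked below it and that pairs with equal labels are skipped. That is one reading of the quoted documentation, not a confirmed description of XGBoost's implementation. The helper `topk_pairs` and the sample `labels` and `qid` values are hypothetical names introduced only for illustration.

```python
from collections import Counter


def topk_pairs(labels_by_rank, k):
    """Enumerate (i, j) index pairs under the assumed top-k scheme.

    labels_by_rank: relevance labels ordered by the current model ranking.
    k: stand-in for lambdarank_num_pair_per_sample (assumption: it acts as a
       truncation level on the ranked list).
    """
    n = len(labels_by_rank)
    pairs = []
    # Only the first min(k, n) ranked documents act as the "top" side of a pair.
    for i in range(min(k, n)):
        for j in range(i + 1, n):
            # Pairs with identical labels carry no ranking signal and are skipped.
            if labels_by_rank[i] != labels_by_rank[j]:
                pairs.append((i, j))
    return pairs


# Hypothetical query with 4 documents d1..d4, ordered by model rank.
labels = [4.0, 3.0, 2.0, 1.0]
pairs_k2 = topk_pairs(labels, k=2)
print(pairs_k2)

# Hypothetical query ids for three groups of sizes 3, 4 and 2.
qid = [0, 0, 0, 1, 1, 1, 1, 2, 2]
largest_group = max(Counter(qid).values())

# With k at least as large as the largest group, every distinct-label pair appears.
pairs_all = topk_pairs(labels, k=largest_group)
print(len(pairs_all))
```

Under these assumptions, k = 2 with 4 documents yields 5 pairs rather than 2*4 = 8, which is consistent with the documentation describing k*|query| as an approximation. Any k at or above the largest group size yields every pair of documents with different labels.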

> What is the default k

1 for random sampling (mean), 32 for topk.

> Is it a bug?

It's an internal indicator for "not-set".

> The example with the mean method is a bit tricky

Randomly select k documents, and pair them with all other existing documents in the group.
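
For the duplicate-pair question, a small Python sketch may help. It follows the documentation quote in the question (k sampled pairs per document) and assumes partners are drawn uniformly, with replacement, from documents that have a different label. Both the sampling details and the helper name `sample_mean_pairs` are assumptions made for illustration, not XGBoost's actual code.

```python
import random


def sample_mean_pairs(labels, num_pair_per_sample, seed=0):
    """Sample pairs under the assumed 'mean' scheme.

    For every document, draw num_pair_per_sample partners (with replacement)
    among documents whose label differs from its own.
    """
    rng = random.Random(seed)
    n = len(labels)
    pairs = []
    for i in range(n):
        # Only documents with a different label can form a useful pair.
        candidates = [j for j in range(n) if labels[j] != labels[i]]
        if not candidates:
            continue
        for _ in range(num_pair_per_sample):
            pairs.append((i, rng.choice(candidates)))
    return pairs


# 3 documents with distinct labels and k = 2, as in the documentation example.
pairs = sample_mean_pairs([3.0, 2.0, 1.0], num_pair_per_sample=2)
# Treat (i, j) and (j, i) as the same unordered pair when counting duplicates.
unique_pairs = {tuple(sorted(p)) for p in pairs}
print(len(pairs), len(unique_pairs))
```

With 3 documents there are only C(3,2) = 3 distinct unordered pairs, so the 6 sampled pairs in this sketch must repeat some of them. Under this assumed scheme, a very large k mostly resamples the same pairs, and the work per query grows with k times the number of documents.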
