Fix the CLM performance mismatch between evaluation and manual inference #723
Fixes #719
Goals ⚽
The current CLM strategy is as follows:
![image](https://private-user-images.githubusercontent.com/17721108/247515785-76406b3b-2bcf-4eaf-80c4-7abf17be16f4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAxNjIxNDIsIm5iZiI6MTcyMDE2MTg0MiwicGF0aCI6Ii8xNzcyMTEwOC8yNDc1MTU3ODUtNzY0MDZiM2ItMmJjZi00ZWFmLTgwYzQtN2FiZjE3YmUxNmY0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzA1VDA2NDQwMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU0MDdjN2I1NjVkNTE5ZmMyMDgwZTc3NjEyNDA0MzYyM2Q0ZDAzZDU1YjAzZDdjNDc0NDNjYzkxZTViZTcwZTQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.WhIl4sn39PrPjcYyJfZTT3PAaXaiEtxiDZkqKpfoYCo)
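In code terms, the strategy shown above can be sketched roughly as follows (a simplified NumPy illustration with hypothetical names and shapes, not the actual Transformers4Rec implementation): training/evaluation replaces padded positions of the sequence embeddings with a trainable [MASK] embedding, while manual inference leaves them as 0-embeddings.

```python
import numpy as np

rng = np.random.default_rng(42)
batch, seq_len, d_model = 2, 4, 8

seq_emb = rng.normal(size=(batch, seq_len, d_model))   # item embeddings
padding_mask = np.array([[1, 1, 1, 0],                 # 1 = real item, 0 = pad
                         [1, 1, 0, 0]], dtype=bool)
mask_emb = rng.normal(size=(d_model,))                 # trainable [MASK] vector

# Training / evaluation path: padded positions replaced by the [MASK] embedding
train_inputs = np.where(padding_mask[..., None], seq_emb, mask_emb)

# Manual inference path: padded positions left as 0-embeddings
infer_inputs = np.where(padding_mask[..., None], seq_emb, 0.0)

# The transformer sees different inputs at padded positions, which is
# the source of the metric mismatch between evaluation and inference.
mismatch = not np.allclose(train_inputs, infer_inputs)
print(mismatch)  # True
```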
The main issues of the current implementation are:

- During inference, padded positions are represented with 0-embeddings, while during training these positions are replaced with trainable [MASK] embeddings. ==> We should have the same representation strategy for training, evaluation, and inference.

Implementation Details 🚧
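The core idea of the change can be sketched as follows (hypothetical, simplified code; the real logic lives in `CausalLanguageModeling`): the `label_mask` is taken directly from the padding mask, so the information about actual past items is preserved, and the same [MASK] replacement is applied in every mode.

```python
import numpy as np

def clm_prepare(seq_emb, padding_mask, mask_emb, training=True):
    """Hypothetical sketch of a CLM masking step.

    - label_mask is the padding mask itself, so downstream code keeps
      the information about which positions hold actual past items.
    - Padded positions are replaced with the trainable [MASK] embedding
      in every mode; the `training` flag is kept only to show that the
      representation no longer depends on it.
    """
    label_mask = padding_mask.copy()
    inputs = np.where(padding_mask[..., None], seq_emb, mask_emb)
    return inputs, label_mask

rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(2, 4, 8))
padding_mask = np.array([[1, 1, 1, 0], [1, 0, 0, 0]], dtype=bool)
mask_emb = rng.normal(size=(8,))

train_in, train_labels = clm_prepare(seq_emb, padding_mask, mask_emb, training=True)
infer_in, _ = clm_prepare(seq_emb, padding_mask, mask_emb, training=False)

# Training and inference now receive identical representations
print(np.allclose(train_in, infer_in))  # True
```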
Updated the class `CausalLanguageModeling` to set the `label_mask` as the padding mask information (to keep the information about actual past items).

I ran the
`t4r_paper_repro`
script using 5 days of the ecomrees46 dataset and these are the results:

Testing Details 🔍
- Updated `test_mask_only_last_item_for_eval` to get the target mask information from `lm.masked_targets`.
- Updated `test_sequential_tabular_features_ignore_masking`, as the inference mode of CLM now changes the inputs by replacing 0-padded positions with [MASK] embeddings.

Future work
- 0-embedding to represent padded positions. ==> We need to re-run the T4Rec paper experiments without the [MASK] variable and check how the evaluation results are impacted.
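That future direction could look roughly like the following sketch (an assumption about the design, not committed code): padded positions keep plain 0-embeddings in every mode, so the trainable [MASK] variable disappears entirely while train/eval/inference stay consistent.

```python
import numpy as np

def clm_prepare_zero_pad(seq_emb, padding_mask):
    """Hypothetical variant without a [MASK] variable: padded positions
    are represented by 0-embeddings in training, evaluation and
    inference, so no learned mask embedding is needed and all modes
    remain consistent."""
    inputs = np.where(padding_mask[..., None], seq_emb, 0.0)
    return inputs, padding_mask.copy()

rng = np.random.default_rng(1)
seq_emb = rng.normal(size=(2, 4, 8))
padding_mask = np.array([[1, 1, 0, 0], [1, 1, 1, 0]], dtype=bool)

inputs, label_mask = clm_prepare_zero_pad(seq_emb, padding_mask)
print(np.allclose(inputs[~padding_mask], 0.0))  # True
```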