Dataset Split #46

lingxitong · 2024-11-13T08:08:39Z

Nice work! I have some questions about the dataset split for the downstream tasks. For a train-test split (e.g., the CRC-100K dataset), my understanding is that you train for one epoch on the training set, then test once on the test set, and finally report the best result. Alternatively, you further divide the train split into training and validation sets, use the validation set to select the best model, and then test on the test set. Could you tell me which of these you use? I will process my dataset the same way. Thanks!

Richarizardd · 2024-11-21T21:09:35Z

Hi @lingxitong - We simply fit a logistic regression model via L-BFGS on the entire train set, using an adaptable cost (w.r.t. the embedding dimension size). We do not divide the train set into train / validation sets (no hyper-parameter tuning or model selection).
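
For anyone wanting to reproduce this protocol on their own embeddings, here is a minimal sketch using scikit-learn. It is not the authors' code. The thread does not state the exact cost formula, so the scaling `C = embed_dim * num_classes / 100` is an assumption, as are the function name `linear_probe`, the `max_iter` value, the choice of balanced accuracy as the metric, and the synthetic data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score


def linear_probe(train_x, train_y, test_x, test_y, max_iter=1000):
    """Fit logistic regression on the whole train set, evaluate once on test."""
    embed_dim = train_x.shape[1]
    num_classes = len(np.unique(train_y))
    # ASSUMPTION: the thread only says the cost adapts to the embedding
    # dimension; this particular scaling is a placeholder, not confirmed.
    # In scikit-learn, C is the inverse of the L2 regularization strength.
    cost = (embed_dim * num_classes) / 100.0
    clf = LogisticRegression(C=cost, solver="lbfgs", max_iter=max_iter)
    # No validation split: the cost is fixed in advance, so nothing is tuned.
    clf.fit(train_x, train_y)
    preds = clf.predict(test_x)
    return clf, balanced_accuracy_score(test_y, preds)


# Synthetic stand-in for pre-extracted patch embeddings (hypothetical data).
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 16))  # 3 classes, 16-dim features


def sample(n_per_class):
    # Draw n_per_class points around each class center with unit noise.
    x = np.concatenate(
        [c + rng.normal(size=(n_per_class, centers.shape[1])) for c in centers]
    )
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return x, y


train_x, train_y = sample(100)
test_x, test_y = sample(50)
clf, bal_acc = linear_probe(train_x, train_y, test_x, test_y)
print(f"C={clf.C:.2f}  balanced accuracy={bal_acc:.3f}")
```

Because the cost is computed from the feature shape rather than searched over, the test set is touched exactly once and no model selection takes place.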
