Dataset Split #46

Open
lingxitong opened this issue Nov 13, 2024 · 1 comment

Comments

@lingxitong

Nice work! I have a question about the dataset split for the downstream tasks. For datasets with a train-test split (e.g., the CRC-100K dataset), I can think of two protocols. In the first, you train for one epoch on the training set, then test once on the test set, and finally report the best result. In the second, you further divide the train split into training and validation sets, use the validation set to select the best model, and then test on the test set. Which of the two do you use? I will process my dataset the same way. Thanks!

@Richarizardd
Contributor

Hi @lingxitong - We simply fit a logistic regression model via L-BFGS on the entire train set, using an adaptable cost (w.r.t. the embedding dimension size). We do not divide the train set into train / validation sets (no hyper-parameter tuning or model selection).
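
For anyone following along, here is a minimal sketch of that protocol using scikit-learn: fit once on the whole train split, then evaluate once on the test split, with no validation set. The helper names (`adaptable_cost`, `linear_probe`), the cost formula, `max_iter`, the balanced-accuracy metric, and the toy embeddings are all assumptions made for illustration. The exact cost scaling is not given in this thread, so check the repository code before relying on it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score


def adaptable_cost(embedding_dim, num_classes):
    # ASSUMPTION: illustrative scaling only. The thread says the cost adapts
    # to the embedding dimension but does not state the formula.
    return (embedding_dim * num_classes) / 100.0


def linear_probe(train_x, train_y, test_x, test_y, max_iter=1000):
    """Fit on the entire train split, then evaluate once on the test split."""
    num_classes = len(np.unique(train_y))
    cost = adaptable_cost(train_x.shape[1], num_classes)
    # In scikit-learn, C is the inverse of the L2 regularization strength.
    clf = LogisticRegression(C=cost, solver="lbfgs", max_iter=max_iter)
    clf.fit(train_x, train_y)  # no validation split, no model selection
    preds = clf.predict(test_x)
    return balanced_accuracy_score(test_y, preds)


# Toy stand-in for pre-extracted embeddings: 3 well-separated classes, dim 16.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 16))


def sample(n_per_class):
    # Draw points around each class center with unit-variance noise.
    x = np.concatenate([c + rng.normal(size=(n_per_class, 16)) for c in centers])
    y = np.repeat(np.arange(3), n_per_class)
    return x, y


train_x, train_y = sample(100)
test_x, test_y = sample(50)
score = linear_probe(train_x, train_y, test_x, test_y)
print(f"balanced accuracy on test split: {score:.3f}")
```

Because the cost is set by a fixed rule rather than tuned, nothing is selected on the test split, which is why a separate validation set is not needed in this setup.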
