Selecting the best model when the validation set is noisy #41
Unanswered
adrian-dalessandro
asked this question in Q&A
Replies: 1 comment
-
@jmgilmer might have some thoughts on this.
-
When I train models for some task, I sometimes find that within a reasonable budget I cannot remove significant noise from the validation set. Sometimes a low training loss correlates with a higher validation loss (this shows up often with small datasets and imbalanced regression problems). This makes selecting the best-performing epoch difficult when re-training on both the training and validation sets. Simply selecting the epoch with the best validation metric doesn't work, since the choice may just reflect validation noise rather than a genuinely better checkpoint. In practice, what are the best strategies for approaching this situation? I saw model averaging mentioned in the playbook, and early stopping may also be an option. What would be a reasonable strategy in this scenario?
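For concreteness, here is a sketch of one low-cost mitigation I've been considering: smoothing the validation curve before picking the checkpoint, so a single "lucky" epoch doesn't win. The helper name and the example numbers are hypothetical, not from the playbook.

```python
import numpy as np

def select_epoch_smoothed(val_losses, window=5):
    """Pick the checkpoint at the minimum of a smoothed validation curve.

    Smoothing with a centered moving average reduces the chance of
    selecting an epoch that only looks best due to validation noise.
    """
    losses = np.asarray(val_losses, dtype=float)
    kernel = np.ones(window) / window
    # mode="same" keeps one smoothed value per epoch; edges are zero-padded
    smoothed = np.convolve(losses, kernel, mode="same")
    # divide by the actual window mass at each position to undo edge bias
    norm = np.convolve(np.ones_like(losses), kernel, mode="same")
    return int(np.argmin(smoothed / norm))

# A noisy dip at epoch 1 wins on the raw curve, but smoothing picks the
# epoch near the true minimum of the underlying trend instead.
curve = [1.0, 0.2, 0.8, 0.7, 0.6, 0.5, 0.45, 0.5, 0.6, 0.7]
print(int(np.argmin(curve)))                    # raw argmin: epoch 1
print(select_epoch_smoothed(curve, window=3))   # smoothed argmin: epoch 6
```

Checkpoint weight averaging over the last few epochs would be a complementary option to this kind of smoothed selection, if it applies here.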