Deal with overfitting #34
Nice experimenting and troubleshooting! It looks to me that the learning rate was the major culprit for overfitting at this point. 1e-4 is a good value, very frequently used (and we might add lr decay as well later on). 3 layers for the transformer encoder and decoder also makes sense for now. The dropout should probably be increased since we observe overfitting; 0.3 up to 0.5 are potentially good values to try.

At this stage it would be good to organize how we track multiple training experiments. TensorBoard is a good option, we just need to also save hyperparameter values that will help us filter experiments. Could you share the one you are currently using?

I think at this point it would be worth looking at converting all tunable variables of the network into hyperparameters, for example the number of transformer layers, number of attention heads, etc. Those could either be arguments to the training script or an experiment configuration file, as in the sketch below. New issue: #35

Good work, very encouraging initial results!
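A minimal sketch of what this could look like, assuming PyTorch and its bundled `torch.utils.tensorboard` writer; the script structure, argument names, and the stubbed-in validation loss are illustrative assumptions, not existing project code:

```python
# hypothetical train.py sketch: expose tunable values as CLI arguments
# and log them to TensorBoard alongside the metrics they produced
import argparse

from torch.utils.tensorboard import SummaryWriter


def parse_args():
    parser = argparse.ArgumentParser(description="training hyperparameters")
    parser.add_argument("--learning-rate", type=float, default=1e-4)
    parser.add_argument("--dropout", type=float, default=0.3)
    parser.add_argument("--num-encoder-layers", type=int, default=3)
    parser.add_argument("--num-decoder-layers", type=int, default=3)
    parser.add_argument("--num-attention-heads", type=int, default=8)
    return parser.parse_args()


def main():
    args = parse_args()

    # one log subdirectory per experiment keeps runs separate in the dashboard
    run_name = f"lr{args.learning_rate}_drop{args.dropout}_enc{args.num_encoder_layers}"
    writer = SummaryWriter(log_dir=f"runs/{run_name}")

    # ... build the model and dataloaders from args, run the training loop,
    # calling writer.add_scalar("loss/validation", loss, epoch) each epoch ...
    final_val_loss = 0.0  # placeholder for the real final validation loss

    # add_hparams() stores the hyperparameter values next to the final
    # metrics, so runs can be filtered and compared in the HPARAMS tab
    writer.add_hparams(
        {
            "learning_rate": args.learning_rate,
            "dropout": args.dropout,
            "num_encoder_layers": args.num_encoder_layers,
            "num_decoder_layers": args.num_decoder_layers,
            "num_attention_heads": args.num_attention_heads,
        },
        {"hparam/val_loss": final_val_loss},
    )
    writer.close()


if __name__ == "__main__":
    main()
```

The same argparse namespace could later be serialized to an experiment configuration file (YAML or JSON) instead of passing everything on the command line; either way the full set of tunable values ends up recorded with each run.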
Almost forgot:
This is correct, the training set is relatively small at this point. We should get better results if we add more chromosomes to the dataset. New issue: #36
I meant the URL to the TensorBoard dashboard, if you are using the official public one (https://tensorboard.dev/) and uploading the logs there. Or are you running TensorBoard locally?
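If it is the latter, a local dashboard is usually started with the `tensorboard --logdir runs` CLI command; as a sketch, the same thing can be done from Python (assuming the logs live in a directory named `runs`):

```python
# launch TensorBoard programmatically; equivalent to running
# `tensorboard --logdir runs` in a terminal
from tensorboard import program

tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", "runs"])
url = tb.launch()  # returns e.g. "http://localhost:6006/"
print(f"TensorBoard listening on {url}")
```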
Hi @williamstark01, I tested two different hyperparameter configurations to deal with the overfitting. The orange run is the first time I trained our model: learning rate 0.001, 6 transformer encoder-decoder layers. When I found our model was overfitting, I found a related issue in DETR suggesting that small datasets can lead to this problem, so I changed the number of layers from 6 to 3. The blue run uses a learning rate of 0.001 with 3 transformer encoder-decoder layers. For the red run, I changed the dropout from 0.2 to 0.1 and also lowered the learning rate to 0.0001.
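For reference, a minimal sketch of how the red run's settings would map onto a plain PyTorch `nn.Transformer`; the embedding size and attention head count are illustrative assumptions, since they are not stated in this thread:

```python
import torch
from torch import nn

# the red run's settings from the experiments above
model = nn.Transformer(
    d_model=256,           # assumed embedding size, not stated in the thread
    nhead=8,               # assumed number of attention heads
    num_encoder_layers=3,  # reduced from 6 to 3 for the small dataset
    num_decoder_layers=3,
    dropout=0.1,           # reduced from 0.2
)

# learning rate lowered from 1e-3 to 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```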


The related issue link: facebookresearch/detr#342
I think we can add more chromosome data to train our model; the COCO dataset has up to 330k images, while we only have 8.9k samples for training. Is my understanding correct?