Deal with overfitting #34

Open
yangtcai opened this issue Jun 27, 2022 · 4 comments

@yangtcai
Collaborator

Hi, @williamstark01, I tested two different hyperparameter settings to deal with overfitting. The orange one is the first run I trained our model with: learning rate 0.001, transformer encoder-decoder with 6 layers. When I found our model was overfitting, I found a related issue in DETR saying that small datasets can lead to this problem, so I changed the number of layers from 6 to 3. The blue one's learning rate is 0.001, transformer encoder-decoder with 3 layers. For the red one, I changed the dropout from 0.2 to 0.1 and also changed the learning rate to 0.0001.
[screenshots: TensorBoard loss curves comparing the orange, blue, and red runs]
The related issue link: facebookresearch/detr#342
I think we could add more chromosome data to train our model; the COCO dataset has up to 330k images, while we only have 8.9k examples to train on. Is my understanding correct?
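
As a quick summary of the three runs above (a sketch; the field names are illustrative, and the dropout of 0.2 for the first two runs is inferred from "changed the dropout from 0.2 to 0.1"):

```python
# Summary of the three runs described above (field names are illustrative,
# not the actual config keys used in this repo).
runs = {
    "orange": {"learning_rate": 1e-3, "num_layers": 6, "dropout": 0.2},
    "blue":   {"learning_rate": 1e-3, "num_layers": 3, "dropout": 0.2},
    "red":    {"learning_rate": 1e-4, "num_layers": 3, "dropout": 0.1},
}
```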

@williamstark01
Collaborator

Nice experimenting and troubleshooting! It looks to me like the learning rate was the major culprit for overfitting at this point. 1e-4 is a good value and very frequently used (and we might add learning rate decay later on as well). 3 layers for the transformer encoder and decoder also makes sense for now. The dropout should probably be increased since we are observing overfitting; values from 0.3 up to 0.5 are potentially good to try.
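
A minimal PyTorch sketch of those settings (assuming a torch.nn.Transformer-style module; the embedding size, number of attention heads, and decay schedule are placeholders, not values from this repo):

```python
import torch
from torch import nn

# Illustrative hyperparameters from the discussion above; d_model and nhead
# are assumptions, not values taken from this repo.
model = nn.Transformer(
    d_model=256,            # assumed embedding size
    nhead=8,                # assumed number of attention heads
    num_encoder_layers=3,   # reduced from 6 to 3
    num_decoder_layers=3,
    dropout=0.3,            # increased to counter overfitting
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Optional learning rate decay, e.g. step decay every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    # ... training and validation loop ...
    scheduler.step()
```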

At this stage it would be good to organize how we track multiple training experiments. TensorBoard is a good option, we just need to also save hyperparameter values that will help us filter experiments. Could you share the one you are currently using?
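
One way to save hyperparameters alongside metrics so runs can be filtered in TensorBoard's HParams tab (a sketch; the metric name and values are illustrative):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/lr1e-4_layers3_dropout0.3")

# Log hyperparameters together with a summary metric so the run can be
# filtered and compared against other experiments in TensorBoard.
writer.add_hparams(
    {"lr": 1e-4, "num_layers": 3, "dropout": 0.3},
    {"hparam/val_loss": 0.42},  # placeholder; use the real validation loss
)
writer.close()
```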

I think at this point it would be worth converting all tunable variables of the network into hyperparameters, for example the number of transformer layers, the number of attention heads, etc. Those could either be arguments to the training script or live in an experiment configuration file. New issue: #35
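
For example, exposing those as command-line arguments could look roughly like this (argument names and defaults are assumptions, not what the training script will necessarily use):

```python
import argparse

# Hypothetical training-script arguments for the tunable variables mentioned above.
parser = argparse.ArgumentParser(description="Training hyperparameters")
parser.add_argument("--learning-rate", type=float, default=1e-4)
parser.add_argument("--num-encoder-layers", type=int, default=3)
parser.add_argument("--num-decoder-layers", type=int, default=3)
parser.add_argument("--num-attention-heads", type=int, default=8)
parser.add_argument("--dropout", type=float, default=0.3)
args = parser.parse_args()

print(vars(args))  # pass these into the model and optimizer constructors
```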

Good work, very encouraging initial results!

@williamstark01
Collaborator

Almost forgot:

I think we could add more chromosome data to train our model; the COCO dataset has up to 330k images, while we only have 8.9k examples to train on. Is my understanding correct?

This is correct, the training set is relatively small at this point. We should get better results if we add more chromosomes to the dataset. New issue: #36

@yangtcai
Collaborator Author

At this stage it would be good to organize how we track multiple training experiments. TensorBoard is a good option, we just need to also save hyperparameter values that will help us filter experiments. Could you share the one you are currently using?
Hi, @williamstark01, I'm slightly confused by this part; do you mean the hyperparameters or the TensorBoard files? :D

@williamstark01
Collaborator

I meant the URL to the TensorBoard dashboard, if you are using the official public one ( https://tensorboard.dev/ ) and uploading the logs there. Or are you running TensorBoard locally?
