Train loss: Very high volatility in loss #13
Comments
I have the same problem. Have you solved it?
Hi bro. Have you solved the problem?
Not yet. Testing with the checkpoint from the 10,000th epoch gives very poor results. Using the lowest-loss checkpoint from before the fluctuation works, although the result is still not good, probably because my dataset is small. I am trying to adjust the parameters.
Hey guys. Have you solved the problem?
Hey guys. I got the solution. Add a scheduler to control the learning rate. |
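A minimal sketch of that idea using PyTorch's built-in cosine annealing schedule; the scheduler choice and the T_max / eta_min values are illustrative assumptions, not taken from this repo:

```python
import torch

# Placeholder model and optimizer; substitute the real ones from the training script.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Decay the learning rate smoothly over the 10000-epoch run instead of keeping it constant.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10000, eta_min=1e-6)

for epoch in range(10000):
    for _ in range(10):  # ~10 iterations per epoch, as in the setup described in this issue
        inputs = torch.randn(96, 10)        # dummy batch standing in for the real data
        loss = model(inputs).pow(2).mean()  # dummy loss standing in for the real one
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
```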
@pnaclcu Good job! Can you share your loss code?
Hi bros,
we can set the parameter args.scale_lr = False to solve this problem.
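For context, a scale_lr flag in many training scripts applies the linear scaling rule, multiplying the base learning rate by the global batch size. The sketch below only illustrates that common pattern; the flag names, the reference batch size of 256, and the scaling formula are assumptions, not this repo's actual code. Turning such a flag off keeps the learning rate at its smaller base value.

```python
import argparse

# Hypothetical flags mirroring the common pattern; the real script's names may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=1e-4)      # base learning rate
parser.add_argument("--batch_size", type=int, default=96)  # per-GPU batch size
parser.add_argument("--num_gpus", type=int, default=4)
parser.add_argument("--scale_lr", action="store_true")     # off by default
args = parser.parse_args()

lr = args.lr
if args.scale_lr:
    # Linear scaling rule: grow the LR with the global batch size,
    # relative to an assumed reference batch size of 256.
    lr = args.lr * args.batch_size * args.num_gpus / 256
print(f"effective learning rate: {lr}")
```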
Hello, thanks for your code. It is elegant and clear and has helped me a lot.
I ran into a problem: the training loss looks very good, around 0.001, at the beginning of training.
The default end epoch is set to 10000, but after 2000+ epochs the training loss jumps to a surprising value, e.g. "Training Loss : 325440.0592". Have you ever encountered this issue before?
The training batch size is 96 on 4 GPUs with PyTorch DDP. Since the full training set only contains about 4000 images, the 4 GPUs need only about 10 iterations to finish an epoch. Do you think this is the reason?
Thanks for your code.
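For reference, the short epoch length follows directly from the numbers above, assuming the batch size of 96 is per GPU:

```python
# Quick check of the epoch length (assumption: batch size 96 is per GPU).
images = 4000
global_batch = 96 * 4          # per-GPU batch size x number of GPUs
print(images // global_batch)  # 10 -> each epoch is only ~10 optimizer steps
```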