Nan loss in RCAN model #12

Open
Ken1256 opened this issue Mar 2, 2019 · 9 comments

Ken1256 commented Mar 2, 2019

https://github.com/wayne391/Image-Super-Resolution/blob/master/src/models/RCAN.py

Just change
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, amsgrad=False)
to
optimizer = adabound.AdaBound(model.parameters(), lr=1e-4, final_lr=0.1)

I get NaN loss in the RCAN model, but Adam works fine.

Luolc (Owner) commented Mar 3, 2019

Hi! Thanks for sharing the failure case!
I will try to reproduce the result using your code. Do you know how many resources it needs for training?

Ken1256 (Author) commented Mar 3, 2019

I found that AdaBound works fine with torch.nn.L1Loss(reduction='mean'), but gives NaN loss with torch.nn.L1Loss(reduction='sum'). (Sorry, after double-checking the code I realized I had changed reduction='mean' to reduction='sum'. Adam works fine with both. Normally 'mean' and 'sum' should behave the same.)

The resource usage depends on the image patch_size; setting args.n_resgroups = 3 and args.n_resblocks = 2 will be much faster and use less VRAM.
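For concreteness, here is a minimal sketch of the two loss configurations being compared above; the Linear layer is only a stand-in for the RCAN model, and the "works"/"NaN" labels just restate what was reported in this thread.

import torch
import adabound

model = torch.nn.Linear(10, 1)  # stand-in for the RCAN model, for illustration only
optimizer = adabound.AdaBound(model.parameters(), lr=1e-4, final_lr=0.1)

loss_mean = torch.nn.L1Loss(reduction='mean')  # reportedly trains fine with AdaBound
loss_sum = torch.nn.L1Loss(reduction='sum')    # reportedly produces NaN loss with AdaBound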

Luolc (Owner) commented Mar 3, 2019

Thanks for the additional details.

In this case, I guess AdaBound is a bit sensitive on the RCAN model, and a final_lr of 0.1 is too large. You may try smaller final_lr values like 0.03, 0.01, 0.003, etc. But I am not familiar with this model and cannot guarantee it will work.
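A quick sketch of that suggested sweep (the Linear layer is only a placeholder for the RCAN model, and the actual training loop is omitted):

import torch
import adabound

model = torch.nn.Linear(10, 1)  # placeholder model
for final_lr in (0.03, 0.01, 0.003):
    optimizer = adabound.AdaBound(model.parameters(), lr=1e-4, final_lr=final_lr)
    # ... run a short training trial here and check whether the loss turns NaN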

Ken1256 (Author) commented Mar 3, 2019

I tried optimizer = adabound.AdaBound(model.parameters(), lr=1e-4, final_lr=1e-4) and still get NaN loss.

Luolc (Owner) commented Mar 3, 2019

1e-4 might be too small ...

If I understand correctly, the only difference between mean and sum is a factor of N (the number of samples in a step). If AdaBound works with mean, then scaling the learning rate down by N should make sum work too. But I am not sure whether the scaling should be applied to lr, final_lr, or both. I just had a discussion with my schoolmates at a seminar today about which stage matters more in training, the early stage or the final stage, but we haven't reached a clear answer yet. So for now we have to test it through experiments.
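A rough illustration of that scaling argument, assuming a batch of N samples; since it is unclear whether lr, final_lr, or both should be scaled, this sketch simply divides both.

import torch
import adabound

N = 16  # hypothetical number of samples per step
model = torch.nn.Linear(10, 1)  # placeholder model

# With reduction='mean':
opt_mean = adabound.AdaBound(model.parameters(), lr=1e-4, final_lr=0.1)

# With reduction='sum' the gradient is roughly N times larger, so divide the
# step sizes by N to compensate (scaling both lr and final_lr here):
opt_sum = adabound.AdaBound(model.parameters(), lr=1e-4 / N, final_lr=0.1 / N)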

GreatGBL commented Mar 4, 2019

That's not exactly correct. Suppose dataset A has 101 samples and the batch size is set to 10. If we set the reduction to mean, there is no problem with this. Otherwise, the last batch contains only one sample, and with sum that effectively changes the learning rate for that step.
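A small illustration of the uneven last batch, assuming 101 samples and batch_size=10 as above:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(101, 4), torch.randn(101, 1))
loader = DataLoader(dataset, batch_size=10, shuffle=False)

print([x.shape[0] for x, _ in loader])  # [10, 10, ..., 10, 1] -- the last batch has a single sample
# With reduction='sum', the loss (and gradient) of that last batch is about 10x
# smaller than a full batch's; with reduction='mean', every batch is on the same scale.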

Luolc (Owner) commented Mar 4, 2019

I believe that's a very extreme case. Generally, a single step won't affect the whole training process in expectation.

In this case, we would encounter a much smaller gradient once per epoch when using sum. If that really does affect training, I think the dataset is too small and SGD would fail as well.

MitraTj commented Apr 24, 2019

Hi, I use torch version 0.3.1 and I just modified
optimizer = optim.Adam(params, weight_decay=conf.l2, lr=lr, eps=1e-3)
to
optimizer = adabound.AdaBound(params, weight_decay=conf.l2, lr=lr, final_lr=0.1, eps=1e-3)

When I ran it, I got: ImportError: "torch.utils.ffi is deprecated".

Could you help?
Thanks

Michael-J98 commented

Hi, I'm a beginner and I have a small question about this:
AdaBound was inspired by gradient clipping, but the clipping is applied to the learning rate rather than to the gradient.
So does that mean I still need to clip the gradient before feeding it to the optimizer, to prevent the gradient from becoming NaN?
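For reference, a minimal sketch of clipping gradients before the optimizer step; this is not an official recommendation from this repo, max_norm=1.0 is an arbitrary example value, and the Linear layer is only a placeholder model.

import torch
import adabound

model = torch.nn.Linear(10, 1)
optimizer = adabound.AdaBound(model.parameters(), lr=1e-4, final_lr=0.1)
criterion = torch.nn.L1Loss(reduction='sum')

x, y = torch.randn(8, 10), torch.randn(8, 1)
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clips gradients, not the learning rate
optimizer.step()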
