
Fine-tuning with Naive Pipeline Parallel: NaN after optimizer step #4


Open · fs4r opened this issue Mar 12, 2023 · 2 comments

Comments


fs4r commented Mar 12, 2023

Your model does not seem to calculate the gradients of the layers correctly. When I run finetune_pp.py and print the loss during training, the loss becomes the following after the first optimizer step:

tensor(nan, device='cuda:1', dtype=torch.float16, grad_fn=)

Can you reproduce this on your machine? Otherwise, would you be willing to share your pip freeze output so that I can check whether a package mismatch is the cause?
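For context (not part of the original report): NaN loss right after the first optimizer step is often caused by gradient overflow when training directly in float16. A minimal sketch of how one might locate non-finite gradients and skip the update, assuming a standard PyTorch training loop with `model`, `optimizer`, and `loss` variables (names are illustrative, not taken from finetune_pp.py):

```python
import torch

def grads_are_finite(model: torch.nn.Module) -> bool:
    """Return True only if every gradient in the model is finite."""
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            print(f"non-finite gradient in {name}")
            return False
    return True

# Inside the (hypothetical) training loop: skip the parameter update when
# gradients overflowed, which would otherwise propagate NaNs into the weights.
# loss.backward()
# if grads_are_finite(model):
#     optimizer.step()
# optimizer.zero_grad()
```

Another common workaround in this situation is dynamic loss scaling (e.g. torch.cuda.amp.GradScaler), which scales the loss before backward and automatically skips steps whose gradients contain inf/NaN.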

@yysjasmine

I ran into the same situation. How did you fix it?


chaoyi-wu commented Mar 20, 2023

I met the same situation.
