The loss suddenly becomes NaN and the model score becomes 0 during the training process #87

ZZX10082 opened this issue Oct 25, 2024 · 1 comment

ZZX10082 commented Oct 25, 2024

Thank you very much for your work. While reproducing it, I first trained on the KITTI dataset for 25 epochs on a single 4090 GPU to make sure my training environment runs correctly. I then switched to the NuScenes dataset; performance dropped after 25 epochs of training, but it was still within an acceptable range. However, when I retrained, the a1 metric in WandB dropped to 0 at evaluation between the fourth and fifth epochs, and the same thing happened when I retrained on the KITTI dataset. Have you encountered anything like this, and could you give me some advice? @shariqfarooq123
[screenshot attached]
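
(For reference: when the loss suddenly turns NaN mid-training, a common first step is to guard the optimizer step so that a single bad batch cannot corrupt the weights, and to log when the non-finite value first appears. Below is a minimal sketch assuming a standard PyTorch loop; `model`, `criterion`, and the batch keys are placeholders, not the repository's actual trainer.)

```python
import torch

def training_step(model, batch, criterion, optimizer):
    # Illustrative guard, not the repository's actual training loop.
    optimizer.zero_grad()
    pred = model(batch["image"])            # "image"/"depth" keys are placeholders
    loss = criterion(pred, batch["depth"])

    # If the loss is already non-finite, skip the update so one bad batch
    # does not poison the weights, and log it for debugging.
    if not torch.isfinite(loss):
        print("non-finite loss, skipping batch")
        return None

    loss.backward()
    # Gradient clipping as an extra safeguard against exploding updates.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```

Running a few iterations with `torch.autograd.set_detect_anomaly(True)` can also point to the first operation that produces a NaN in the backward pass, at the cost of slower training.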

ZZX10082 (Author) commented

I followed the approach from the issue you replied to and modified the loss so that it cannot be 0, but it still doesn't work. Can you give me some advice? @shariqfarooq123
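
(For reference: in SILog-style depth losses, the usual NaN sources are log() of zero or negative depth, an empty valid-pixel mask, and the gradient of sqrt() at 0. Below is a minimal sketch of a loss guarded against all three; the class name, default values, and exact formula are assumptions here, not necessarily the repository's implementation.)

```python
import torch
import torch.nn as nn

class SafeSILogLoss(nn.Module):
    # Hypothetical SILog-style loss: sqrt(var(g) + beta * mean(g)^2),
    # with g = log(pred) - log(target), guarded against log(0),
    # empty masks, and the infinite gradient of sqrt at 0.
    def __init__(self, beta=0.15, eps=1e-6):
        super().__init__()
        self.beta = beta
        self.eps = eps

    def forward(self, pred, target):
        # Keep only pixels with strictly positive ground-truth depth.
        mask = target > self.eps
        if mask.sum() == 0:
            # No valid pixels: return a zero that still participates in autograd.
            return pred.sum() * 0.0

        pred = torch.clamp(pred[mask], min=self.eps)  # keep log() away from 0
        target = target[mask]

        g = torch.log(pred) - torch.log(target)
        d = torch.var(g, unbiased=False) + self.beta * torch.mean(g) ** 2
        # eps under the sqrt avoids an infinite gradient when d == 0.
        return torch.sqrt(d + self.eps)
```

If the a1 score only collapses at evaluation, it is also worth checking whether the weights themselves have already become NaN, e.g. `any(torch.isnan(p).any() for p in model.parameters())`.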
