[BraTS 2021/PyTorch] Model not properly training

When I run the training section of the BraTS 2021 notebook (located at PyTorch/Segmentation/nnUNet/notebooks/BraTS21.ipynb), the model does not actually train, even though it steps through every epoch, as shown in the image below. The Dice score is stuck at an extremely low value, and neither the Dice score nor the loss changes at all across epochs. The warning "DALI iterator does not support resetting while epoch is not finished" also appears on every epoch, although I have not touched anything related to it.

![image](https://github.com/NVIDIA/DeepLearningExamples/assets/133669123/74f3d28c-eaf0-425d-ba6a-bb2a6c7e3d43)


**To Reproduce**
Steps to reproduce the behavior:
1. Clone the DeepLearningExamples repo and install the dependencies
2. Download the BraTS 2021 dataset
3. Change paths in the BraTS 2021 notebook to point to file locations
4. Run all of the steps up to and including the training stage

**Expected behavior**
I expected the model to train and reach a Dice score of at least 70 after 5 epochs.
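
To make the expected number concrete, here is a minimal sketch of a Dice score on a 0 to 100 scale for two flat binary masks. It is only an illustration: the function name `dice_score` is hypothetical, and the notebook's own metric may be computed differently (for example, averaged over several tumor classes).

```python
def dice_score(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|), scaled to 0-100.

    `pred` and `target` are flat sequences of 0/1 labels of equal length.
    This is an illustrative helper, not the metric used by the notebook.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Count voxels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty: treat as a perfect match by convention.
        return 100.0
    return 100.0 * 2 * intersection / total


if __name__ == "__main__":
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

A value that stays near zero and never moves, as in the screenshot, would mean the predicted masks have almost no overlap with the labels at any epoch.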

**Environment**
* PyTorch version: 1.13.1+cu116
* GPUs in the system: 2x Tesla V100-SXM2-16GB
* CUDA driver version: 515.86.01
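
For anyone trying to compare against their own setup, a small sketch like the following can collect similar details. The helper name `collect_environment` is hypothetical. Note that `torch.version.cuda` reports the CUDA toolkit version PyTorch was built against, not the driver version; the driver version has to come from a tool such as `nvidia-smi`.

```python
import importlib.util
import platform


def collect_environment():
    """Return basic environment details useful for a bug report."""
    info = {
        "python": platform.python_version(),
        "torch": None,  # filled in only if PyTorch is installed
        "cuda": None,   # CUDA toolkit version PyTorch was built with
        "gpus": [],     # names of the visible CUDA devices
    }
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpus"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```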

