[BraTS 2021/PyTorch] Model not properly training #1304

Open
@DanielNajarian

Description

When running the training section of the BraTS 2021 notebook (PyTorch/Segmentation/nnUNet/notebooks/BraTS21.ipynb), the model does not train properly even though it steps through every epoch, as seen in the image below. The Dice score is stuck at an extremely low value, and neither it nor the loss changes at all across epochs. The warning "DALI iterator does not support resetting while epoch is not finished" appears on every epoch, but I have not modified anything related to it.

image
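
The stall described above (Dice and loss identical on every epoch) could be confirmed from logged metrics rather than by eye. The following is only a minimal sketch: `is_stalled`, its tolerance, and the example values are hypothetical illustrations and are not part of the notebook or of nnUNet.

```python
def is_stalled(history, tol=1e-6):
    """Return True if every logged value stays within `tol` of the first one.

    `history` is a list of per-epoch metric values (e.g. Dice or loss).
    With fewer than two values there is nothing to compare, so return False.
    """
    if len(history) < 2:
        return False
    first = history[0]
    return all(abs(value - first) <= tol for value in history[1:])


# Made-up values for illustration only; these are not the numbers from the run.
dice_per_epoch = [0.31, 0.31, 0.31, 0.31, 0.31]
print(is_stalled(dice_per_epoch))
```

A metric that is exactly constant across epochs, rather than merely noisy, usually suggests that the weights are not being updated at all, which is a different failure from slow convergence.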

To Reproduce
Steps to reproduce the behavior:

  1. Clone the DeepLearningExamples repo and install the dependencies
  2. Download the BraTS 2021 dataset
  3. Change paths in the BraTS 2021 notebook to point to file locations
  4. Run all of the steps up to and including the training stage

Expected behavior
I expected the model to train and reach a Dice of at least 70 after 5 epochs.

Environment
Please provide at least:

  • PyTorch version: 1.13.1+cu116
  • GPUs in the system: 2x Tesla V100-SXM2-16GB
  • CUDA driver version: 515.86.01
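
For anyone trying to reproduce this, the details above can be gathered with a short script. This is a sketch under assumptions: `collect_environment` is a hypothetical helper name, and it only reports what PyTorch itself exposes. The driver version is not available this way and has to be read from `nvidia-smi`.

```python
import platform


def collect_environment():
    """Collect basic version and GPU details for a bug report."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA runtime version PyTorch was built against (None for CPU-only builds).
    # This is not the driver version, which `nvidia-smi` reports.
    info["cuda_runtime"] = torch.version.cuda
    # device_count() is 0 when no usable GPU is visible, giving an empty list.
    info["gpus"] = [
        torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
    ]
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```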

Labels: bug