Error with nondet_multi_threaded_augmenter.py #1812

A-W-Git · 2023-11-21T17:21:34Z

Hi, I tried to run training but ran into the following error:

Using device: cuda:0

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################


This is the configuration used by this training:
Configuration name: 3d_lowres
 {'data_identifier': 'nnUNetPlans_3d_lowres', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2, 'patch_size': [80, 192, 160], 'median_image_size_in_voxels': [126, 275, 275], 'spacing': [1.1161767430256981, 0.6576431982019902, 0.6576431982019902], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2], 'num_pool_per_axis': [4, 5, 5], 'pool_op_kernel_sizes': [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [1, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]], 'unet_max_num_features': 320, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': False, 'next_stage': '3d_cascade_fullres'} 

These are the global plan.json settings:
 {'dataset_name': 'Dataset111_lv', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [0.6, 0.353515625, 0.353515625], 'original_median_shape_after_transp': [230, 512, 512], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 2150.0, 'mean': 112.53475952148438, 'median': 105.0, 'min': -2048.0, 'percentile_00_5': -41.0, 'percentile_99_5': 388.0, 'std': 65.23908233642578}}} 

2023-11-21 17:10:10.291010: unpacking dataset...
2023-11-21 17:10:12.882671: unpacking done...
2023-11-21 17:10:12.883862: do_dummy_2d_data_aug: False
2023-11-21 17:10:12.884239: Using splits from existing split file: /home/app/output/preprocessed_data/Dataset111_lv/splits_final.json
2023-11-21 17:10:12.884593: The split file contains 5 splits.
2023-11-21 17:10:12.884640: Desired fold for training: 0
2023-11-21 17:10:12.884670: This split has 27 training and 7 validation cases.
2023-11-21 17:10:12.896909: Unable to plot network architecture:
2023-11-21 17:10:12.896971: No module named 'hiddenlayer'
2023-11-21 17:10:12.910495: 
2023-11-21 17:10:12.910562: Epoch 0
2023-11-21 17:10:12.910645: Current learning rate: 0.01
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/opt/conda/envs/nnunet/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/nnunet/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/opt/conda/envs/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
using pin_memory on device 0
Traceback (most recent call last):
  File "/opt/conda/envs/nnunet/bin/nnUNetv2_train", line 8, in <module>
    sys.exit(run_training_entry())
  File "/home/app/nnUNet/nnunetv2/run/run_training.py", line 268, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/home/app/nnUNet/nnunetv2/run/run_training.py", line 204, in run_training
    nnunet_trainer.run_training()
  File "/home/app/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1242, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "/opt/conda/envs/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
  File "/opt/conda/envs/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

I am running WSL2 on Windows with 120 GB of memory, 30 GB of swap, and an RTX 3090 GPU (CUDA). Preprocessing completed without problems, and my dataset contains only 34 cases, with NRRD files for images and labels. Could you please help me solve this issue? Thanks!

I also tried reducing the number of data augmentation workers to 1 with export nnUNet_n_proc_DA=1, but the error persists.
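To rule out the variable simply not reaching the training process (for example, if it was exported in a different shell session than the one that launches training), here is a minimal sketch of a check. It assumes it is run from the same shell that starts nnUNetv2_train. The helper name read_n_proc_da is hypothetical and not part of nnU-Net; only the variable name nnUNet_n_proc_DA is taken from the command above.

```python
import os


def read_n_proc_da(env=None, default=None):
    """Return the worker count requested via nnUNet_n_proc_DA.

    Hypothetical helper for illustration only; it is not an nnU-Net function.
    Returns `default` when the variable is not set in `env`.
    """
    env = os.environ if env is None else env
    raw = env.get("nnUNet_n_proc_DA")
    if raw is None:
        return default
    # int() raises ValueError if the value is not a whole number,
    # which would also point to a mistyped export.
    return int(raw)


if __name__ == "__main__":
    print("nnUNet_n_proc_DA as seen by Python:", read_n_proc_da())
```

If this prints None, the export did not carry over to the process environment, and the worker count used for training was not the one set above.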
