OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED. #3761
Unanswered
minhduc01168 asked this question in Q&A
Replies: 0 comments
/content/drive/MyDrive/PaddleSeg
2024-07-27 12:17:00 [INFO]
------------Environment Information-------------
platform: Linux-6.1.85+-x86_64-with-glibc2.35
Python: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
Paddle compiled with cuda: True
NVCC: Build cuda_12.2.r12.2/compiler.33191640_0
cudnn: 8.9
GPUs used: 1
CUDA_VISIBLE_DEVICES: None
GPU: ['GPU 0: Tesla T4']
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PaddleSeg: 2.8.0
PaddlePaddle: 2.5.2
OpenCV: 4.5.5
2024-07-27 12:17:00 [INFO]
---------------Config Information---------------
batch_size: 2
iters: 80000
train_dataset:
dataset_root: data/300_wr
mode: train
num_classes: 2
train_path: data/300_wr/train.txt
transforms:
min_scale_factor: 0.5
scale_step_size: 0.25
type: ResizeStepScaling
type: RandomPaddingCrop
contrast_range: 0.4
saturation_range: 0.4
type: RandomDistort
type: Dataset
val_dataset:
dataset_root: data/300_wr
mode: val
num_classes: 2
transforms:
type: Dataset
val_path: data/300_wr/val.txt
optimizer:
momentum: 0.9
type: SGD
weight_decay: 4.0e-05
lr_scheduler:
end_lr: 0
learning_rate: 0.01
power: 0.9
type: PolynomialDecay
loss:
coef:
types:
model:
align_corners: false
aspp_out_channels: 256
aspp_ratios:
backbone:
multi_grid:
output_stride: 8
pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
type: ResNet101_vd
backbone_indices:
num_classes: 2
pretrained: null
type: DeepLabV3P
2024-07-27 12:17:00 [INFO] Set device: gpu
2024-07-27 12:17:00 [INFO] Use the following config to build model
model:
align_corners: false
aspp_out_channels: 256
aspp_ratios:
backbone:
multi_grid:
output_stride: 8
pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
type: ResNet101_vd
backbone_indices:
num_classes: 2
pretrained: null
type: DeepLabV3P
W0727 12:17:00.355322 1437 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.2, Runtime API Version: 11.8
W0727 12:17:00.355347 1437 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
2024-07-27 12:17:00 [INFO] Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
Connecting to https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz
Downloading resnet101_vd_ssld.tar.gz
[==================================================] 100.00%
Uncompress resnet101_vd_ssld.tar.gz
[==================================================] 100.00%
2024-07-27 12:17:14 [INFO] There are 530/530 variables loaded into ResNet_vd.
2024-07-27 12:17:14 [INFO] Use the following config to build train_dataset
train_dataset:
dataset_root: data/300_wr
mode: train
num_classes: 2
train_path: data/300_wr/train.txt
transforms:
min_scale_factor: 0.5
scale_step_size: 0.25
type: ResizeStepScaling
type: RandomPaddingCrop
contrast_range: 0.4
saturation_range: 0.4
type: RandomDistort
type: Dataset
2024-07-27 12:17:16 [INFO] Use the following config to build val_dataset
val_dataset:
dataset_root: data/300_wr
mode: val
num_classes: 2
transforms:
type: Dataset
val_path: data/300_wr/val.txt
2024-07-27 12:17:17 [INFO] If the type is SGD and momentum in optimizer config, the type is changed to Momentum.
2024-07-27 12:17:17 [INFO] Use the following config to build optimizer
optimizer:
momentum: 0.9
type: Momentum
weight_decay: 4.0e-05
2024-07-27 12:17:17 [INFO] Use the following config to build loss
loss:
coef:
types:
/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/norm.py:777: UserWarning: When training, we now always track global mean and variance.
warnings.warn(
Error: ../paddle/phi/kernels/gpu/cross_entropy_kernel.cu:1010 Assertion `false` failed. The value of label expected >= 0 and < 2, or == 255, but got 150. Please check label value.
Error: ../paddle/phi/kernels/gpu/cross_entropy_kernel.cu:1010 Assertion `false` failed. The value of label expected >= 0 and < 2, or == 255, but got 150. Please check label value.
Error: ../paddle/phi/kernels/gpu/cross_entropy_kernel.cu:1010 Assertion `false` failed. The value of label expected >= 0 and < 2, or == 255, but got 150. Please check label value.
Error: ../paddle/phi/kernels/gpu/cross_entropy_kernel.cu:1010 Assertion `false` failed. The value of label expected >= 0 and < 2, or == 255, but got 150. Please check label value.
Error: ../paddle/phi/kernels/gpu/cross_entropy_kernel.cu:1010 Assertion `false` failed. The value of label expected >= 0 and < 2, or == 255, but got 150. Please check label value.
Error: ../paddle/phi/kernels/gpu/cross_entropy_kernel.cu:1010 Assertion `false` failed. The value of label expected >= 0 and < 2, or == 255, but got 150. Please check label value.
Error: ../paddle/phi/kernels/gpu/cross_entropy_kernel.cu:1010 Assertion `false` failed. The value of label expected >= 0 and < 2, or == 255, but got 150. Please check label value.
Error: ../paddle/phi/kernels/gpu/cross_entropy_kernel.cu:1010 Assertion `false` failed. The value of label expected >= 0 and < 2, or == 255, but got 150. Please check label value.
Traceback (most recent call last):
File "/content/drive/MyDrive/PaddleSeg/tools/train.py", line 213, in <module>
main(args)
File "/content/drive/MyDrive/PaddleSeg/tools/train.py", line 188, in main
train(
File "/usr/local/lib/python3.10/dist-packages/paddleseg/core/train.py", line 243, in train
loss.backward()
File "", line 2, in backward
File "/usr/local/lib/python3.10/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/fluid/framework.py", line 449, in impl
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/fluid/dygraph/tensor_patch_methods.py", line 298, in backward
core.eager.run_backward([self], grad_tensor, retain_graph)
OSError: (External) CUDNN error(8), CUDNN_STATUS_EXECUTION_FAILED.
[Hint: 'CUDNN_STATUS_EXECUTION_FAILED'. The GPU program failed to execute. This is usually caused by a failure to launch some cuDNN kernel on the GPU, which can occur for multiple reasons. To correct, check that the hardware, an appropriate version of the driver, and the cuDNN library are correctly installed. Otherwise, this may indicate an internal error/bug in the library. ] (at ../paddle/phi/kernels/gpudnn/conv_cudnn_v7.h:848)
I got the error above when training on my own dataset. Can anyone help me?
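The cross-entropy assertion in the log says a label pixel had the value 150, while with num_classes: 2 only 0, 1, and 255 are accepted. This suggests checking the label images before suspecting the cuDNN install. Below is a minimal sketch of such a check. It rests on several assumptions not confirmed in the post: Pillow is installed, the labels are single-channel 8-bit images, each line of train.txt has the form `image_path label_path` relative to dataset_root, and 255 is the ignore value. The function names `invalid_label_values` and `scan_file_list` are hypothetical helpers, not part of PaddleSeg.

```python
import os
from collections import Counter


def invalid_label_values(pixels, num_classes=2, ignore_index=255):
    """Count pixel values that are neither in [0, num_classes) nor ignore_index.

    `pixels` is any iterable of ints, e.g. the raw bytes of an 8-bit label image.
    """
    counts = Counter(pixels)
    return {
        value: n
        for value, n in sorted(counts.items())
        if not (0 <= value < num_classes or value == ignore_index)
    }


def scan_file_list(list_path, dataset_root, num_classes=2, ignore_index=255):
    """Return {label_path: problem} for every label that fails the check.

    Assumes each line is "image_path label_path" with no spaces inside paths.
    """
    from PIL import Image  # assumption: Pillow is available

    problems = {}
    with open(list_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank lines or lines without a label path
            label_path = os.path.join(dataset_root, parts[1])
            with Image.open(label_path) as img:
                if img.mode not in ("L", "P"):
                    # Only single-channel 8-bit labels are handled in this sketch.
                    problems[label_path] = f"unexpected image mode {img.mode}"
                    continue
                found = invalid_label_values(
                    img.tobytes(), num_classes, ignore_index
                )
            if found:
                problems[label_path] = found
    return problems


if __name__ == "__main__":
    # Paths taken from the config in the log above.
    for path, problem in scan_file_list(
        "data/300_wr/train.txt", "data/300_wr"
    ).items():
        print(path, problem)
```

If the scan reports values such as 150, one plausible cause is that the masks were saved as grayscale intensities rather than class indices. In that case the masks would need to be remapped to 0 and 1 (with 255 for ignored pixels) before training.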