Issues of running MXNET cu-80 with RTX2080, Nvidia-driver-410, CUDA-8.0, 9.0, 9.2 #13337

jacksonxliu · 2018-11-20T12:27:48Z

jacksonxliu
Nov 20, 2018

Description

Loading the MXNet MTCNN model is very slow with CUDA 8.0.
With CUDA 9.0 or CUDA 9.2, MXNet either fails to find a forward convolution algorithm or, after the forward convolution has run, raises an error on any further operation, such as printing an mxnet.ndarray or calling asnumpy().
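A possible explanation for the error appearing only at print or asnumpy() is that MXNet queues GPU operators asynchronously, so a failure inside the convolution is reported at the next point that waits for the result. The sketch below is a minimal reproduction under that assumption. The function name `try_gpu_convolution` and the tensor shapes are illustrative choices, not taken from the MTCNN code in this report.

```python
def try_gpu_convolution():
    """Run one small convolution on GPU 0 and copy the result to the host."""
    try:
        import mxnet as mx
    except (ImportError, OSError):
        return "mxnet not available"
    try:
        ctx = mx.gpu(0)
        data = mx.nd.ones((1, 3, 32, 32), ctx=ctx)    # NCHW input
        weight = mx.nd.ones((8, 3, 3, 3), ctx=ctx)    # 8 filters of 3x3
        out = mx.nd.Convolution(data=data, weight=weight, kernel=(3, 3),
                                num_filter=8, no_bias=True)
        # asnumpy() blocks until the queued GPU work has finished, so an
        # error from the convolution can surface here instead of above.
        result = out.asnumpy()
    except mx.base.MXNetError as err:
        return "failed: " + str(err)[:200]
    return "ok, output shape {}".format(result.shape)


print(try_gpu_convolution())
```

A segmentation fault cannot be caught this way; if the process dies outright, the last printed line shows how far it got.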

Environment info (Required)

Ubuntu 16.04
Nvidia-driver-410
GeForce RTX 2080
CUDA 8.0, CUDA 9.0, CUDA 9.2
mxnet-cu80, mxnet-cu90, mxnet-cu92
cuDNN v6.0 for CUDA 8.0; cuDNN v7.0 for CUDA 9.0 and CUDA 9.2

I installed Nvidia-driver-410 first.

Package used (Python/R/Scala/Julia):
python 2.7


Compiler (gcc/clang/mingw/visual studio):
visual studio code




## Error Message:
Segmentation fault: 11
mxnet.base.MXNetError: [20:11:38] src/operator/nn/./cudnn/cudnn_convolution-inl.h:870: Failed to find any forward convolution algorithm.



## What have you tried to solve it?

1. We tried configuring the environment in several ways: installing Nvidia driver 410 first, then installing CUDA from the .run file while declining the bundled driver version, and repeating this with three different CUDA versions.

vdantu · 2018-11-20T17:28:44Z

vdantu
Nov 20, 2018

@mxnet-label-bot add [CUDA, C++, question]


SergazyK · 2018-11-21T05:36:48Z

SergazyK
Nov 21, 2018

Same for me


sbodenstein · 2018-11-21T21:43:16Z

sbodenstein
Nov 21, 2018

Support for Turing GPUs (e.g. the RTX 2080) was added in CUDA 10 (https://devblogs.nvidia.com/cuda-10-features-revealed/). CUDA 9 won't allow you to use Turing GPUs. Also relevant: #12877
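To make the version requirement concrete, here is a small sketch that records the first CUDA toolkit release with native support for a few GPU architectures and checks the toolkits from this report against the RTX 2080 (compute capability 7.5). The table and the helper name `toolkit_knows_arch` are illustrative and cover only three architectures. The check says nothing about whether an older build can still run through PTX JIT compilation, which is the point debated later in this thread.

```python
# First CUDA toolkit release with native support for each architecture,
# keyed by (major, minor) compute capability.
MIN_CUDA_FOR_ARCH = {
    (6, 1): (8, 0),    # Pascal, e.g. GTX 1080
    (7, 0): (9, 0),    # Volta, e.g. V100
    (7, 5): (10, 0),   # Turing, e.g. RTX 2080
}


def toolkit_knows_arch(compute_capability, cuda_version):
    """Return True if the toolkit version natively targets the architecture."""
    return cuda_version >= MIN_CUDA_FOR_ARCH[compute_capability]


# Check the toolkits mentioned in this issue against a Turing card.
for cuda in [(8, 0), (9, 0), (9, 2), (10, 0)]:
    print("CUDA %d.%d" % cuda, toolkit_knows_arch((7, 5), cuda))
```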


KellenSunderland · 2018-11-23T20:17:15Z

KellenSunderland
Nov 23, 2018
Collaborator

It shouldn't crash if you use Turing; it should just fall back to a possibly less-than-ideal PTX version. This could be related to a recently fixed bug. Can you check whether you can reproduce the issue in version 1.3.1 (which should be released very soon)?


sbodenstein · 2018-11-27T11:30:51Z

sbodenstein
Nov 27, 2018

@KellenSunderland: are you sure about this? Both PyTorch and TensorFlow have a lot of issues open about this (eg this), and articles written about it (eg this "The major issue being CUDA-9 is not aware of the latest GeForce RTX 2080 and its architecture hence when you run a program with TF as backend it shows segmentation fault."). I don't think it merely has non-ideal PTX.


tianweiy · 2018-11-30T16:22:18Z

tianweiy
Nov 30, 2018

I encountered the same error a month ago, after the forward pass. However, I reinstalled mxnet today and it works fine for me. I also tested mxnet-cu100, and it works now as well.

My System setup:
ubuntu 18.04
cuda 9.2, cuda 9.1, cuda 10.0 installed
cudnn v7.3.1 for linux

GPU: RTX 2080
CPU is AMD Ryzen 2600
RAM: 16GB


KellenSunderland · 2018-11-30T17:42:53Z

KellenSunderland
Nov 30, 2018
Collaborator

@sbodenstein I'm pretty sure. I can test on my machine to trust-but-verify :-).

@tianweiy: that's really interesting that a reinstall fixed your issue. I wonder if people running into this issue could delete their ~/.nv folder and see if it magically starts working. What I'm wondering is did people upgrade their card on the same machine? If so I could see a cubin for the wrong hardware being cached in ~/.nv . More or less a wild guess, but this could cause issues.

Edit: also very possible this was fixed by the cudnn auto-tuning fix if a re-install helped.
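For anyone who wants to try the ~/.nv suggestion without losing the folder, the sketch below renames it instead of deleting it, so it can be restored. On Linux the CUDA driver keeps its JIT compilation cache under ~/.nv/ComputeCache by default and recreates it on demand. The helper name `set_aside_nv_cache` is hypothetical, and the demonstration runs against a throwaway directory rather than the real home directory.

```python
import os
import shutil
import tempfile
import time


def set_aside_nv_cache(home_dir):
    """Rename <home_dir>/.nv to a timestamped backup; return the backup path.

    Returns None when there is no .nv directory to move.
    """
    cache_dir = os.path.join(home_dir, ".nv")
    if not os.path.isdir(cache_dir):
        return None
    backup_dir = "{}.bak-{}".format(cache_dir, int(time.time()))
    shutil.move(cache_dir, backup_dir)
    return backup_dir


# Demonstration on a temporary "home" so nothing real is touched.
fake_home = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_home, ".nv", "ComputeCache"))
moved_to = set_aside_nv_cache(fake_home)
print("cache moved to:", moved_to)
shutil.rmtree(fake_home)

# Real use would be: set_aside_nv_cache(os.path.expanduser("~"))
```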


tianweiy · 2018-11-30T18:19:15Z

tianweiy
Nov 30, 2018

I set this up on a new machine, so I don't think the cache is the problem. I also upgraded the cuDNN library, which may have helped fix the problem.


leleamol · 2019-01-04T23:06:01Z

leleamol
Jan 4, 2019

The issue doesn't seem to be related to C++ Frontend API, hence removing the C++ label.


leleamol · 2019-01-04T23:06:40Z

leleamol
Jan 4, 2019

@mxnet-label-bot update [Question, CUDA]


ymm4739 · 2019-01-14T07:19:04Z

ymm4739
Jan 14, 2019

Same for me. I reinstalled mxnet-1.3.1 and did not export MXNET_CUDNN_AUTOTUNE_DEFAULT=0, and it worked for me the first time. But when I ran it a second time, it did not work!
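For reference, the environment variable mentioned above can also be set from inside the Python script instead of with a shell export. The sketch below sets it before importing mxnet, which is the conservative ordering. Setting it to 0 turns off cuDNN convolution auto-tuning. Whether that avoids the failure on an RTX 2080 is not established in this thread.

```python
import os

# Disable cuDNN convolution auto-tuning. Set this before importing mxnet so
# the value is already in the environment when convolution operators are created.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

try:
    import mxnet as mx
    mxnet_version = mx.__version__
except (ImportError, OSError):
    mxnet_version = None  # mxnet is not installed or failed to load

print("MXNET_CUDNN_AUTOTUNE_DEFAULT =",
      os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"])
print("mxnet version:", mxnet_version)
```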

