Issues of running MXNET cu-80 with RTX2080, Nvidia-driver-410, CUDA-8.0, 9.0, 9.2 #13337
Replies: 11 comments
-
@mxnet-label-bot add [CUDA, C++, question]
-
Same for me.
-
Support for Turing GPUs (e.g. the RTX 2080) was added in CUDA 10 (https://devblogs.nvidia.com/cuda-10-features-revealed/). CUDA 9 won't allow you to use Turing GPUs. Also relevant: #12877
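To make the version requirement concrete, here is a minimal sketch of a lookup from compute capability to the first CUDA toolkit release that can target it natively. The table and the helper name `toolkit_supports` are illustrative assumptions, not part of MXNet or CUDA. It covers only the architectures relevant to this thread and says nothing about whether older toolkits can still run through PTX JIT compilation.

```python
# First CUDA toolkit release that can compile native code for each architecture.
# Keys are (major, minor) compute capabilities; values are (major, minor) CUDA versions.
# Illustrative table: only the architectures discussed in this thread are listed.
MIN_CUDA_FOR_ARCH = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080
}


def toolkit_supports(compute_capability, cuda_version):
    """Return True if cuda_version can target the given compute capability natively."""
    required = MIN_CUDA_FOR_ARCH[compute_capability]
    return cuda_version >= required


# Check the toolkits from the original report against a Turing card (sm_75).
for cuda in [(8, 0), (9, 0), (9, 2), (10, 0)]:
    print("CUDA {}.{}: native Turing support = {}".format(
        cuda[0], cuda[1], toolkit_supports((7, 5), cuda)))
```

Of the toolkits listed, only CUDA 10.0 passes the check for the RTX 2080.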
-
It shouldn't crash if you use Turing; it should just fall back to a possibly less-than-ideal PTX version. This could be related to a recently fixed bug. Can you check whether the issue reproduces in version 1.3.1 (which should be released very soon)?
-
@KellenSunderland: are you sure about this? Both PyTorch and TensorFlow have a lot of open issues about this (e.g. this), and articles have been written about it (e.g. this: "The major issue being CUDA-9 is not aware of the latest GeForce RTX 2080 and its architecture hence when you run a program with TF as backend it shows segmentation fault."). I don't think it is merely a matter of non-ideal PTX.
-
I also encountered the same error one month ago after the forward pass. However, I reinstalled mxnet today and it works fine for me. I also tested mxnet-cuda100, and it works now too. My system setup: GPU: RTX 2080
-
@sbodenstein I'm pretty sure. I can test on my machine to trust-but-verify :-). @tianweiy: it's really interesting that a reinstall fixed your issue. I wonder if people running into this issue could delete their ~/.nv folder and see if it magically starts working. What I'm wondering is: did people upgrade their card on the same machine? If so, I could see a cubin for the wrong hardware being cached in ~/.nv. More or less a wild guess, but this could cause issues. Edit: it's also very possible this was fixed by the cudnn auto-tuning fix, if a reinstall helped.
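For anyone who wants to try the cache experiment without deleting anything outright, here is a minimal sketch that renames the directory instead, so it can be restored later. The helper name `set_aside_nv_cache` and the backup naming scheme are assumptions made for illustration; only the ~/.nv location comes from the comment above.

```python
import os
import shutil
import time


def set_aside_nv_cache(cache_dir=None):
    """Rename the NVIDIA cache directory so it is rebuilt on the next run.

    Returns the backup path, or None if there was no directory to move.
    """
    cache_dir = cache_dir or os.path.expanduser("~/.nv")
    if not os.path.isdir(cache_dir):
        return None
    # A timestamp suffix keeps repeated runs from colliding with older backups.
    backup = "{}.bak-{}".format(cache_dir.rstrip(os.sep), int(time.time()))
    shutil.move(cache_dir, backup)
    return backup


# Usage (moves the real ~/.nv, so it is left commented out here):
# print(set_aside_nv_cache())
```

If the crash persists after this, a stale cache is probably not the cause, and the backup can be renamed back.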
-
I did the upgrade on a new machine, so I don't think the cache is the problem. I also upgraded the cudnn library, which may have helped fix the problem.
-
The issue doesn't seem to be related to the C++ frontend API, hence removing the C++ label.
-
@mxnet-label-bot update [Question, CUDA]
-
Same for me. I reinstalled mxnet-1.3.1 and did not export MXNET_CUDNN_AUTOTUNE_DEFAULT=0. It worked for me the first time, but when I ran it a second time, it did not work!
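For comparison, the same setting can be applied from inside a Python script rather than with a shell export. This is a minimal sketch that assumes the variable is best set before mxnet is first imported, which is the conservative choice. The import is guarded so the snippet also runs on machines without a working mxnet install.

```python
import os

# Disable cuDNN autotuning for this process only.
# Set before the first `import mxnet` to be safe (assumption: conservative ordering).
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

try:
    import mxnet as mx  # assumes an mxnet build matching the installed CUDA version
    print("mxnet", mx.__version__)
except (ImportError, OSError):
    # ImportError: mxnet not installed; OSError: shared libraries failed to load.
    print("mxnet could not be imported in this environment")

print("MXNET_CUDNN_AUTOTUNE_DEFAULT =", os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"])
```

Running the script twice with the variable set shows whether the second-run failure described above depends on autotuning.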
-
Description
Loading the MXNet mtcnn model is very slow with CUDA 8.0.
With CUDA 9.0 or CUDA 9.2, it either fails to load the MXNet forward convolution algorithm or, after the forward convolution has been processed, the system shows an error on any further instruction, such as printing an mxnet.ndarray or calling asnumpy.
Environment info (Required)
Ubuntu 16.04
Nvidia-driver-410
GeForce RTX 2080
CUDA 8.0, CUDA 9.0, CUDA 9.2
mxnet-cu80, mxnet-cu90, mxnet-cu92
cudnn v6.0 for CUDA 8.0; cudnn v7.0 for CUDA 9.0 and CUDA 9.2
I installed Nvidia-driver-410 first,