
ncnn::Convolution_arm::forward crashes at runtime #1601

Closed
jiyinghui39 opened this issue Mar 10, 2020 · 21 comments

Comments

@jiyinghui39

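I ran into this while doing object detection. It only happens after more than 1000 runs. The ncnn version I'm using is commit c089ddb3c235033f8da52b5d22639df5b0ff0824.
What other information do you need to track this down?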

@nihui
Member

nihui commented Mar 10, 2020

If you run your model in benchncnn for more than 1000 iterations, does it crash?

@jiyinghui39
Author

I'll test it and report back once I have results. Thanks for the quick reply.

@jiyinghui39
Author

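I can't reproduce it under the benchmark, but the crash has been happening more and more often lately. We tested and confirmed that setting opt.use_packing_layout to true avoids it. However, that uses about 10 MB more memory, which is unacceptable for us.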

@jiyinghui39
Author

01-02 11:55:25.319 29428 29428 I AEE_AEDV:
01-02 11:55:25.319 29428 29428 I AEE_AEDV: backtrace:
01-02 11:55:25.319 29428 29428 I AEE_AEDV: #00 pc 0001a49e /system/lib/libc.so (abort+63)
01-02 11:55:25.319 29428 29428 I AEE_AEDV: #1 pc 00002443 /system/bin/app_process32 (art::SignalChain::Handler(int, siginfo*, void*)+742)
01-02 11:55:25.319 29428 29428 I AEE_AEDV: #2 pc 00018a28 /system/lib/libc.so
01-02 11:55:25.320 29428 29428 I AEE_AEDV: #3 pc 000bbd52 /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.320 29428 29428 I AEE_AEDV: #4 pc 001e30f3 /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so (__kmp_invoke_microtask+390)
01-02 11:55:25.320 29428 29428 I AEE_AEDV: #5 pc 001ccca9 /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so (__kmp_fork_call+2476)
01-02 11:55:25.320 29428 29428 I AEE_AEDV: #6 pc 001bac2f /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so (__kmpc_fork_call+58)
01-02 11:55:25.321 29428 29428 I AEE_AEDV: #7 pc 000ab225 /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.321 29428 29428 I AEE_AEDV: #8 pc 0009d373 /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.321 29428 29428 I AEE_AEDV: #9 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.321 29428 29428 I AEE_AEDV: #10 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.321 29428 29428 I AEE_AEDV: #11 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.321 29428 29428 I AEE_AEDV: #12 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.321 29428 29428 I AEE_AEDV: #13 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.322 29428 29428 I AEE_AEDV: #14 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.322 29428 29428 I AEE_AEDV: #15 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.322 29428 29428 I AEE_AEDV: #16 pc 0009caf5 /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.322 29428 29428 I AEE_AEDV: #17 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.322 29428 29428 I AEE_AEDV: #18 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.322 29428 29428 I AEE_AEDV: #19 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.322 29428 29428 I AEE_AEDV: #20 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.323 29428 29428 I AEE_AEDV: #21 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.323 29428 29428 I AEE_AEDV: #22 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.323 29428 29428 I AEE_AEDV: #23 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.323 29428 29428 I AEE_AEDV: #24 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.323 29428 29428 I AEE_AEDV: #25 pc 0009caf5 /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.323 29428 29428 I AEE_AEDV: #26 pc 0009caf5 /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.323 29428 29428 I AEE_AEDV: #27 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.324 29428 29428 I AEE_AEDV: #28 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.324 29428 29428 I AEE_AEDV: #29 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.324 29428 29428 I AEE_AEDV: #30 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libduerVision.so
01-02 11:55:25.324 29428 29428 I AEE_AEDV: #31 pc 0009ce1d /data/app/com.baidu.duer.duerfacedemo-FIirTCL00O1VnbW1BVJvqQ==/lib/arm/libdue

@jiyinghui39
Author

jiyinghui@zhuwenying-linux-pc:/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a$ /android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e libduerVision_debug.so 000bbd52 001e30f3 001ccca9
.omp_outlined..55
/home/jiyinghui/baidu/ncnn_new/ncnn_new/src/layer/arm/convolution_1x1.h:2779
__kmp_invoke_microtask
external/openmp_llvm/runtime/src/z_Linux_util.cpp:?
__kmp_fork_call
libgcc2.c:?
jiyinghui@zhuwenying-linux-pc:/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a$ /android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e libduerVision_debug.so 000bbd52
.omp_outlined..55
/home/jiyinghui/baidu/ncnn_new/ncnn_new/src/layer/arm/convolution_1x1.h:2779
jiyinghui@zhuwenying-linux-pc:/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a$ /android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e libduerVision_debug.so 001e30f3
__kmp_invoke_microtask
external/openmp_llvm/runtime/src/z_Linux_util.cpp:?
jiyinghui@zhuwenying-linux-pc:/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a$ 001ccca9
001ccca9: command not found
jiyinghui@zhuwenying-linux-pc:/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a$ /android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e libduerVision_debug.so 001ccca9
__kmp_fork_call
external/openmp_llvm/runtime/src/kmp_runtime.cpp:?
jiyinghui@zhuwenying-linux-pc:/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a$ /android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e libduerVision_debug.so 000ab225
ncnn::conv1x1s1_neon(ncnn::Mat const&, ncnn::Mat&, ncnn::Mat const&, ncnn::Mat const&, ncnn::Option const&)
/home/jiyinghui/baidu/ncnn_new/ncnn_new/src/layer/arm/convolution_1x1.h:2706
jiyinghui@zhuwenying-linux-pc:/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a$ /android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e libduerVision_debug.so 0009d373
ncnn::Net::forward_layer(int, std::__ndk1::vector<ncnn::Mat, std::__ndk1::allocator<ncnn::Mat> >&, ncnn::Option&) const
/home/jiyinghui/baidu/ncnn_new/ncnn_new/src/net.cpp:1158
jiyinghui@zhuwenying-linux-pc:/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a$ ~/android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e libduerVision_debug.so 0009ce1d
ncnn::Net::forward_layer(int, std::__ndk1::vector<ncnn::Mat, std::__ndk1::allocator<ncnn::Mat> >&, ncnn::Option&) const

@jiyinghui39
Author

This is our dump. It still looks like a crash inside conv1x1s1_neon. Could you check whether this information is enough to locate the problem?

@jiyinghui39
Author

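About conv1x1s1_neon: which operations could this convolution be replaced with? Currently, setting opt.use_packing_layout=true avoids the crash, but it increases memory usage by more than 10 MB.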

@nihui
Member

nihui commented Apr 10, 2020

Are you using a memory pool?

Also, try allocating some extra memory in fastMalloc, for example size + 16, and see what happens.

static inline void* fastMalloc(size_t size)

@jiyinghui39
Author

We are not using a memory pool. OK, I'll give it a try.

@jiyinghui39
Author

static inline void* fastMalloc(size_t size)
{
size = size + 16;
......
}
@nihui Can you confirm this change is what you meant?

@jiyinghui39
Author

c089ddb is the ncnn commit I'm using. It looks like fastMalloc already handles alignment, doesn't it?
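On the alignment question: fastMalloc uses a platform allocator (_aligned_malloc, posix_memalign or memalign) where one is available, and otherwise falls back to over-allocating with malloc and aligning the pointer by hand. A minimal standalone sketch of that hand-aligning technique follows. It assumes MALLOC_ALIGN is 16, and the names align_ptr, fallback_aligned_malloc and fallback_aligned_free are hypothetical; it is a re-creation for illustration, not ncnn's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed alignment for this sketch (stand-in for MALLOC_ALIGN).
constexpr std::size_t kMallocAlign = 16;

// Round `ptr` up to the next multiple of `n`; `n` must be a power of two.
template <typename T>
T* align_ptr(T* ptr, std::size_t n) {
    const std::uintptr_t mask = ~static_cast<std::uintptr_t>(n - 1);
    return reinterpret_cast<T*>((reinterpret_cast<std::uintptr_t>(ptr) + n - 1) & mask);
}

// Over-allocate by one pointer slot plus worst-case alignment slack,
// align past the slot, and stash malloc's pointer in the slot.
void* fallback_aligned_malloc(std::size_t size) {
    unsigned char* udata =
        static_cast<unsigned char*>(std::malloc(size + sizeof(void*) + kMallocAlign));
    if (!udata)
        return nullptr;
    unsigned char** adata =
        align_ptr(reinterpret_cast<unsigned char**>(udata) + 1, kMallocAlign);
    adata[-1] = udata;  // remember the pointer malloc actually returned
    return adata;
}

// Recover the original malloc pointer from the slot and release it.
void fallback_aligned_free(void* ptr) {
    if (!ptr)
        return;
    unsigned char* udata = static_cast<unsigned char**>(ptr)[-1];
    std::free(udata);
}
```

Memory from fallback_aligned_malloc must be released with fallback_aligned_free rather than plain free(), because the returned address is offset from the one malloc handed out.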

@nihui
Member

nihui commented Apr 10, 2020

static inline void* fastMalloc(size_t size)
{
size = size + 16;
......
}
@nihui Can you confirm this change is what you meant?

Yes, exactly like that.

@jiyinghui39
Author

static inline void* fastMalloc(size_t size)
{
#if _MSC_VER
    return _aligned_malloc(size, MALLOC_ALIGN);
#elif _POSIX_C_SOURCE >= 200112L || (__ANDROID__ && __ANDROID_API__ >= 17)
    void* ptr = 0;
    if (posix_memalign(&ptr, MALLOC_ALIGN, size))
        ptr = 0;
    return ptr;
#elif __ANDROID__ && __ANDROID_API__ < 17
    return memalign(MALLOC_ALIGN, size);
#else
    unsigned char* udata = (unsigned char*)malloc(size + sizeof(void*) + MALLOC_ALIGN);
    if (!udata)
        return 0;
    unsigned char** adata = alignPtr((unsigned char**)udata + 1, MALLOC_ALIGN);
    adata[-1] = udata;
    return adata;
#endif
}

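As shown above: we build with Android NDK r15c and ANDROID_PLATFORM=android-15, so the compiler selects the memalign(MALLOC_ALIGN, size) branch. As far as I can tell, that branch already returns 16-byte-aligned memory. @nihui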
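The branch selection described in this comment can be checked by mirroring fastMalloc's #if chain as an ordinary function. This is a hypothetical sketch: BuildConfig and selected_allocator are illustrative names, and the android-15 case assumes _POSIX_C_SOURCE is not defined as 200112L or higher in that build, which is what the report of memalign being chosen implies.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of the macros fastMalloc's #if chain tests.
struct BuildConfig {
    bool msvc;            // _MSC_VER defined
    long posix_c_source;  // value of _POSIX_C_SOURCE (0 if undefined)
    bool android;         // __ANDROID__ defined
    int android_api;      // value of __ANDROID_API__
};

// Same conditions, same order as the preprocessor branches.
std::string selected_allocator(const BuildConfig& c) {
    if (c.msvc)
        return "_aligned_malloc";
    if (c.posix_c_source >= 200112L || (c.android && c.android_api >= 17))
        return "posix_memalign";
    if (c.android && c.android_api < 17)
        return "memalign";
    return "malloc + alignPtr";
}
```

With android_api set to 15 and no qualifying _POSIX_C_SOURCE, only the memalign condition holds; raising the platform to 17 or later switches the build to posix_memalign.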

@nihui
Member

nihui commented Apr 13, 2020

It has nothing to do with alignment. The point is just to allocate a bit more.
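A minimal sketch of the over-allocation experiment, assuming a POSIX target with posix_memalign and a 16-byte alignment. padded_malloc and padded_free are hypothetical names, not ncnn functions. The extra 16 bytes give a kernel that strays slightly past the end of a buffer some memory it owns. Presumably that is the purpose of the experiment: if the crash disappears with padding, an out-of-bounds access becomes the likely suspect. The padding would hide such a bug rather than fix it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Assumed alignment for this sketch; 16 bytes suits 128-bit NEON loads/stores.
constexpr std::size_t kAlign = 16;
// Extra trailing bytes, following the "size + 16" suggestion in the thread.
constexpr std::size_t kPadding = 16;

// Hypothetical helper: aligned allocation with slack after the requested size.
// Returns nullptr on failure.
void* padded_malloc(std::size_t size) {
    void* ptr = nullptr;
    if (posix_memalign(&ptr, kAlign, size + kPadding) != 0)
        return nullptr;
    // Zero everything, padding included, so a stray read past `size`
    // sees deterministic data.
    std::memset(ptr, 0, size + kPadding);
    return ptr;
}

// Memory from posix_memalign is released with plain free().
void padded_free(void* ptr) { std::free(ptr); }

// True if `ptr` is a multiple of `align`.
bool is_aligned(const void* ptr, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(ptr) % align == 0;
}
```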

@jiyinghui39
Author

Understood, thanks. We'll run a stress test as soon as we can.

@jiyinghui39
Author

01-18 10:28:03.204 22929 22929 I AEE_AEDV:
01-18 10:28:03.204 22929 22929 I AEE_AEDV: backtrace:
01-18 10:28:03.204 22929 22929 I AEE_AEDV: #00 pc 0001a3de /system/lib/libc.so (abort+63)
01-18 10:28:03.205 22929 22929 I AEE_AEDV: #1 pc 00002443 /system/bin/app_process32 (art::SignalChain::Handler(int, siginfo*, void*)+742)
01-18 10:28:03.205 22929 22929 I AEE_AEDV: #2 pc 00018974 /system/lib/libc.so
01-18 10:28:03.205 22929 22929 I AEE_AEDV: #3 pc 000ba940 /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.205 22929 22929 I AEE_AEDV: #4 pc 001da2b3 /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so (__kmp_invoke_microtask+390)
01-18 10:28:03.205 22929 22929 I AEE_AEDV: #5 pc 001c3e69 /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so (__kmp_fork_call+2476)
01-18 10:28:03.206 22929 22929 I AEE_AEDV: #6 pc 001b1def /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so (__kmpc_fork_call+58)
01-18 10:28:03.206 22929 22929 I AEE_AEDV: #7 pc 000aaa0f /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.206 22929 22929 I AEE_AEDV: #8 pc 0009d363 /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.206 22929 22929 I AEE_AEDV: #9 pc 0009ce0d /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.206 22929 22929 I AEE_AEDV: #10 pc 0009cae5 /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.207 22929 22929 I AEE_AEDV: #11 pc 0009ce0d /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.207 22929 22929 I AEE_AEDV: #12 pc 0009ce0d /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.207 22929 22929 I AEE_AEDV: #13 pc 0009ce0d /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.207 22929 22929 I AEE_AEDV: #14 pc 0009ce0d /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.207 22929 22929 I AEE_AEDV: #15 pc 0009ce0d /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.207 22929 22929 I AEE_AEDV: #16 pc 0009ce0d /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.207 22929 22929 I AEE_AEDV: #17 pc 0009ce0d /data/app/com.baidu.duer.duerfacedemo-K9xiKIzQ57TPH81kaQJwAA==/lib/arm/libduerVision.so
01-18 10:28:03.208 22929 22929 I AEE_AEDV: #18 pc 0009ce0d /data/app/com.baidu.duer.duerface

@jiyinghui39
Author

jiyinghui@zhuwenying-linux-pc:~$ ~/android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e /vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a/libduerVision_debug.so 000ba940 001da2b3 001c3e69 000aaa0f
.omp_outlined..55
libgcc2.c:?
__kmp_invoke_microtask
libgcc2.c:?
__kmp_fork_call
libgcc2.c:?
ncnn::Convolution_arm::forward(ncnn::Mat const&, ncnn::Mat&, ncnn::Option const&) const
libgcc2.c:?
jiyinghui@zhuwenying-linux-pc:$ ~/android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-addr2line -f -C -e ~/vision_cpp/snowboy-vision-sdk/Cpp/build/output/armeabi-v7a/libduerVision_debug.so 0009d363 0009ce0d
ncnn::Net::forward_layer(int, std::__ndk1::vector<ncnn::Mat, std::__ndk1::allocator<ncnn::Mat> >&, ncnn::Option&) const
libgcc2.c:?
ncnn::Net::forward_layer(int, std::__ndk1::vector<ncnn::Mat, std::__ndk1::allocator<ncnn::Mat> >&, ncnn::Option&) const
libgcc2.c:?

After increasing the size in fastMalloc, it still crashes.

@nihui
Member

nihui commented Apr 15, 2020

Now I remember: you're using NDK r15c, and the OpenMP implementation in that release has a bug. I suggest switching to r16b or a newer NDK, or disabling NCNN_OPENMP when you build...
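Some background on why OpenMP shows up in these traces: the `.omp_outlined..55` frame is the function clang generates for the body of an OpenMP parallel region, and the `__kmp_*` frames belong to the LLVM OpenMP runtime that runs it on worker threads. ncnn's ARM kernels commonly parallelize over output channels with `#pragma omp parallel for num_threads(opt.num_threads)`. The sketch below shows that pattern with a hypothetical `scale_channels` function; it is not ncnn code. Built with OpenMP, the loop body becomes an outlined region like the one in the backtrace. Built without it (for ncnn, by configuring CMake with `-DNCNN_OPENMP=OFF`), the pragma is ignored and the same loop runs serially with identical results, which is why disabling OpenMP sidesteps a buggy OpenMP runtime at the cost of multithreading.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: multiply each channel of a CHW float buffer by its
// own scale. Each iteration writes a disjoint channel, so iterations are
// independent and safe to distribute across threads.
void scale_channels(std::vector<float>& data, int channels, int size,
                    const std::vector<float>& scales, int num_threads) {
    (void)num_threads;  // only referenced by the pragma when OpenMP is enabled
    #pragma omp parallel for num_threads(num_threads)
    for (int q = 0; q < channels; q++) {
        float* ptr = data.data() + static_cast<std::size_t>(q) * size;
        for (int i = 0; i < size; i++)
            ptr[i] *= scales[q];
    }
}
```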

@KevinAnnn

With NDK r20b we frequently get crashes in copy_cut_border. We haven't yet pinned down which line inside it crashes.

@nihui
Copy link
Member

nihui commented Apr 15, 2020

With NDK r20b we frequently get crashes in copy_cut_border. We haven't yet pinned down which line inside it crashes.

The four border values passed to this function (top, bottom, left, right) must all be greater than or equal to 0.
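The constraint follows from what a cut-border copy does: each of the four values is a number of rows or columns removed from one side. A minimal single-channel sketch, assuming a row-major w×h float image; `cut_border` is a hypothetical function for illustration, not ncnn's `copy_cut_border`. It asserts the same precondition, because a negative margin would make the output larger than the input and push the source index outside the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: crop `top`/`bottom` rows and `left`/`right` columns from a
// row-major w x h image. Writes the cropped size to out_w/out_h.
std::vector<float> cut_border(const std::vector<float>& src, int w, int h,
                              int top, int bottom, int left, int right,
                              int& out_w, int& out_h) {
    // Negative margins would index before the start of a row or the image.
    assert(top >= 0 && bottom >= 0 && left >= 0 && right >= 0);
    out_w = w - left - right;
    out_h = h - top - bottom;
    assert(out_w > 0 && out_h > 0);  // must leave something to copy

    std::vector<float> dst(static_cast<std::size_t>(out_w) * out_h);
    for (int y = 0; y < out_h; y++)
        for (int x = 0; x < out_w; x++)
            dst[static_cast<std::size_t>(y) * out_w + x] =
                src[static_cast<std::size_t>(y + top) * w + (x + left)];
    return dst;
}
```

For a 4×3 image holding 0..11 row by row, cutting top=1, bottom=0, left=1, right=1 keeps the 2×2 block 5, 6, 9, 10.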

@nihui
Member

nihui commented Feb 3, 2021

Please try updating to the latest ncnn code. If the problem persists, feel free to reopen the issue so we can continue the discussion.

@nihui nihui closed this as completed Feb 3, 2021