
[Build] WASM build of v1.20.1 with --use_xnnpack fails #23460

Open
raphaelmenges opened this issue Jan 22, 2025 · 10 comments
Labels
build (build issues; typically submitted using template) · contributions welcome (external contributions welcome) · ep:Xnnpack (issues related to XNNPACK EP) · platform:web (issues related to ONNX Runtime web; typically submitted using template)

Comments

@raphaelmenges

Describe the issue

Hello 👋

I want to use the ONNX Runtime in a Web application. I successfully built the static library of ONNX Runtime with SIMD and threading and linked it within my Emscripten project. The CPU execution provider works great. Now I have tried to also include the XNNPACK execution provider with the --use_xnnpack flag to improve inference performance.

However, the src/amalgam directory is missing from the XNNPACK distribution. xnnpack.cmake says:

# See source lists in _deps/googlexnnpack-src/BUILD.bazel for wasm_prod_microkernels

I cannot find any documentation in the XNNPACK repository on how to manually generate these amalgam (?) microkernels, nor an example in this repository. How do I build the XNNPACK execution provider for the Web?

Urgency

No response

Target platform

WASM

Build script

onnxruntime/build.sh \
    --build_dir ./onnxruntime-build \
    --config Release \
    --build_wasm_static_lib \
    --enable_wasm_simd \
    --enable_wasm_threads \
    --skip_tests \
    --disable_wasm_exception_catching \
    --disable_rtti \
    --use_xnnpack \
    --parallel

Error / output

CMake Error at external/xnnpack.cmake:189 (target_sources):
  Cannot find source file:

    ./onnxruntime-build/Release/_deps/googlexnnpack-src/src/amalgam/gen/scalar.c

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm
  .ccm .cxxm .c++m .h .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90
  .f95 .f03 .hip .ispc
Call Stack (most recent call first):
  external/onnxruntime_external_deps.cmake:566 (include)
  CMakeLists.txt:614 (include)

Visual Studio Version

No response

GCC / Compiler Version

No response

@raphaelmenges raphaelmenges added the build build issues; typically submitted using template label Jan 22, 2025
@github-actions github-actions bot added ep:Xnnpack issues related to XNNPACK EP platform:web issues related to ONNX Runtime web; typically submitted using template labels Jan 22, 2025
@raphaelmenges
Author

Apparently there was an update of the XNNPACK dependency in b94ba09, from google/XNNPACK@0da379f to google/XNNPACK@309b75c. The old version of the XNNPACK dependency still contained the src/amalgam directory. As I understand it, this directory used to be generated once by a script, but such a script no longer exists.
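One possible workaround suggested by this finding is to pin XNNPACK back to the older commit that still shipped src/amalgam. The sketch below assumes onnxruntime records its dependencies in cmake/deps.txt as "name;URL;SHA1" lines (check your checkout); for illustration it rewrites a sample line, and the commit hash used is the truncated one from above.

```shell
# Illustrative sketch, not a verified recipe: point the googlexnnpack entry
# back at the older XNNPACK archive. For demonstration we create a sample
# deps.txt of the assumed "name;URL;SHA1" form; in a real checkout you would
# run the sed command against cmake/deps.txt directly.
mkdir -p cmake
printf 'googlexnnpack;https://github.com/google/XNNPACK/archive/309b75c.zip;dummysha\n' > cmake/deps.txt
old_commit=0da379f   # truncated in the issue; use the full commit hash
sed -i.bak -E "s|archive/[^.;]*\.zip|archive/${old_commit}.zip|" cmake/deps.txt
# NOTE: the trailing SHA1 field must also be updated to the new archive's checksum.
cat cmake/deps.txt
```

If that assumption about deps.txt does not hold for your onnxruntime version, the equivalent change can be made wherever its CMake setup fetches XNNPACK.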

@snnn snnn added the contributions welcome external contributions welcome label Jan 24, 2025
@raphaelmenges
Author

I could get the Release build configuration to compile by removing the src/amalgam source files from ./cmake/external/xnnpack.cmake:

  # kernels
- list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/scalar.c)
- list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasm.c)
+ # list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/scalar.c)
+ # list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasm.c)

  if(onnxruntime_ENABLE_WEBASSEMBLY_SIMD)
-   list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasmsimd.c)
+   # list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasmsimd.c)
    target_compile_options(XNNPACK PRIVATE "-msimd128")
  endif()
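The manual edit above can also be scripted; this is a minimal sketch that comments out every list(APPEND wasm_srcs ...) line referencing src/amalgam. For demonstration it runs on a tiny sample file; in a real checkout you would point it at ./cmake/external/xnnpack.cmake instead.

```shell
# Create a sample file with the same structure as the affected lines:
cat > xnnpack.cmake <<'EOF'
list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/scalar.c)
list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasm.c)
target_compile_options(XNNPACK PRIVATE "-msimd128")
EOF
# Comment out only the amalgam source lines; keep everything else intact:
sed -i.bak -E 's|^([[:space:]]*)(list\(APPEND wasm_srcs .*src/amalgam.*)$|\1# \2|' xnnpack.cmake
cat xnnpack.cmake
```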

However, the following errors occur when linking libonnxruntime.a:

[ 75%] Linking CXX executable example.js
em++: warning: -pthread + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see https://github.com/WebAssembly/design/issues/1271 [-Wpthreads-mem-growth]
wasm-ld: error: libonnxruntime.a(xnnpack_execution_provider.cc.o): undefined symbol: pthreadpool_create
wasm-ld: error: libonnxruntime.a(xnnpack_execution_provider.cc.o): undefined symbol: xnn_deinitialize
wasm-ld: error: libonnxruntime.a(xnnpack_execution_provider.cc.o): undefined symbol: pthreadpool_destroy
wasm-ld: error: libonnxruntime.a(xnnpack_execution_provider.cc.o): undefined symbol: xnn_initialize
wasm-ld: error: libonnxruntime.a(xnnpack_execution_provider.cc.o): undefined symbol: xnn_deinitialize
wasm-ld: error: libonnxruntime.a(xnnpack_execution_provider.cc.o): undefined symbol: pthreadpool_destroy
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_delete_weights_cache
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_create_average_pooling2d_nhwc_f32
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_create_average_pooling2d_nhwc_qu8
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_delete_operator
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_delete_operator
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_reshape_average_pooling2d_nhwc_f32
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_reshape_average_pooling2d_nhwc_qu8
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_setup_average_pooling2d_nhwc_f32
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_setup_average_pooling2d_nhwc_qu8
wasm-ld: error: libonnxruntime.a(average_pool.cc.o): undefined symbol: xnn_run_operator
wasm-ld: error: libonnxruntime.a(conv_base.cc.o): undefined symbol: xnn_create_deconvolution2d_nhwc_f32
wasm-ld: error: libonnxruntime.a(conv_base.cc.o): undefined symbol: xnn_create_convolution2d_nhwc_f32
wasm-ld: error: libonnxruntime.a(conv_base.cc.o): undefined symbol: xnn_create_deconvolution2d_nhwc_qs8
wasm-ld: error: libonnxruntime.a(conv_base.cc.o): undefined symbol: xnn_create_convolution2d_nhwc_qs8
wasm-ld: error: too many errors emitted, stopping now (use -error-limit=0 to see all errors)

@raphaelmenges
Author

I could reproduce the issue in a workflow on GitHub Actions:

-- Configuring done (17.6s)
CMake Error at /home/runner/work/onnxruntime-wasm-builds/onnxruntime-wasm-builds/build/Debug/_deps/googlexnnpack-src/CMakeLists.txt:748 (ADD_LIBRARY):
  Cannot find source file:

    /home/runner/work/onnxruntime-wasm-builds/onnxruntime-wasm-builds/build/Debug/_deps/googlexnnpack-src/src/amalgam/gen/scalar.c

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
  .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc

https://github.com/alfatraining/onnxruntime-wasm-builds/actions/runs/13069862337/job/36468940575#step:11:508

@guschmue
Contributor

We did not see much perf gain or use for xnnpack. We think spending some time to optimize wasm for mlas is the better choice, but we have not gotten to that yet.

@raphaelmenges
Author

Thank you @guschmue for the feedback!

In terms of execution providers for the Web, the choices are a bit awkward. WebGL, the most stable one, is now deprecated; the WebGPU one so far only works on Chromium with developer flags and on Firefox Nightly; and WebNN is not available in most Web browsers. I was counting on XNNPACK for better performance, especially on Firefox, which with the CPU execution provider is 2-3x slower than Chromium on the same task. Do you have any recommendation on what to head for?

@guschmue
Contributor

guschmue commented Feb 3, 2025

webgpu has been enabled on chromium for some time and is stable on all platforms (on linux it takes a few extra steps to use webgpu, but it works well there too).
For Safari it is still behind a feature flag.
Firefox - yeah, that is a problem. Even on Nightly, I have not been able to run a lot of models.
For xnnpack we added support for a few ops and would need to add more, but the perf gains have been less than we expected. It makes more sense for us to spend some time on mlas - a few optimizations should go a long way there.
We have not gotten to it because the focus has been on webgpu, since only that one will allow running more costly models.
But spending some time on wasm optimization for mlas is still on the list.

@raphaelmenges
Author

webgpu has been enabled on chromium for some time and is stable on all platforms (on linux it takes a few extra steps to use webgpu, but it works well there too).

Cool! However, to my understanding it is not yet possible to compile onnxruntime as a static library for the Web with WebGPU support: #23072 (comment)

@sevagh
Contributor

sevagh commented Feb 11, 2025

which is on CPU execution provider 2-3x slower than Chromium with the same task

In my experience, Firefox and Chrome (not Chromium, but probably similar) perform similarly when using ONNX Runtime statically compiled with SIMD128 (though without pthreads and without xnnpack).

Curious what type of neural network runs 2-3x faster on Chromium than on Firefox - or do you mean --use_xnnpack is 2-3x faster? Or is it that, even if you compile onnxruntime without --use_xnnpack, XNNPack in Chromium is somehow being leveraged to be faster?

@raphaelmenges
Author

I have benchmarked Silero VAD and YoloV8 on Firefox 135.0 and Chromium 135.0.7016.0 on my MacBook with M1 Pro, using onnxruntime v1.20.1 on WASM with SIMD and threading and the CPU execution provider:

Silero VAD (Firefox)

Execution #1 took 780 us.
Execution #2 took 760 us.
Execution #3 took 599 us.

Silero VAD (Chromium)

Execution #1 took 650 us.
Execution #2 took 465 us.
Execution #3 took 439 us.

YoloV8 (Firefox)

Execution #1 took 15436 ms.
Execution #2 took 15439 ms.
Execution #3 took 15379 ms.

YoloV8 (Chromium)

Execution #1 took 21850 ms.
Execution #2 took 21496 ms.
Execution #3 took 21390 ms.

My previous estimate of 2-3x faster on Chromium than Firefox was indeed exaggerated. It rather looks like small models can be faster on Chromium and bigger models faster on Firefox. However, the performance gap to native execution is still quite large; e.g., Silero VAD takes about 200 us natively on my computer with the same threading setup and the CPU execution provider.
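The ratios implied by the timings above can be sanity-checked with a quick awk calculation (values copied from the measurements in this comment; microseconds for Silero VAD, milliseconds for YoloV8):

```shell
# Mean Firefox/Chromium time per model, from the three runs reported above.
awk 'BEGIN {
  ff_vad  = (780 + 760 + 599) / 3;       cr_vad  = (650 + 465 + 439) / 3
  ff_yolo = (15436 + 15439 + 15379) / 3; cr_yolo = (21850 + 21496 + 21390) / 3
  printf "Silero VAD: Firefox/Chromium = %.2f\n", ff_vad / cr_vad
  printf "YoloV8:     Firefox/Chromium = %.2f\n", ff_yolo / cr_yolo
}'
# Silero VAD comes out ~1.38 (Chromium faster), YoloV8 ~0.71 (Firefox faster).
```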

As described in this issue so far, I could not build the XNNPACK execution provider for WASM. To my current understanding, no execution provider besides the CPU one is available in the WASM/Emscripten environment at the moment.

@sevagh
Contributor

sevagh commented Feb 13, 2025

I got interested (I have a WASM-SIMD ORT application and would love any kind of speed boost).

I tried yesterday to experiment with compiling XNNPack for WASM but, like you say, it is rather difficult, with a ton of code issues.

Your original diagnosis and workaround seem accurate to me. The way XNNPack generates kernels is different now, and ONNXRuntime should no longer reference those files, which don't exist anymore:

  # kernels
- list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/scalar.c)
- list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasm.c)
+ # list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/scalar.c)
+ # list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasm.c)

  if(onnxruntime_ENABLE_WEBASSEMBLY_SIMD)
-   list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasmsimd.c)
+   # list(APPEND wasm_srcs ${XNNPACK_DIR}/src/amalgam/gen/wasmsimd.c)
    target_compile_options(XNNPACK PRIVATE "-msimd128")
  endif()

However, after that, I got a variety of errors:

  • gsl::narrow is undefined (simple to replace with just narrow; I ran into this before in the WebNN provider: Replace gsl::narrow with narrow in WebNN code #22733)
  • Much trickier problems with various structs (forward-declared in onnxruntime/core/providers/xnnpack/detail/utils.h) not existing, e.g. ProviderHost, NodeUnit, int64s, etc.

I eventually gave up on XNNPack.
