
Implement some missing element wise Add/Sub/Mul/Div/Neg operations for CPU and CUDA EPs #23090

Merged: 1 commit merged into microsoft:main on Jan 21, 2025

Conversation

@Zyrin (Contributor) commented Dec 12, 2024

Description

  • [CPU EP] Implement Add/Sub/Mul/Div element-wise operations for (u)int8, (u)int16, uint32 and uint64 (see the illustration after this section).
  • [CPU EP] Implement the Neg unary operation for int16.
  • [CUDA EP] Implement Add/Sub/Mul/Div element-wise operations for (u)int8 and (u)int16.

Motivation and Context

This solves #23051
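
To make the semantics concrete, here is a standalone sketch (not the PR's actual code) of what these element-wise kernels compute. The CPU EP's element-wise kernels are largely built on Eigen array expressions, so supporting an additional integer type mostly means instantiating the same expressions for that type; the example assumes only Eigen is available:

// Standalone illustration: element-wise uint8 arithmetic with Eigen, the
// library the CPU EP uses for these kernels. Names and values are illustrative.
#include <Eigen/Core>
#include <cstdint>
#include <iostream>

int main() {
  Eigen::Array<std::uint8_t, Eigen::Dynamic, 1> a(4), b(4);
  a << 10, 20, 30, 250;
  b << 1, 2, 3, 10;

  // Add/Sub/Mul/Div operate per element; unsigned arithmetic wraps around,
  // e.g. 250 + 10 == 4 for uint8.
  Eigen::Array<std::uint8_t, Eigen::Dynamic, 1> sum = a + b;
  Eigen::Array<std::uint8_t, Eigen::Dynamic, 1> quot = b / a;

  std::cout << static_cast<int>(sum(3)) << "\n";   // 4 (wraparound)
  std::cout << static_cast<int>(quot(0)) << "\n";  // 0 (integer division)
  return 0;
}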

@tianleiwu (Contributor)

This will increase the binary size. Are the missing types used in any real model?

@Zyrin (Contributor, Author) commented Dec 12, 2024

@microsoft-github-policy-service agree company="Cellumation"

@Zyrin (Contributor, Author) commented Dec 12, 2024

I do not know whether any "real" model uses these types for these operations. I tried to use uint8 operations in my own model and found that onnxruntime did not support them, even though the ONNX operator documentation lists them as supported. So I went ahead and implemented all of the missing types.
The binary size grows by <0.6% for libonnxruntime.so and <0.3% for libonnxruntime_providers_cuda.so.
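
For context, this is roughly how the gap showed up from the API side. A minimal C++ sketch, assuming a hypothetical add_uint8.onnx containing a single Add node with uint8 inputs A and B and output C; before this change, creating a session over such a model failed with a "Could not find an implementation" error from the kernel registry:

#include <onnxruntime_cxx_api.h>

#include <array>
#include <cstdint>
#include <iostream>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "uint8-add");
  Ort::SessionOptions opts;
  // add_uint8.onnx is a hypothetical model; on Windows the path is a wide string.
  Ort::Session session(env, "add_uint8.onnx", opts);

  std::array<std::uint8_t, 4> a{1, 2, 3, 4};
  std::array<std::uint8_t, 4> b{10, 20, 30, 40};
  std::array<std::int64_t, 1> shape{4};

  auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  std::array<Ort::Value, 2> inputs{
      Ort::Value::CreateTensor<std::uint8_t>(mem, a.data(), a.size(), shape.data(), shape.size()),
      Ort::Value::CreateTensor<std::uint8_t>(mem, b.data(), b.size(), shape.data(), shape.size())};

  const char* input_names[] = {"A", "B"};
  const char* output_names[] = {"C"};
  auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, inputs.data(),
                             inputs.size(), output_names, 1);
  // Element-wise uint8 addition: prints 11.
  std::cout << static_cast<int>(outputs[0].GetTensorData<std::uint8_t>()[0]) << "\n";
  return 0;
}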

@xadupre (Member) commented Dec 17, 2024

This may increase the binary size. +@scottmckay

@tianleiwu (Contributor) commented Dec 17, 2024

@Zyrin, please follow https://github.com/microsoft/onnxruntime/blob/main/docs/Coding_Conventions_and_Standards.md#linting to format the code.

The operator documents also need to be updated (you can find the updated documents in the artifacts of the Windows GPU Doc Gen CI Pipeline under Checks).

@tianleiwu (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu (Contributor)

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-linux-gpu-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline

@tianleiwu (Contributor)

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline

Azure Pipelines successfully started running 6 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

Zyrin force-pushed the main branch 2 times, most recently from 7fcfd3b to 92d0502 on December 18, 2024 at 08:58
@Zyrin (Contributor, Author) commented Dec 18, 2024

I applied the linting fixes. @tianleiwu, could you restart the pipelines?

@tianleiwu (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu (Contributor)

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

@tianleiwu (Contributor)

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline

Azure Pipelines successfully started running 7 pipeline(s).

Azure Pipelines successfully started running 8 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu (Contributor)

@Zyrin, some build pipelines failed. You need to update the unit tests to run only on the CUDA and CPU providers; see some examples in the same test file.

You will also need to update the operator documents (you can get them from the artifacts of the Windows GPU Doc Gen CI Pipeline).

@Zyrin (Contributor, Author) commented Jan 8, 2025

@tianleiwu I assume you want me to run the tests only on the CPU and CUDA EPs, as in the following code snippet from element_wise_ops_test.cc:1837:

// Run on the CPU EP only if it is available in this build.
if (nullptr != DefaultCpuExecutionProvider()) {
  std::vector<std::unique_ptr<IExecutionProvider>> execution_providers;
  execution_providers.push_back(DefaultCpuExecutionProvider());
  test.Run(OpTester::ExpectResult::kExpectSuccess, "", {}, nullptr, &execution_providers);
}
// Likewise for the CUDA EP.
if (nullptr != DefaultCudaExecutionProvider()) {
  std::vector<std::unique_ptr<IExecutionProvider>> execution_providers;
  execution_providers.push_back(DefaultCudaExecutionProvider());
  test.Run(OpTester::ExpectResult::kExpectSuccess, "", {}, nullptr, &execution_providers);
}

Alternatively, I could exclude the TensorRT and DNNL EPs, but I do not know whether there are other EPs that are not tested here and would then fail for someone else.

On that note, should I change only the failing tests or all of the tests I added?

@tianleiwu (Contributor)

> @tianleiwu I assume you want me to run the tests only on the CPU and CUDA EPs, as in the following code snippet from element_wise_ops_test.cc:1837:

Right. You can follow that code snippet to fix the failing tests introduced by this change.
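
For reference, a complete test following that pattern might look like the sketch below. The test name and values are illustrative (not the PR's actual tests); only the Run pattern mirrors element_wise_ops_test.cc, and Add opset 14 is assumed since that is where the ONNX spec extends Add to the small integer types:

TEST(MathOpTest, Add_uint8) {  // hypothetical test name
  OpTester test("Add", 14);
  test.AddInput<uint8_t>("A", {3}, {1, 2, 3});
  test.AddInput<uint8_t>("B", {3}, {10, 20, 30});
  test.AddOutput<uint8_t>("C", {3}, {11, 22, 33});

  // Run only on the CPU and CUDA EPs, mirroring the snippet above.
  if (nullptr != DefaultCpuExecutionProvider()) {
    std::vector<std::unique_ptr<IExecutionProvider>> execution_providers;
    execution_providers.push_back(DefaultCpuExecutionProvider());
    test.Run(OpTester::ExpectResult::kExpectSuccess, "", {}, nullptr, &execution_providers);
  }
  if (nullptr != DefaultCudaExecutionProvider()) {
    std::vector<std::unique_ptr<IExecutionProvider>> execution_providers;
    execution_providers.push_back(DefaultCudaExecutionProvider());
    test.Run(OpTester::ExpectResult::kExpectSuccess, "", {}, nullptr, &execution_providers);
  }
}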

Zyrin force-pushed the main branch 2 times, most recently from a264918 to 89e9858 on January 9, 2025 at 22:59
@Zyrin (Contributor, Author) commented Jan 9, 2025

I fixed the tests. Is there a way for me to generate the docs locally, or is triggering the Windows GPU Doc Gen CI Pipeline the easiest way to generate them?

@tianleiwu (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu (Contributor)

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-linux-gpu-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline

@tianleiwu (Contributor)

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline

Azure Pipelines successfully started running 6 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

@tianleiwu (Contributor)

@Zyrin, I've triggered the pipelines.

@Zyrin (Contributor, Author) commented Jan 18, 2025

@tianleiwu The Linux GPU CI Pipeline fails on the test case ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312 with a segmentation fault. How can I run this test locally to investigate the problem?

I tried to build the project with the flags from the CI. Unfortunately, I get errors like Load model ../models/opset7/test_bvlc_alexnet/model.onnx failed. File doesn't exist., so I am guessing I am missing the models. Strangely, the tests on my machine did not even include the models_zoo tests.

@tianleiwu (Contributor) commented Jan 18, 2025

@Zyrin, I reran the failed test to see whether it is a transient issue. The updated documents can be downloaded here: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1592734&view=artifacts&pathAsName=false&type=publishedArtifacts

Commit: Implement Add/Sub/Mul/Div element wise operations for (u)int8, (u)int16, uint32 and uint64 as well as Neg unary operation for int16 on CPU EP and implement Add/Sub/Mul/Div element wise operations for (u)int8 and (u)int16 on CUDA EP
@Zyrin (Contributor, Author) commented Jan 18, 2025

@tianleiwu I rebased onto the current main branch and pushed everything. Hopefully the CI will now run without issues.

@tianleiwu (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu (Contributor)

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline

Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

@tianleiwu (Contributor)

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

Azure Pipelines successfully started running 7 pipeline(s).

tianleiwu merged commit e20b529 into microsoft:main on Jan 21, 2025
92 checks passed
ashrit-ms pushed a commit that referenced this pull request Jan 23, 2025
Implement some missing element wise Add/Sub/Mul/Div/Neg operations for CPU and CUDA EPs (#23090)

* [CPU EP] Implement Add/Sub/Mul/Div element wise operations for (u)int8, (u)int16, uint32 and uint64.
* [CPU EP] Implement Neg unary operation for int16
* [CUDA EP] Implement Add/Sub/Mul/Div element wise operations for (u)int8 and (u)int16

### Motivation and Context
This solves #23051
ashrit-ms added a commit that referenced this pull request Jan 23, 2025
### Description
This PR is to update the win-ort-main branch to the tip main branch as
of 2025-01-23.

### PR List
ddf0d37 [QNN EP] Add LoggingManager::HasDefaultLogger() to provider bridge API (#23467)
05fbbdf [QNN EP] Make QNN EP a shared library (#23120)
1336566 Add custom vcpkg ports (#23456)
2e1173c Update the compile flags for vcpkg packages (#23455)
1f628a9 [Mobile] Add BrowserStack Android MAUI Test (#23383)
009cae0 [js/webgpu] Optimize ConvTranspose (Continue) (#23429)
04a4a69 Use onnx_protobuf.h to suppress some GCC warnings (#23453)
2e3b62b Suppress some strict-aliasing related warnings in WebGPU EP (#23454)
b708f9b Bump ruff from 0.9.1 to 0.9.2 (#23427)
c0afc66 [WebNN] Remove workarounds for TFLite backend (#23406)
8a821ff Bump vite from 6.0.7 to 6.0.11 in /js/web/test/e2e/exports/testcases/vite-default (#23446)
220c1a2 Make ORT and Dawn use the same protobuf/abseil source code (#23447)
b7b5792 Change MacOS-13 to ubuntu on for android-java-api-aar-test.yml. (#23444)
19d0d2a WIP: Dp4MatMulNBits accuracy level 4 matmul for WebGPU EP (#23365)
95b8eff [QNN EP]: Clean up QNN logging resources if an error occurs during initialization (#23435)
626134c Bump clang-format from 19.1.6 to 19.1.7 (#23428)
0cf9753 Fix eigen external deps (#23439)
f9440ae Moving RN_CI Android Testing to Linux (#23422)
1aa5902 [QNN EP] workaround for QNN validation bug for Tanh with uint16 quantized output (#23432)
7f5582a Seperate RN andriod and IOS into 2 separated Stages. (#23400)
73deac2 Implement some missing element wise Add/Sub/Mul/Div/Neg operations for CPU and CUDA EPs (#23090)
949fe42 Upgrade Java version from react-native/android to Java 17 (#23066)
0892c23 Update Qnn SDK default version to 2.30 (#23411)
94c099b Fix type cast build error (#23423)
d633e57 [WebNN EP] Fix AddInitializersToSkip issues (#23354)
e988ef0 [QNN EP] Fix regression for MatMul with two quantized/dynamic uint16 inputs (#23419)
7538795 Update onnxruntime binary size checks ci pipeline's docker image (#23405)
6c5ea41 Revert "[QNN EP] Clean up correctly from a partial setup (#23320)" (#23420)
e866804 Enable comprehension simplification in ruff rules (#23414)
0a5f1f3 bugfix: string_view of invalid memory (#23417)
4cc38e0 fix crash when first input of BatchNormalization is 1-D (#23387)
0334414 Target py310 and modernize codebase with ruff (#23401)
87341ac [QNN EP] Fix segfault when unregistering HTP shared memory handles (#23402)

### Motivation and Context
This update includes the change to make QNN-EP a shared library.

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Peishen Yan <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Hector Li <[email protected]>
Co-authored-by: Jian Chen <[email protected]>
Co-authored-by: Alexis Tsogias <[email protected]>
Co-authored-by: junchao-zhao <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sushraja-msft <[email protected]>
Co-authored-by: Wanming Lin <[email protected]>
Co-authored-by: Jiajia Qin <[email protected]>
Co-authored-by: Caroline Zhu <[email protected]>
guschmue pushed a commit that referenced this pull request Mar 6, 2025
Implement some missing element wise Add/Sub/Mul/Div/Neg operations for CPU and CUDA EPs (#23090)

* [CPU EP] Implement Add/Sub/Mul/Div element wise operations for (u)int8, (u)int16, uint32 and uint64.
* [CPU EP] Implement Neg unary operation for int16
* [CUDA EP] Implement Add/Sub/Mul/Div element wise operations for (u)int8 and (u)int16

### Motivation and Context
This solves #23051