Add x86 SIMD optimizations to crypto datatypes #507

Lastique · 2020-09-21T12:25:55Z

The v128 operations are optimized for SSE2/SSSE3/SSE4.1.
srtp_octet_string_is_eq is optimized for SSE2. When SSE2 is not available, use a pair of 32-bit accumulators to speed up the bulk of the operation. We use two accumulators to leverage instruction-level parallelism supported by most modern CPUs.
In srtp_cleanse, use memset and ensure it is not optimized away with a dummy asm statement, which can potentially consume the contents of the memory.
Endian conversion functions use gcc-style intrinsics, when possible.
In base64_block_to_octet_triple, prefer memchr to strchr as it explicitly accepts the string length, which is known at compile time.

The SIMD code uses intrinsics, which are available on all modern compilers. For MSVC, config_in_cmake.h is modified to define gcc/clang-style SSE macros based on MSVC predefined macros. We enable all SSE versions when it indicates that AVX is enabled. SSE2 is always enabled for x86-64 or for x86 when SSE2 FP math is enabled.

pabuhler · 2020-10-08T12:32:46Z

Hi, @Lastique thanks for these.

I guess these changes are based on real world usage, do you have any benchmarks on the speed improvements for a given configuration & platform?

Lastique · 2020-10-08T13:11:34Z

I have attached a unit benchmark to #508, but I don't have one at hand for the changes in this PR. I did bench the change when I originally wrote this patch for our in-house use a few years back, but unfortunately I didn't save it. I suppose, I could write one anew, if required.

Basically, these two PRs came out of our attempts to reduce libsrtp entries in the profiling reports for our WebRTC SFU software. The software maintains lots of SRTP connections (hundreds), and these changes allowed to save a few percents of total CPU usage. The most CPU consuming part in this PR that came up in our reports is bitvector_left_shift, the other places are mostly where I saw an easy opportunity to optimize.

Lastique · 2021-11-17T13:17:47Z

Rebased on top of the current master.

pabuhler · 2023-03-01T08:39:11Z

@Lastique thanks for updating, will take a new look at this now

- The v128 operations are optimized for SSE2/SSSE3. - srtp_octet_string_is_eq is optimized for SSE2. When SSE2 is not available, use a pair of 32-bit accumulators to speed up the bulk of the operation. We use two accumulators to leverage instruction-level parallelism supported by most modern CPUs. - In srtp_cleanse, use memset and ensure it is not optimized away with a dummy asm statement, which can potentially consume the contents of the memory. - Endian conversion functions use gcc-style intrinsics, when possible. The SIMD code uses intrinsics, which are available on all modern compilers. For MSVC, config_in_cmake.h is modified to define gcc/clang-style SSE macros based on MSVC predefined macros. We enable all SSE versions when it indicates that AVX is enabled. SSE2 is always enabled for x86-64 or for x86 when SSE2 FP math is enabled.

pabuhler

Will wait a few more day before merging in case any one has an objection.

pabuhler · 2023-03-14T17:02:58Z

crypto/math/datatypes.c

@@ -272,9 +480,14 @@ int srtp_octet_string_is_eq(uint8_t *a, uint8_t *b, int len)

 void srtp_cleanse(void *s, size_t len)
 {
+#if defined(__GNUC__)
+    memset(s, 0, len);
+    __asm__ __volatile__("" : : "r"(s) : "memory");


This is probably a little unrelated to the rest ... unless it was performance problem.

pabuhler · 2023-03-14T17:04:08Z

thanks for the pr @Lastique , and sorry it so long to get in

Lastique · 2023-03-14T19:55:58Z

Thanks.

This PR contains the following updates: | Package | Update | Change | |---|---|---| | [cisco/libsrtp](https://github.com/cisco/libsrtp) | minor | `v2.5.0` -> `v2.6.0` | --- ### Release Notes <details> <summary>cisco/libsrtp (cisco/libsrtp)</summary> ### [`v2.6.0`](https://github.com/cisco/libsrtp/releases/tag/v2.6.0): libSRTP 2.6.0 [Compare Source](cisco/libsrtp@v2.5.0...v2.6.0) #### What's Changed - cmake: Support configuring as subproject by [@yangskyboxlabs](https://github.com/yangskyboxlabs) in cisco/libsrtp#640 - cmake: Rename TEST_APPS as LIBSRTP_TEST_APPS option by [@yangskyboxlabs](https://github.com/yangskyboxlabs) in cisco/libsrtp#641 - Add a missing typedef for stream list ctx by [@Lastique](https://github.com/Lastique) in cisco/libsrtp#643 - Add x86 SIMD optimizations to crypto datatypes by [@Lastique](https://github.com/Lastique) in cisco/libsrtp#507 - fix typo by [@uniontech-lilinjie](https://github.com/uniontech-lilinjie) in cisco/libsrtp#648 - iv length is constant so set only once by [@pabuhler](https://github.com/pabuhler) in cisco/libsrtp#649 - Fix library version in Doxygen docs by [@Lastique](https://github.com/Lastique) in cisco/libsrtp#654 - remove comma from srtp_profile_t enum value. by [@melchi45](https://github.com/melchi45) in cisco/libsrtp#658 - meson.build: implement mbedtls support by [@iameli](https://github.com/iameli) in cisco/libsrtp#660 - remove travis reference from README.md by [@pabuhler](https://github.com/pabuhler) in cisco/libsrtp#661 - try to use ubuntu-latest by [@pabuhler](https://github.com/pabuhler) in cisco/libsrtp#664 - Some srtp_driver fixes by [@pabuhler](https://github.com/pabuhler) in cisco/libsrtp#662 - Some cipher_driver fixes by [@pabuhler](https://github.com/pabuhler) in cisco/libsrtp#663 - start using const on internal arguments by [@pabuhler](https://github.com/pabuhler) in cisco/libsrtp#665 - Cleaning up cmake and enabled more warnings. by [@palerikm](https://github.com/palerikm) in cisco/libsrtp#666 - fix line break in stream_list.yml by [@pabuhler](https://github.com/pabuhler) in cisco/libsrtp#668 - remove use of pointers to 32bit values by [@pabuhler](https://github.com/pabuhler) in cisco/libsrtp#667 #### New Contributors - [@yangskyboxlabs](https://github.com/yangskyboxlabs) made their first contribution in cisco/libsrtp#640 - [@uniontech-lilinjie](https://github.com/uniontech-lilinjie) made their first contribution in cisco/libsrtp#648 - [@melchi45](https://github.com/melchi45) made their first contribution in cisco/libsrtp#658 - [@iameli](https://github.com/iameli) made their first contribution in cisco/libsrtp#660 - [@palerikm](https://github.com/palerikm) made their first contribution in cisco/libsrtp#666 **Full Changelog**: cisco/libsrtp@v2.5.0...v2.6.0 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).  Reviewed-on: https://git.walbeck.it/walbeck-it/docker-janus-gateway/pulls/132 Co-authored-by: renovate-bot <[email protected]> Co-committed-by: renovate-bot <[email protected]>

Lastique force-pushed the optimize_datatypes branch 4 times, most recently from f7f31d5 to ae2a70c Compare September 22, 2020 15:05

Lastique mentioned this pull request Sep 25, 2020

Change stream list to arrays and optimize stream lookup with SSE2 #508

Open

Lastique force-pushed the optimize_datatypes branch from ae2a70c to ccd75ec Compare November 17, 2021 13:13

Lastique force-pushed the optimize_datatypes branch from ccd75ec to 52d61dd Compare February 28, 2023 18:00

Lastique force-pushed the optimize_datatypes branch from 52d61dd to faa7998 Compare March 5, 2023 14:51

pabuhler approved these changes Mar 13, 2023

View reviewed changes

pabuhler reviewed Mar 14, 2023

View reviewed changes

pabuhler merged commit 398f299 into cisco:main Mar 14, 2023

Lastique deleted the optimize_datatypes branch March 14, 2023 19:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add x86 SIMD optimizations to crypto datatypes #507

Add x86 SIMD optimizations to crypto datatypes #507

Uh oh!

Lastique commented Sep 21, 2020 •

edited

Loading

Uh oh!

pabuhler commented Oct 8, 2020

Uh oh!

Lastique commented Oct 8, 2020

Uh oh!

Lastique commented Nov 17, 2021

Uh oh!

pabuhler commented Mar 1, 2023

Uh oh!

pabuhler left a comment

Uh oh!

pabuhler Mar 14, 2023

Uh oh!

pabuhler commented Mar 14, 2023

Uh oh!

Lastique commented Mar 14, 2023

Uh oh!

Uh oh!

Add x86 SIMD optimizations to crypto datatypes #507

Add x86 SIMD optimizations to crypto datatypes #507

Uh oh!

Conversation

Lastique commented Sep 21, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pabuhler commented Oct 8, 2020

Uh oh!

Lastique commented Oct 8, 2020

Uh oh!

Lastique commented Nov 17, 2021

Uh oh!

pabuhler commented Mar 1, 2023

Uh oh!

pabuhler left a comment

Choose a reason for hiding this comment

Uh oh!

pabuhler Mar 14, 2023

Choose a reason for hiding this comment

Uh oh!

pabuhler commented Mar 14, 2023

Uh oh!

Lastique commented Mar 14, 2023

Uh oh!

Uh oh!

Lastique commented Sep 21, 2020 •

edited

Loading