[C++] Use SIMD to retrofit and optimize furycpp #2013

pandalee99 · 2025-01-19T16:18:45Z

Feature Request

Maybe we can try a portable SIMD library, such as
https://github.com/xtensor-stack/xsimd
https://github.com/google/highway
instead of handwritten intrinsic calls.

xsimd is already used in Apache Arrow to speed up data processing, and it works very well there.
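To illustrate the maintenance cost behind this request, here is a minimal sketch of what a handwritten-intrinsics routine tends to look like: an ASCII check with an SSE2 path guarded by the preprocessor and a scalar fallback for every other target. The function name `isAsciiSketch` and the macro `SKETCH_HAS_SSE2` are hypothetical and not taken from furycpp; each additional instruction set (NEON, AVX2, ...) would need its own branch, which is the duplication a portable SIMD library aims to remove.

```cpp
#include <cstddef>
#include <string>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

// Hypothetical helper (not from furycpp): true if every byte is below 0x80.
inline bool isAsciiSketch(const std::string &s) {
  const unsigned char *p = reinterpret_cast<const unsigned char *>(s.data());
  const std::size_t n = s.size();
  std::size_t i = 0;
#ifdef SKETCH_HAS_SSE2
  // x86-only path: OR all 16-byte chunks together, then test the high bits.
  __m128i acc = _mm_setzero_si128();
  for (; i + 16 <= n; i += 16) {
    const __m128i chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i *>(p + i));
    acc = _mm_or_si128(acc, chunk);
  }
  // movemask gathers the top bit of each byte; non-zero means non-ASCII.
  if (_mm_movemask_epi8(acc) != 0) return false;
#endif
  // Scalar tail, and the only path on targets without SSE2.
  for (; i < n; ++i) {
    if (p[i] & 0x80) return false;
  }
  return true;
}
```

With a library such as xsimd or highway, the vector loop would be written once against the library's abstraction instead of once per instruction set; the exact API is not shown here to avoid misquoting it.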

Is your feature request related to a problem? Please describe

No response

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response


pandalee99 · 2025-01-19T16:18:59Z

After that, we could also use the simdutf project to improve the original logic.
Related: #2002, #1732

pandalee99 · 2025-01-20T09:10:45Z

As for simdutf, I used the single-header version and ran a simple test:

#include <string>

#include "simdutf.h"  // single-header simdutf release

std::string utf16ToUtf8WithSIMDUTF(const std::u16string &utf16) {
  // Number of UTF-16 code units in the input (not bytes)
  const size_t utf16_length = utf16.length();
  // Number of bytes required to encode the input as UTF-8
  const size_t utf8_length = simdutf::utf8_length_from_utf16le(utf16.data(), utf16_length);
  // Output buffer, pre-sized to the computed length
  std::string utf8_result(utf8_length, '\0');
  // utf16.data() is already a const char16_t *, so no cast is needed.
  // The call returns the number of bytes written (0 if the input is not valid UTF-16LE).
  const size_t written_bytes = simdutf::convert_utf16le_to_utf8(utf16.data(), utf16_length, utf8_result.data());
  // Shrink the string to the number of bytes actually written
  utf8_result.resize(written_bytes);
  return utf8_result;
}

In this simple test, the simdutf version was not as fast as the current code.
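A quick test like this is easier to trust when the SIMD output is first checked against an independent scalar conversion. The following is a minimal sketch of such a baseline; `utf16ToUtf8Scalar` is a hypothetical name and this is not furycpp's existing converter. It reads code units in native byte order and, as an assumption of this sketch, replaces unpaired surrogates with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical scalar baseline (not furycpp's code): UTF-16 to UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
inline std::string utf16ToUtf8Scalar(const std::u16string &in) {
  std::string out;
  out.reserve(in.size() * 3);  // worst case: 3 bytes per UTF-16 code unit
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
        in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // Valid surrogate pair: combine into one supplementary code point.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // lone surrogate
    }
    if (cp < 0x80) {
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```

Because simdutf's conversion reports invalid input by returning 0 rather than substituting a replacement character, a comparison against this sketch is only meaningful on valid UTF-16 input.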

pandalee99 · 2025-01-20T09:12:09Z

cc @chaokunyang

PragmaTwice · 2025-01-20T09:31:43Z

Could you attach a benchmark? e.g. in https://quick-bench.com/.

pandalee99 · 2025-01-20T10:01:24Z

> Could you attach a benchmark? e.g. in https://quick-bench.com/.

Sure, I will implement it later.

pandalee99 · 2025-01-23T16:19:18Z

I ran a series of more rigorous tests and finally came to this result.

'BM_SIMD_UTF', which is the simdutf version, does seem to perform better. I apologize for the lack of rigor in the previous test.

Thank you very much for your guidance. @PragmaTwice @chaokunyang
I will implement a benchmark module in furycpp to make later functional testing easier.
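The thread does not show the harness behind these numbers. As a dependency-free illustration of the kind of measurement a benchmark module might start from, here is a minimal sketch that times a callable with `std::chrono` and reports the median of several rounds, which is less sensitive to outliers than a single run. The name `medianNanosSketch` and the default round count are assumptions made for this example.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical helper: runs fn() `rounds` times and returns the median
// wall-clock time of a single call, in nanoseconds.
template <typename Fn>
double medianNanosSketch(Fn &&fn, int rounds = 31) {
  if (rounds < 1) rounds = 1;  // always take at least one sample
  std::vector<double> samples;
  samples.reserve(static_cast<std::size_t>(rounds));
  for (int r = 0; r < rounds; ++r) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::nano>(stop - start).count());
  }
  // Partially sort so that the middle element is the median.
  const auto mid = samples.begin() + static_cast<long>(samples.size() / 2);
  std::nth_element(samples.begin(), mid, samples.end());
  return *mid;
}
```

A caller would wrap each converter in a lambda, store the result in a `volatile` variable or otherwise consume it so the compiler cannot discard the work, and compare the medians on the same input. A dedicated framework such as Google Benchmark (which quick-bench.com is built on) handles warm-up and iteration counts more carefully than this sketch does.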

pandalee99 added the c++ and enhancement (New feature or request) labels Jan 19, 2025

pandalee99 self-assigned this Jan 19, 2025

This was referenced Jan 24, 2025

[C++] FuryCpp needs to add the benchmark module #2022

Closed

perf(c++): Evaluate the implementation effect &&simdutf performs partial vectorization #2033

Open
