[C++] Use SIMD to retrofit and optimize furycpp #2013

Open
pandalee99 opened this issue Jan 19, 2025 · 6 comments

@pandalee99
Contributor

Feature Request

Maybe we can try a portable SIMD library, such as
https://github.com/xtensor-stack/xsimd
https://github.com/google/highway
instead of handwritten intrinsic calls.

xsimd is also used in Apache Arrow to speed up data processing, and it works very well there.

Is your feature request related to a problem? Please describe

No response

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@pandalee99 pandalee99 added c++ enhancement New feature or request labels Jan 19, 2025
@pandalee99 pandalee99 self-assigned this Jan 19, 2025
@pandalee99
Contributor Author

After that, we can use the simdutf project to improve the existing logic.
Related: #2002, #1732

@pandalee99
Contributor Author

Image

As for simdutf, I used the single-header version and ran a simple test:

#include <string>
#include "simdutf.h"  // single-header version

std::string utf16ToUtf8WithSIMDUTF(const std::u16string &utf16) {
  // Number of UTF-16 code units in the input
  const size_t utf16_length = utf16.length();
  // Number of bytes needed to hold the UTF-8 output
  const size_t utf8_length = simdutf::utf8_length_from_utf16le(utf16.data(), utf16_length);
  // Allocate the output buffer at its final size
  std::string utf8_result(utf8_length, '\0');
  // Convert; the return value is the number of bytes written
  const size_t written_bytes = simdutf::convert_utf16le_to_utf8(utf16.data(), utf16_length, utf8_result.data());
  // Resize the string to match the actual number of written bytes
  utf8_result.resize(written_bytes);
  return utf8_result;
}

In this test, the simdutf version was not as efficient.
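
When comparing converters like this, it can help to have a plain scalar conversion on hand, both to check that the outputs match and to serve as a baseline in the timings. Below is a minimal sketch of one. The name `utf16ToUtf8Scalar` is hypothetical, and this is not furycpp's existing implementation. Replacing unpaired surrogates with U+FFFD is a choice made for this sketch, so outputs may differ from other converters on invalid input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical scalar baseline (not furycpp code): converts UTF-16 to UTF-8
// one code point at a time. Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8Scalar(const std::u16string &utf16) {
  std::string out;
  out.reserve(utf16.size() * 3);  // worst case: 3 bytes per UTF-16 code unit
  for (std::size_t i = 0; i < utf16.size(); ++i) {
    std::uint32_t cp = utf16[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate
      if (i + 1 < utf16.size() && utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (utf16[i + 1] - 0xDC00);
        ++i;  // consume the low surrogate as well
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // lone low surrogate
    }
    if (cp < 0x80) {
      out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
      out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
      out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return out;
}
```

On valid input, the result of this function should be byte-for-byte identical to the output of any correct SIMD converter, which makes it a cheap sanity check before timing anything.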

@pandalee99
Contributor Author

cc @chaokunyang

@PragmaTwice
Member

Could you attach a benchmark? e.g. in https://quick-bench.com/.

@pandalee99
Contributor Author

> Could you attach a benchmark? e.g. in https://quick-bench.com/.

Sure, I will implement it later.

@pandalee99
Contributor Author

I ran a more rigorous series of tests and arrived at this result:

Image

'BM_SIMD_UTF', the simdutf-based version, does seem to perform better. I apologize for the lack of rigor in the previous test.

Thank you very much for your guidance. @PragmaTwice @chaokunyang
I will implement a benchmark module in furycpp to facilitate later functional testing.
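
As a rough illustration of what a benchmark helper could look like, here is a minimal timing sketch. It assumes plain `std::chrono` rather than quick-bench or any benchmark framework, and `bestTimeNs` is a hypothetical name, not part of furycpp. Taking the best of several rounds and keeping the result observable are the two things the sketch is meant to show.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical helper (not furycpp code): calls fn(input) `rounds` times and
// returns the fastest wall-clock time in nanoseconds. `rounds` should be >= 1.
template <typename Fn>
double bestTimeNs(Fn &&fn, const std::u16string &input, int rounds) {
  double best = std::numeric_limits<double>::infinity();
  // Writing the output size to a volatile keeps the compiler from
  // discarding the conversion as dead code.
  volatile std::size_t sink = 0;
  for (int r = 0; r < rounds; ++r) {
    const auto start = std::chrono::steady_clock::now();
    std::string out = fn(input);
    const auto stop = std::chrono::steady_clock::now();
    sink = sink + out.size();
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    best = std::min(best, ns);
  }
  return best;
}
```

Calling it once per converter on the same input, for example `bestTimeNs(utf16ToUtf8WithSIMDUTF, input, 100)`, and repeating with ASCII-only, mixed, and surrogate-heavy inputs of several sizes would give a fairer picture than a single run.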

2 participants