-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-48206 Fix Statistics logic to enable Parquet DB support on s390x #48207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
69807c0 to
fc2f62c
Compare
| // Fallback: use encoder for other types | ||
| auto encoder = MakeTypedEncoder<DType>(Encoding::PLAIN, false, descr_, pool_); | ||
| encoder->Put(&src, 1); | ||
| auto buffer = encoder->FlushValues(); | ||
| dst->assign(reinterpret_cast<const char*>(buffer->data()), | ||
| static_cast<size_t>(buffer->size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse the implementation in ARROW_LITTLE_ENDIAN for this?
#if !ARROW_LITTLE_ENDIAN
if constexprt (...) {
...
} else if ... {
...
}
#endif
auto encoder = MakeTypedEncoder<DType>(Encoding::PLAIN, false, descr_, pool_);
encoder->Put(&src, 1);
auto buffer = encoder->FlushValues();
auto ptr = reinterpret_cast<const char*>(buffer->data());
dst->assign(ptr, static_cast<size_t>(buffer->size()));| if constexpr (std::is_same_v<DType, Int32Type>) { | ||
| uint32_t u; | ||
| std::memcpy(&u, &src, sizeof(u)); | ||
| u = ::arrow::bit_util::ToLittleEndian(u); | ||
| dst->assign(reinterpret_cast<const char*>(&u), sizeof(u)); | ||
| return; | ||
| } else if constexpr (std::is_same_v<DType, Int64Type>) { | ||
| uint64_t u; | ||
| std::memcpy(&u, &src, sizeof(u)); | ||
| u = ::arrow::bit_util::ToLittleEndian(u); | ||
| dst->assign(reinterpret_cast<const char*>(&u), sizeof(u)); | ||
| return; | ||
| } else if constexpr (std::is_same_v<DType, FloatType>) { | ||
| uint32_t u; | ||
| static_assert(sizeof(u) == sizeof(float), "size"); | ||
| std::memcpy(&u, &src, sizeof(u)); | ||
| u = ::arrow::bit_util::ToLittleEndian(u); | ||
| dst->assign(reinterpret_cast<const char*>(&u), sizeof(u)); | ||
| return; | ||
| } else if constexpr (std::is_same_v<DType, DoubleType>) { | ||
| uint64_t u; | ||
| static_assert(sizeof(u) == sizeof(double), "size"); | ||
| std::memcpy(&u, &src, sizeof(u)); | ||
| u = ::arrow::bit_util::ToLittleEndian(u); | ||
| dst->assign(reinterpret_cast<const char*>(&u), sizeof(u)); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this in XXXEncoder::Put() instead of here?
Rationale for this change
This PR is intended to enable Parquet DB support on Big-endian (s390x) systems. The fix in this PR fixes the Statistics logic.
What changes are included in this PR?
The fix includes changes to following file:
cpp/src/parquet/statistics.cc
Are these changes tested?
Yes. The changes are tested on s390x arch to make sure things are working fine. The fix is also tested on x86 arch, to make sure there is no new regression introduced.
Are there any user-facing changes?
No
GitHub main Issue link: #48151