GH-48194: [C++] Fix arrow-bit-utility-test failure on s390x #48195
base: main
Conversation
    }
    #else
    // For VLQ reading, always read directly from buffer to avoid endianness issues
    // with buffered_values_ on big-endian systems like s390x
Suggested change:
    - // with buffered_values_ on big-endian systems like s390x
    + // with buffered_values_ on big-endian systems like s390x.
    #else
    // For VLQ reading, always read directly from buffer to avoid endianness issues
    // with buffered_values_ on big-endian systems like s390x
    // Calculate current position in buffer accounting for bit offset
Suggested change:
    - // Calculate current position in buffer accounting for bit offset
    + // Calculate current position in buffer accounting for bit offset.
    const uint8_t* data = NULLPTR;
    int max_size = 0;
    #if ARROW_LITTLE_ENDIAN
    // The data that we will pass to the LEB128 parser
Suggested change:
    - // The data that we will pass to the LEB128 parser
    + // The data that we will pass to the LEB128 parser.
    int max_size = 0;
    #if ARROW_LITTLE_ENDIAN
    // The data that we will pass to the LEB128 parser
    // In all case, we read a byte-aligned value, skipping remaining bits
Suggested change:
    - // In all case, we read a byte-aligned value, skipping remaining bits
    + // In all case, we read a byte-aligned value, skipping remaining bits.
Could you revert this change?
(This is needless, right?)
Hi @kou. Yes, certainly, this is needless. I had earlier tried "git restore" and "git reset --hard" to remove the "testing" changes, but I didn't realize this is a submodule and that I needed to run "git submodule update --init --recursive".
Now I have done that and removed the changes. Thanks.
    const int current_byte_offset = byte_offset_ + bit_util::BytesForBits(bit_offset_);
    const int bytes_left_in_buffer = max_bytes_ - current_byte_offset;

    // Always read from buffer directly to avoid endianness issues
    data = buffer_ + current_byte_offset;
    max_size = bytes_left_in_buffer;
Is this the same logic as arrow/cpp/src/arrow/util/bit_stream_utils_internal.h, lines 380 to 383 in 2fb2f79?

    } else {
      max_size = bytes_left();
      data = buffer_ + (max_bytes_ - max_size);
    }

If so, should we reuse it with something like the following?
diff --git a/cpp/src/arrow/util/bit_stream_utils_internal.h b/cpp/src/arrow/util/bit_stream_utils_internal.h
index d8c7317fe8..7352312782 100644
--- a/cpp/src/arrow/util/bit_stream_utils_internal.h
+++ b/cpp/src/arrow/util/bit_stream_utils_internal.h
@@ -366,6 +366,9 @@ inline bool BitReader::GetVlqInt(Int* v) {
const uint8_t* data = NULLPTR;
int max_size = 0;
+#if ARROW_LITTLE_ENDIAN
+ // TODO: Describe why we need this only for little-endian.
+
// Number of bytes left in the buffered values, not including the current
// byte (i.e., there may be an additional fraction of a byte).
const int bytes_left_in_cache =
@@ -377,7 +380,9 @@ inline bool BitReader::GetVlqInt(Int* v) {
data = reinterpret_cast<const uint8_t*>(&buffered_values_) +
bit_util::BytesForBits(bit_offset_);
// Otherwise, we try straight from buffer (ignoring few bytes that may be cached)
- } else {
+ } else
+#endif
+ {
max_size = bytes_left();
data = buffer_ + (max_bytes_ - max_size);
}
Rationale for this change
This PR is intended to enable Parquet DB support on big-endian (s390x) systems. The fix addresses the "arrow-bit-utility-test" test case failure.
The "arrow-bit-utility-test" suite was failing with one test case failure on big-endian platforms:
[ FAILED ] 1 test, listed below:
[ FAILED ] BitStreamUtil.ZigZag
What changes are included in this PR?
The fix includes changes to the "bit_stream_utils_internal.h" file to address the abort/core-dump issues.
Are these changes tested?
Yes. The changes were tested on the s390x architecture to make sure things work correctly. The fix was also tested on the x86 architecture to make sure no new regression is introduced.
Are there any user-facing changes?
No
GitHub main Issue link: [C++][Parquet] Enable Parquet DB support on Big Endian (IBM Z) systems #48151
GitHub Issue: [C++] arrow-bit-utility-test failed on Big-Endian (s390x) systems #48194