GH-48194: [C++] Fix arrow-bit-utility-test failure on s390x #48195

Vishwanatha-HD · 2025-11-20T19:43:07Z

Rationale for this change

This PR is part of the work to enable Parquet DB support on big-endian (s390x) systems. It fixes the "arrow-bit-utility-test" test failure.

On big-endian platforms, one test in "arrow-bit-utility-test" was failing:

[ FAILED ] 1 test, listed below:
[ FAILED ] BitStreamUtil.ZigZag

What changes are included in this PR?

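The fix changes BitReader::GetVlqInt() in "bit_stream_utils_internal.h" to address the abort/core dump. On big-endian systems, VLQ values are now read directly from the buffer instead of through the cached buffered_values_.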

Are these changes tested?

Yes. The changes were tested on s390x to confirm that the test now passes, and on x86 to confirm that no regression is introduced.

Are there any user-facing changes?

No

Main GitHub issue: [C++][Parquet] Enable Parquet DB support on Big Endian (IBM Z) systems #48151
GitHub Issue: [C++] arrow-bit-utility-test failed on Big-Endian (s390x) systems #48194

github-actions · 2025-11-21T12:18:58Z

⚠️ GitHub issue #48194 has been automatically assigned in GitHub to PR creator.

kou · 2025-11-21T22:35:51Z

cpp/src/arrow/util/bit_stream_utils_internal.h

  }
+#else
+  // For VLQ reading, always read directly from buffer to avoid endianness issues
+  // with buffered_values_ on big-endian systems like s390x


Suggested change:

-  // with buffered_values_ on big-endian systems like s390x
+  // with buffered_values_ on big-endian systems like s390x.

kou · 2025-11-21T22:36:11Z

cpp/src/arrow/util/bit_stream_utils_internal.h

+#else
+  // For VLQ reading, always read directly from buffer to avoid endianness issues
+  // with buffered_values_ on big-endian systems like s390x
+  // Calculate current position in buffer accounting for bit offset


Suggested change:

-  // Calculate current position in buffer accounting for bit offset
+  // Calculate current position in buffer accounting for bit offset.

kou · 2025-11-21T22:36:25Z

cpp/src/arrow/util/bit_stream_utils_internal.h

  const uint8_t* data = NULLPTR;
  int max_size = 0;
+#if ARROW_LITTLE_ENDIAN
+  // The data that we will pass to the LEB128 parser


Suggested change:

-  // The data that we will pass to the LEB128 parser
+  // The data that we will pass to the LEB128 parser.

kou · 2025-11-21T22:36:34Z

cpp/src/arrow/util/bit_stream_utils_internal.h

  int max_size = 0;
+#if ARROW_LITTLE_ENDIAN
+  // The data that we will pass to the LEB128 parser
+  // In all case, we read a byte-aligned value, skipping remaining bits


Suggested change:

-  // In all case, we read a byte-aligned value, skipping remaining bits
+  // In all case, we read a byte-aligned value, skipping remaining bits.

kou · 2025-11-21T22:37:22Z

testing

Could you revert this change?
(This is needless, right?)

Reply from Vishwanatha-HD:

Hi @kou. Yes, this change is needless. I had earlier tried "git restore" and "git reset --hard" to remove the "testing" changes, but I didn't realize that "testing" is a submodule and that I needed to run "git submodule update --init --recursive". I have now done that and removed the changes. Thanks.

kou · 2025-11-21T22:44:39Z

cpp/src/arrow/util/bit_stream_utils_internal.h

+  const int current_byte_offset = byte_offset_ + bit_util::BytesForBits(bit_offset_);
+  const int bytes_left_in_buffer = max_bytes_ - current_byte_offset;
+
+  // Always read from buffer directly to avoid endianness issues
+  data = buffer_ + current_byte_offset;
+  max_size = bytes_left_in_buffer;


Is this the same logic as the existing code at lines 380 to 383 of arrow/cpp/src/arrow/util/bit_stream_utils_internal.h (in 2fb2f79)?

  } else {
    max_size = bytes_left();
    data = buffer_ + (max_bytes_ - max_size);
  }

If so, should we reuse it, with something like the following?

diff --git a/cpp/src/arrow/util/bit_stream_utils_internal.h b/cpp/src/arrow/util/bit_stream_utils_internal.h
index d8c7317fe8..7352312782 100644
--- a/cpp/src/arrow/util/bit_stream_utils_internal.h
+++ b/cpp/src/arrow/util/bit_stream_utils_internal.h
@@ -366,6 +366,9 @@ inline bool BitReader::GetVlqInt(Int* v) {
   const uint8_t* data = NULLPTR;
   int max_size = 0;
 
+#if ARROW_LITTLE_ENDIAN
+  // TODO: Describe why we need this only for little-endian.
+
   // Number of bytes left in the buffered values, not including the current
   // byte (i.e., there may be an additional fraction of a byte).
   const int bytes_left_in_cache =
@@ -377,7 +380,9 @@ inline bool BitReader::GetVlqInt(Int* v) {
     data = reinterpret_cast<const uint8_t*>(&buffered_values_) +
            bit_util::BytesForBits(bit_offset_);
     // Otherwise, we try straight from buffer (ignoring few bytes that may be cached)
-  } else {
+  } else
+#endif
+  {
     max_size = bytes_left();
     data = buffer_ + (max_bytes_ - max_size);
   }


github-actions bot added the "Component: C++" and "awaiting review" labels Nov 20, 2025

Vishwanatha-HD mentioned this pull request Nov 20, 2025 in [C++] arrow-bit-utility-test failed on Big-Endian (s390x) systems #48194 (open)

Vishwanatha-HD changed the title ~~GH-48177: [C++][Parquet] Fix arrow-bit-utility-test failure on s390x~~ GH-48194: [C++][Parquet] Fix arrow-bit-utility-test failure on s390x Nov 21, 2025

Vishwanatha-HD mentioned this pull request Nov 21, 2025 in [C++][Parquet] Enable Parquet DB support on Big Endian (IBM Z) systems #48151 (open)

kou reviewed Nov 21, 2025


Commit c0a142f: apacheGH-48177: [C++][Parquet] Fix arrow-bit-utility-test failure on s390x

Vishwanatha-HD force-pushed the fixParqIssues3 branch from ce32ab1 to c0a142f November 22, 2025 04:30

kou changed the title ~~GH-48194: [C++][Parquet] Fix arrow-bit-utility-test failure on s390x~~ GH-48194: [C++] Fix arrow-bit-utility-test failure on s390x Nov 22, 2025
