Skip to content

Conversation

@Vishwanatha-HD
Copy link

@Vishwanatha-HD Vishwanatha-HD commented Nov 18, 2025

Rationale for this change

This PR is intended to enable Parquet DB support on Big-endian (s390x) systems. The fix in this PR fixes two test failures
i.e. "arrow-ipc-message-internal-test" & "arrow-acero-hash-join-node-test" testcase failures.

As part of "arrow-acero-hash-join-node-test", the failing testcase was "HashJoin.FineGrainedResidualFilter"..
As part of "arrow-ipc-message-internal-test", the failing testcase was "TestMessageInternal.TestByteIdentical"..

What changes are included in this PR?

The fix includes changes to "hash_join.cc" and "message_internal_test.cc" files to address the above testcase failures.

Are these changes tested?

Yes. The changes are tested on s390x arch to make sure things are working fine. The fix is also tested on x86 arch, to make sure there is no new regression introduced.

Are there any user-facing changes?

No

@kou kou changed the title GH-48151: [C++][Parquet] Fix arrow-ipc-message-internal-test & arrow-… GH-48166: [C++][Parquet] Fix arrow-ipc-message-internal-test & arrow-acero-hash-join-node-test failures Nov 19, 2025
@github-actions
Copy link

⚠️ GitHub issue #48166 has been automatically assigned in GitHub to PR creator.

@kou kou changed the title GH-48166: [C++][Parquet] Fix arrow-ipc-message-internal-test & arrow-acero-hash-join-node-test failures GH-48176: [C++][Parquet] Fix arrow-ipc-message-internal-test & arrow-acero-hash-join-node-test failures Nov 19, 2025
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix lint failure?

Comment on lines +316 to +318
// Check if the scalar is a BooleanScalar before casting
if (mask.scalar()->type->id() == Type::BOOL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this check? This check didn't exist in the existing code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou,
Please find the root cause of this issue on s390x platform below:

The test was failing with a std::bad_cast exception in the ProbeBatch_ResidualFilter function in hash_join.cc at line 309. The issue occurred when the code tried to cast a Datum's scalar to a BooleanScalar using:

const auto& mask_scalar = mask.scalar_as();

The problem was that the test uses literal(NullScalar()) as one of the filter expressions, which creates a NullScalar (of type NullType), not a BooleanScalar (of type BoolType). When the code tried to cast this to a BooleanScalar, it failed with std::bad_cast on s390x architecture

The Fix
I modified the ProbeBatch_ResidualFilter function to check the scalar's type before attempting the cast

Comment on lines 80 to 81
// On Big-endian systems, 4 bytes are appended to indicate it as a BE system. Hence the
// total size is 4 bytes more than the LE systems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that https://arrow.apache.org/docs/format/Columnar.html doesn't mention the 4 bytes for endianness. Could you share a document URL that mentions the 4 bytes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou..
I guess, I need to modify the comment here.. The root cause for this testcase failure is as below:

The test was failing because it contained a hardcoded expected byte array that was generated on macOS+ARM+LLVM, but in our case, the test was running on a Linux+s390x+GCC platform. FlatBuffer serialization can produce slightly different output across different platforms and toolchains, which caused the buffer size mismatch (expected 232 bytes vs actual 228 bytes).

@Vishwanatha-HD
Copy link
Author

Could you fix lint failure?

I have fixed the lint failures. Thanks..

Copy link
Author

@Vishwanatha-HD Vishwanatha-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have taken care of the review comments and made necessary changes to the code. Pls review my changes. Thanks..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants