Skip to content

Conversation

@raulcd
Copy link
Member

@raulcd raulcd commented Apr 16, 2025

Rationale for this change

When compiling without ARROW_COMPUTE we should not require the run end encode kernel to be available. The current test utility RunEndEncodeTableColumns should not be on gtest_util as this is going to be compiled whether ARROW_COMPUTE flag is enabled or not.

What changes are included in this PR?

Move the RunEndEncodeTableColumns utility that requires REE to be available to arrow/acero/test_util_internal as is only used there, move corresponding test to its own test module and remove the requirement for it to be exported with ARROW_TESTING_EXPORT

Are these changes tested?

Yes, on CI and tested locally that ARROW_COMPUTE=ON and ARROW_COMPUTE=OFF behave as expected.

Are there any user-facing changes?

If someone was using the RunEndEncodeTableColumns from libarrow_testing this is not being exported anymore.

… uses REE to test_util on compute instead of common gtest_util
@github-actions
Copy link

⚠️ GitHub issue #46157 has been automatically assigned in GitHub to PR creator.

@raulcd raulcd marked this pull request as ready for review April 16, 2025 11:00
@raulcd raulcd requested a review from westonpace as a code owner April 16, 2025 11:00
@raulcd raulcd requested review from WillAyd and kou April 16, 2025 11:01
Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'm not so familiar with this ( though I merged the test_util test pr but I didn't know it will causing this, sorry for that)

@raulcd
Copy link
Member Author

raulcd commented Apr 16, 2025

@github-actions crossbow submit test-ubuntu-24.04-cpp-minimal-with-formats

@raulcd
Copy link
Member Author

raulcd commented Apr 16, 2025

I was also bitten by this on my PR to move the compute kernels to its own shared library in the past. I think the merged PR surfaced the problem but the problem was still there, we should not require REE if we are not using ARROW_COMPUTE.
In my PR I ended up temporarily always compiling REE (I left a TODO to investigate in the future)
btw the nightly test-ubuntu-24.04-cpp-minimal-with-formats has been failing reproducing the issue for the last days.

@github-actions
Copy link

Revision: e97c952

Submitted crossbow builds: ursacomputing/crossbow @ actions-f9ba3f450a

Task Status
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions

Copy link
Contributor

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - nice work!

@raulcd
Copy link
Member Author

raulcd commented Apr 22, 2025

@kou do you want to take a look at this one? I'll merge tomorrow otherwise

@kou
Copy link
Member

kou commented Apr 22, 2025

Ah, sorry. I'll take a look at this tomorrow.

@@ -0,0 +1,30 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a test_util_internal.h in this directory, why this new file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was on gtest_util.h and is exposed with ARROW_TESTING_EXPORT. We don't seem to export any of these utilities from our various test_util_internal.h. This is not only used on arrow compute but on an arrow acero test so I thought we might want this to be exported. I can move to test_util_internal.h if we think is better but that was the reasoning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If test_util_internal.h works then we shouldn't create something else. Let's try it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we need to export it for Acero.

Copy link
Member

@pitrou pitrou Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need it for Acero tests but it can still be in a _internal header, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've initially moved it to arrow/compute/kernels/test_util_internal.h, in that case we require to also add to libarrow_testing.so -> arrow/compute/test_util_internal.cc otherwise we get some missing symbols (undefined reference to arrow::compute::ValidateOutput(arrow::Datum const&)):

  list(APPEND ARROW_TESTING_SRCS compute/kernels/test_util_internal.cc
       compute/test_util_internal.cc)

I don't think it's great that we are exporting symbols from test_util_internal.cc and that we are adding some of the internals to libarrow_testing.so. See the initial commit: 6ee47f2

The above fails on Windows with 'arrow::compute::RunEndEncodeTableColumns': inconsistent dll linkage. I could continue with that route but I would probably have to remove our current arrow_compute_kernels_testing target to just put the internal utilities on libarrow_testing.so.

As I think this goes against a better hygiene of libarrow_testing.so, at this point I have decided to move the utility function to arrow/acero/test_util_internal.h and not require ARROW_TESTING_EXPORT at all. I think the utility function is quite niche and can live there and acero will always require compute.

Copy link
Member

@pitrou pitrou Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's great that we are exporting symbols from test_util_internal.cc and that we are adding some of the internals to libarrow_testing.so

It's a bit confusing but these all have slightly different purposes:

  1. the internal namespace tells that an API is not for public use
  2. the _internal.h suffix is for private headers that are not installed, so that third-party code can not include them; the APIs in those headers are implicitly internal, even if not in an internal namespace
  3. we can still need to export internal APIs in our DLLs, because the unit tests may need them and are usually dynamically linked
  4. we can still need to put internal APIs in public headers, for example because they are used in public inline functions
  5. we unfortunately have APIs that are morally internal, but have not been put in a internal namespace or a _internal.h header; I've just opened [C++] Review headers for internal APIs #46211 for this

Here we have an internal API that is used by unit tests, so needs to be exported in a DLL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation!
The only thing I am not sure I understand is:

Here we have an internal API that is used by unit tests, so needs to be exported in a DLL.

What is the difference with for example:

Result<std::shared_ptr<Table>> MakeRandomTimeSeriesTable(
    const TableGenerationProperties& properties) 

We are not using any kind of export there?
And the RunEndEncodeTableColumns is used in a single test on hash_aggregate_test.cc

Copy link
Member

@pitrou pitrou Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it seems acero/test_util_internal.cc is compiled directly into every Acero unit test (using an object library), it does not go into a DLL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is the case for all the different test_util_internal.cc. I think we can keep RunEndEncodeTableColumns at acero/test_util_internal.cc as is on the last commit. The function is used on a single test on acero (nowhere else). Doesn't require to be exported as it does not go into a DLL.
We can discuss if we want to refactor how we export the different internal APIs for test_util_internal and whether we should have a public utilities testing DLL on a separate issue.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Apr 22, 2025
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@github-actions github-actions bot added awaiting merge Awaiting merge awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting merge Awaiting merge labels Apr 23, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 23, 2025
@raulcd
Copy link
Member Author

raulcd commented Apr 23, 2025

@github-actions crossbow submit test-ubuntu-24.04-cpp-minimal-with-formats

@raulcd raulcd changed the title GH-46157: [C++] Move test utility RunEndEncodeTableColumns that uses REE to test_util on compute instead of common gtest_util GH-46157: [C++] Move test utility RunEndEncodeTableColumns that uses REE to test_util_internal on acero instead of common gtest_util Apr 23, 2025
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Apr 23, 2025
@github-actions
Copy link

Revision: 72fc9cf

Submitted crossbow builds: ursacomputing/crossbow @ actions-ebe5b9b780

Task Status
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 minor two small wishes

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 29, 2025
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@raulcd
Copy link
Member Author

raulcd commented Apr 29, 2025

@github-actions crossbow submit test-ubuntu-24.04-cpp-minimal-with-formats

@github-actions
Copy link

Revision: e055048

Submitted crossbow builds: ursacomputing/crossbow @ actions-fc1e7eaafc

Task Status
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions

@raulcd
Copy link
Member Author

raulcd commented Apr 29, 2025

@raulcd
Copy link
Member Author

raulcd commented Apr 30, 2025

Merging this as the two small comments have been applied.

@raulcd raulcd merged commit baf97fd into apache:main Apr 30, 2025
35 of 36 checks passed
@raulcd raulcd removed the awaiting change review Awaiting change review label Apr 30, 2025
@raulcd raulcd deleted the GH-46157 branch April 30, 2025 07:39
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit baf97fd.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 65 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants