-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-46157: [C++] Move test utility RunEndEncodeTableColumns that uses REE to test_util_internal on acero instead of common gtest_util #46161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… uses REE to test_util on compute instead of common gtest_util
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I'm not so familiar with this ( though I merged the test_util test pr but I didn't know it will causing this, sorry for that)
|
@github-actions crossbow submit test-ubuntu-24.04-cpp-minimal-with-formats |
|
I was also bitten by this on my PR to move the compute kernels to its own shared library in the past. I think the merged PR surfaced the problem but the problem was still there, we should not require REE if we are not using |
|
Revision: e97c952 Submitted crossbow builds: ursacomputing/crossbow @ actions-f9ba3f450a
|
WillAyd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - nice work!
|
@kou do you want to take a look at this one? I'll merge tomorrow otherwise |
|
Ah, sorry. I'll take a look at this tomorrow. |
| @@ -0,0 +1,30 @@ | |||
| // Licensed to the Apache Software Foundation (ASF) under one | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a test_util_internal.h in this directory, why this new file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was on gtest_util.h and is exposed with ARROW_TESTING_EXPORT. We don't seem to export any of these utilities from our various test_util_internal.h. This is not only used on arrow compute but on an arrow acero test so I thought we might want this to be exported. I can move to test_util_internal.h if we think is better but that was the reasoning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If test_util_internal.h works then we shouldn't create something else. Let's try it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we need to export it for Acero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need it for Acero tests but it can still be in a _internal header, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've initially moved it to arrow/compute/kernels/test_util_internal.h, in that case we require to also add to libarrow_testing.so -> arrow/compute/test_util_internal.cc otherwise we get some missing symbols (undefined reference to arrow::compute::ValidateOutput(arrow::Datum const&)):
list(APPEND ARROW_TESTING_SRCS compute/kernels/test_util_internal.cc
compute/test_util_internal.cc)
I don't think it's great that we are exporting symbols from test_util_internal.cc and that we are adding some of the internals to libarrow_testing.so. See the initial commit: 6ee47f2
The above fails on Windows with 'arrow::compute::RunEndEncodeTableColumns': inconsistent dll linkage. I could continue with that route but I would probably have to remove our current arrow_compute_kernels_testing target to just put the internal utilities on libarrow_testing.so.
As I think this goes against a better hygiene of libarrow_testing.so, at this point I have decided to move the utility function to arrow/acero/test_util_internal.h and not require ARROW_TESTING_EXPORT at all. I think the utility function is quite niche and can live there and acero will always require compute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's great that we are exporting symbols from
test_util_internal.ccand that we are adding some of the internals tolibarrow_testing.so
It's a bit confusing but these all have slightly different purposes:
- the
internalnamespace tells that an API is not for public use - the
_internal.hsuffix is for private headers that are not installed, so that third-party code can not include them; the APIs in those headers are implicitly internal, even if not in aninternalnamespace - we can still need to export internal APIs in our DLLs, because the unit tests may need them and are usually dynamically linked
- we can still need to put internal APIs in public headers, for example because they are used in public inline functions
- we unfortunately have APIs that are morally internal, but have not been put in a
internalnamespace or a_internal.hheader; I've just opened [C++] Review headers for internal APIs #46211 for this
Here we have an internal API that is used by unit tests, so needs to be exported in a DLL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation!
The only thing I am not sure I understand is:
Here we have an internal API that is used by unit tests, so needs to be exported in a DLL.
What is the difference with for example:
Result<std::shared_ptr<Table>> MakeRandomTimeSeriesTable(
const TableGenerationProperties& properties)
We are not using any kind of export there?
And the RunEndEncodeTableColumns is used in a single test on hash_aggregate_test.cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it seems acero/test_util_internal.cc is compiled directly into every Acero unit test (using an object library), it does not go into a DLL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this is the case for all the different test_util_internal.cc. I think we can keep RunEndEncodeTableColumns at acero/test_util_internal.cc as is on the last commit. The function is used on a single test on acero (nowhere else). Doesn't require to be exported as it does not go into a DLL.
We can discuss if we want to refactor how we export the different internal APIs for test_util_internal and whether we should have a public utilities testing DLL on a separate issue.
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
@github-actions crossbow submit test-ubuntu-24.04-cpp-minimal-with-formats |
|
Revision: 72fc9cf Submitted crossbow builds: ursacomputing/crossbow @ actions-ebe5b9b780
|
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 minor two small wishes
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
|
@github-actions crossbow submit test-ubuntu-24.04-cpp-minimal-with-formats |
|
Revision: e055048 Submitted crossbow builds: ursacomputing/crossbow @ actions-fc1e7eaafc
|
|
The CI failure is unrelated, I've opened: |
|
Merging this as the two small comments have been applied. |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit baf97fd. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 65 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
When compiling without
ARROW_COMPUTEwe should not require the run end encode kernel to be available. The current test utilityRunEndEncodeTableColumnsshould not be on gtest_util as this is going to be compiled whetherARROW_COMPUTEflag is enabled or not.What changes are included in this PR?
Move the
RunEndEncodeTableColumnsutility that requires REE to be available toarrow/acero/test_util_internalas is only used there, move corresponding test to its own test module and remove the requirement for it to be exported withARROW_TESTING_EXPORTAre these changes tested?
Yes, on CI and tested locally that
ARROW_COMPUTE=ONandARROW_COMPUTE=OFFbehave as expected.Are there any user-facing changes?
If someone was using the
RunEndEncodeTableColumnsfromlibarrow_testingthis is not being exported anymore.