Skip to content

Conversation

@wgtmac
Copy link
Member

@wgtmac wgtmac commented Oct 12, 2025

What changes were proposed in this pull request?

To make it easier for Apache Arrow to integrate with Apache ORC.

Why are the changes needed?

  • Detect existing targets of 3rd-party dependencies from Apache Arrow, including: Protobuf, ZLIB, Snappy, and zstd.
  • Note that lz4 does not need this change because Apache Arrow has already used FetchContent for it.

How was this patch tested?

Test it locally against Apache Arrow main branch.

-- Apache ORC: Using existing arrow::protobuf::libprotobuf
-- Apache ORC: Using existing Snappy::snappy-static
-- Apache ORC: Using existing ZLIB::ZLIB
-- Apache ORC: Using existing zstd::libzstd_static
-- Apache ORC: Using vendored lz4

Was this patch authored or co-authored using generative AI tooling?

No.

@wgtmac
Copy link
Member Author

wgtmac commented Oct 12, 2025

@kou What do you think of this change? I've tested Apache Arrow in apache/arrow#47792

@wgtmac wgtmac changed the title ORC-2013: [C++][FOLLOWUP] Better CMake integration with Apache Arrow ORC-2013: [C++] Better CMake integration with Apache Arrow Oct 12, 2025
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines +175 to +180
elseif (TARGET ${ARROW_PROTOBUF_LIBPROTOBUF})
# Used by Apache Arrow only, remove this once Arrow leverages FetchContent for Protobuf
add_library (orc_protobuf INTERFACE IMPORTED)
target_link_libraries(orc_protobuf INTERFACE ${ARROW_PROTOBUF_LIBPROTOBUF})
set (PROTOBUF_EXECUTABLE ${ARROW_PROTOBUF_PROTOC})
message(STATUS "Using existing ${ARROW_PROTOBUF_LIBPROTOBUF}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using the standard target names instead of ARROW_PROTOBUF_* so that other downstream projects can use Apache ORC by FetchContent?

Suggested change
elseif (TARGET ${ARROW_PROTOBUF_LIBPROTOBUF})
# Used by Apache Arrow only, remove this once Arrow leverages FetchContent for Protobuf
add_library (orc_protobuf INTERFACE IMPORTED)
target_link_libraries(orc_protobuf INTERFACE ${ARROW_PROTOBUF_LIBPROTOBUF})
set (PROTOBUF_EXECUTABLE ${ARROW_PROTOBUF_PROTOC})
message(STATUS "Using existing ${ARROW_PROTOBUF_LIBPROTOBUF}")
elseif (TARGET protobuf::libprotobuf AND TARGET protobuf::protoc)
# Used for FetchContent by downstream projects
add_library (orc_protobuf INTERFACE IMPORTED)
target_link_libraries(orc_protobuf INTERFACE protobuf::libprotobuf)
set (PROTOBUF_EXECUTABLE protobuf::protoc)
message(STATUS "Using existing protobuf::libprotobuf and protobuf::protoc")

Apache Arrow can provide protobuf::libprotobuf/protobuf::protoc targets by add_library(ALIAS).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was supposed to add this as a temporary solution for Apache Arrow. Apache Arrow has already migrated lz4 to use FetchContent so I don't need to do anything and FetchContent_Declare in Apache ORC can automatically find it. I assume Apache Arrow will eventually migrate all dependencies to use FetchContent, am I correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I haven't used FetchContent_Declare(FIND_PACKAGE_ARGS) yet. Sorry.

If Apache Arrow migrates Protobuf, Snappy, zlib and Zstandard to FetchContent, we don't need this in Apache ORC, right? If so, let's improve Apache Arrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I have successfully migrated dependencies in the Apache ORC to use FetchContent with FIND_PACKAGE_ARGS, I think Snappy and Zstandard should be pretty easy to migrate. It is a little bit tricky for Protobuf to find protoc. ZLIB is more messy since it does not have a release with good CMake support so I am currently using an unreleased git tag. I can work on this in the Apache Arrow one by one but may need your help :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Let's work on this together. I'll also work on this one by one in a few weeks. (I'll fix 22.0.0 release blockers before this.)

list (APPEND ORC_INSTALL_INTERFACE_TARGETS "$<INSTALL_INTERFACE:Snappy::snappy>")
orc_provide_find_module (SnappyAlt)
elseif (TARGET Snappy::snappy)
# Used by Apache Arrow only, remove this once Arrow leverages FetchContent for Snappy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snappy::snappy is the standard target name provided by Snappy. So all downstream projects can use this by FetchContent:

Suggested change
# Used by Apache Arrow only, remove this once Arrow leverages FetchContent for Snappy
# Used for FetchContent by downstream projects

target_link_libraries(orc_snappy INTERFACE Snappy::snappy)
message(STATUS "Using existing Snappy::snappy")
elseif (TARGET Snappy::snappy-static)
# Used by Apache Arrow only, remove this once Arrow leverages FetchContent for Snappy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Used by Apache Arrow only, remove this once Arrow leverages FetchContent for Snappy
# Used for FetchContent by downstream projects

Comment on lines +281 to +286
elseif (TARGET Snappy::snappy)
# Used by Apache Arrow only, remove this once Arrow leverages FetchContent for Snappy
add_library (orc_snappy INTERFACE IMPORTED)
target_link_libraries(orc_snappy INTERFACE Snappy::snappy)
message(STATUS "Using existing Snappy::snappy")
elseif (TARGET Snappy::snappy-static)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_package(Snappy) may provide both of Snappy::snappy and Snappy::snappy-static. Apache Arrow wants to control which is used. (Apache Arrow uses ``ARROW_SNAPPY_USE_SHARED` for it.)

Can Apache ORC provide a CMake variable to control it? For example:

elseif (ORC_SNAPPY_USE_SHARED AND TARGET Snappy::snappy)
  ...
elseif (NOT ORC_SNAPPY_USE_SHARED AND TARGET Snappy::snappy-static)
  ...
else ()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants