Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect_include_paths may use a different clang #2682

Open
workingjubilee opened this issue Nov 11, 2023 · 17 comments
Open

detect_include_paths may use a different clang #2682

workingjubilee opened this issue Nov 11, 2023 · 17 comments

Comments

@workingjubilee
Copy link
Member

Bindgen often is deployed to build code using libclang's dynamic library, set by LIBCLANG_PATH. However, in bindgen, there is this code:

rust-bindgen/bindgen/lib.rs

Lines 824 to 830 in 4f9fa49

let clang = match clang_sys::support::Clang::find(
None,
&clang_args_for_clang_sys,
) {
None => return,
Some(clang) => clang,
};

This starts with the following check:

https://github.com/KyleMayes/clang-sys/blob/b7992e864eaa2c8cb5c014a52074b962f257e29b/src/support.rs#L59-L65

Unfortunately, this means rust-bindgen can easily wind up pulling a different clang than libclang, and it never consults LIBCLANG_PATH. If this is an intended hazard of the detect_include_paths function, then it should be documented appropriately. However, as it can easily cause subtly but hazardously incorrect bindings to be generated, it is likely not intended that this is the default.

@workingjubilee workingjubilee changed the title detect_include_paths introduces incorrect bindings detect_include_paths may use a different clang Nov 11, 2023
@workingjubilee
Copy link
Member Author

workingjubilee commented Nov 11, 2023

The big hazard is that this can override the search paths for builtins, which must be exactly matched between the clang compiler and library for very similar reasons as to why Rust stdlibs must be matched with the compiler. At minimum, they can cause spurious build failures that are hard to trace and understand.

@jschwe
Copy link
Contributor

jschwe commented Mar 29, 2024

I ran into this when building servo / mozangle on ubuntu with both clang (clang-14) and clang-15 installed.
libclang-sys will take the most recent libclang version, while bindgen will take the default clang, which causes compile errors on my system. Additionally, even some GCC include paths are added to the search paths, which makes a very wierd mix.

[2024-03-29T15:08:06Z DEBUG bindgen] Generating bindings, libclang at /usr/lib/x86_64-linux-gnu/libclang-15.so.15.0.7 
[2024-03-29T15:08:06Z DEBUG bindgen] Trying to find clang with flags: ["-DANGLE_DISABLE_POOL_ALLOC", "-DANGLE_ENABLE_APPLE_WORKAROUNDS", "-DANGLE_ENABLE_ESSL", "-DANGLE_ENABLE_GLSL", "-DANGLE_ENABLE_HLSL", "-DANGLE_ENABLE_KEYEDMUTEX", "-DANGLE_ENABLE_SHARE_CONTEXT_LOCK=1", "-DANGLE_PLATFORM_EXPORT=", "-DANGLE_SKIP_DXGI_1_2_CHECK", "-DANGLE_TRANSLATOR_ESSL_ONLY", "-DANGLE_VMA_VERSION=2003000", "-DCR_CLANG_REVISION=\"llvmorg-16-init-6578-g0d30e92f-2\"", "-DDYNAMIC_ANNOTATIONS_ENABLED=0", "-DNOMINMAX", "-DUNICODE", "-DWINVER=0x0A00", "-D_ATL_NO_OPENGL", "-D_CRT_NONSTDC_NO_WARNINGS", "-D_CRT_RAND_S", "-D_CRT_SECURE_NO_DEPRECATE", "-D_HAS_EXCEPTIONS=0", "-D_SCL_SECURE_NO_DEPRECATE", "-D_SECURE_ATL", "-D_UNICODE", "-D_WINSOCK_DEPRECATED_NO_WARNINGS", "-D__NDK_FPABI__=", "-x", "c++", "-std=c++17"]
[2024-03-29T15:08:06Z DEBUG bindgen] Found clang: Clang { path: "/usr/lib/llvm-14/bin/clang", version: Some(CXVersion { Major: 14, Minor: 0, Subminor: 0 }), c_search_paths: None, cpp_search_paths: Some(["/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11", "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11", "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward", "/usr/lib/llvm-14/lib/clang/14.0.0/include", "/usr/local/include", "/usr/include/x86_64-linux-gnu", "/usr/include"]) }

@KyleMayes
Copy link
Contributor

I've just now released clang-sys v1.8.2 which should help with this issue.

Now, if runtime-loading of libclang is in use (which is the default for rust-bindgen) and a libclang shared library is currently loaded, Clang::find will check likely directories related to the loaded libclang shared library:

https://github.com/KyleMayes/clang-sys/blob/e4c9b7efb2a8a1a8af470e9b7953a76a809da6f3/src/support.rs#L88-L96

For example, if /home/kyle/llvm-project/build/lib/libclang.so is currently loaded when Clang::find is called, /home/kyle/llvm-project/build/lib and /home/kyle/llvm-project/build/bin will be checked for clang executables.

I'm hoping that in most cases this will result in the clang executable being selected that matches the currently loaded libclang shared library (it worked in my testing using a custom build of LLVM + Clang similar to my example above).

@KyleMayes
Copy link
Contributor

Nevermind, I've yanked clang-sys v1.8.2 because of KyleMayes/clang-sys#181.

I don't think there's any winning here, any change to the status quo w.r.t. Clang executable selection will probably just break someone else's builds. I think folks'll just need to use CLANG_PATH and/or LLVM_CONFIG_PATH in cases like my example above.

@jschwe
Copy link
Contributor

jschwe commented Jun 4, 2024

However, as it can easily cause subtly but hazardously incorrect bindings to be generated, it is likely not intended that this is the default.

@KyleMayes Are you confident that a user forgetting to set CLANG_PATH could never cause any incorrect bindings, and at most a build failure? I don't know enough about bindgen / clang internals, but I would vastly prefer a build failure, or a breaking change in bindgen / clang-sys, over incorrect bindings. A user installing a newer version of clang, then the default version of their system is probably not too uncommon of a case, and it would quite a pity if this were a hidden footgun that could cause incorrect bindings.

jschwe added a commit to jschwe/servo that referenced this issue Jun 17, 2024
* Don't prefix PATH with android toolchain:
  This is problematic, since it can cause Rust to select
  the NDK toolchain as a linker when compiling for the
  HOST target (e.g. compiling build-scripts).
* Prefix target compiler and compiler flags variables
  with `TARGET_` so as not to influence compilation
  for HOST targets.
 * SET `CLANG_PATH` to avoid [bindgen servo#2682]

 [bindgen servo#2682]: rust-lang/rust-bindgen#2682

Signed-off-by: Jonathan Schwender <[email protected]>
jschwe added a commit to jschwe/servo that referenced this issue Jun 17, 2024
* Postfix PATH with android toolchain:
  We only need to edit path for the linker specified
  in the `.cargo/config.toml` to be found. Adding the
  NDK clang to the end of PATH is sufficient for that.
  Adding the NDK clang to the front can cause problems
  however, since it causes the NDK `clang` to be
  preferred over the system clang. This can cause
  problems on some systems, where compiling
  e.g. buildscripts for HOST subsequently fails.
* Prefix target compiler and compiler flags variables
  with `TARGET_` so as not to influence compilation
  for HOST targets.
 * SET `CLANG_PATH` to avoid [bindgen servo#2682]

 [bindgen servo#2682]: rust-lang/rust-bindgen#2682

Signed-off-by: Jonathan Schwender <[email protected]>
jschwe added a commit to jschwe/servo that referenced this issue Jun 17, 2024
* Postfix PATH with android toolchain:
  We only need to edit path for the linker specified
  in the `.cargo/config.toml` to be found. Adding the
  NDK clang to the end of PATH is sufficient for that.
  Adding the NDK clang to the front can cause problems
  however, since it causes the NDK `clang` to be
  preferred over the system clang. This can cause
  problems on some systems, where compiling
  e.g. buildscripts for HOST subsequently fails.
* Prefix target compiler and compiler flags variables
  with `TARGET_` so as not to influence compilation
  for HOST targets.
 * SET `CLANG_PATH` to avoid [bindgen servo#2682]

 [bindgen servo#2682]: rust-lang/rust-bindgen#2682

Signed-off-by: Jonathan Schwender <[email protected]>
github-merge-queue bot pushed a commit to servo/servo that referenced this issue Jun 18, 2024
* Postfix PATH with android toolchain:
  We only need to edit path for the linker specified
  in the `.cargo/config.toml` to be found. Adding the
  NDK clang to the end of PATH is sufficient for that.
  Adding the NDK clang to the front can cause problems
  however, since it causes the NDK `clang` to be
  preferred over the system clang. This can cause
  problems on some systems, where compiling
  e.g. buildscripts for HOST subsequently fails.
* Prefix target compiler and compiler flags variables
  with `TARGET_` so as not to influence compilation
  for HOST targets.
 * SET `CLANG_PATH` to avoid [bindgen #2682]

 [bindgen #2682]: rust-lang/rust-bindgen#2682

Signed-off-by: Jonathan Schwender <[email protected]>
@workingjubilee
Copy link
Member Author

@KyleMayes Hmm. I appreciate the patch but I had to ship a lot more code than that to fix this problem: https://github.com/pgcentralfoundation/pgrx/pull/1325/files

As clang-sys you may have an easier time making this correct than I did, or you may want to demand more stringent usage, either way.

@eternalNight
Copy link

I have met the same issue when processing a header that uses stdatomic.h. With the header search paths of multiple clang versions, stdatomic.h no longer provides any declaration and that causes a build failure.

I think it is reasonable to make clang-sys finding clang and libclang of the same version. E.g., with dynamic linking:

  1. If neither CLANG_PATH nor LIBCLANG_PATH is defined, use the latest libclang.so (which is the current logic) and search for clang of the same version.
  2. If either env vars is defined, use that clang (or libclang.so) and search for libclang.so (or clang) of the same version.
  3. If both are defined, use them but print a warning if they have different versions (or shall we make it a fatal error?).

Version matching can be done by comparing the first line of clang --version and the returned string of clang_getClangVersion() API. Inside clang both uses the version string from getClangFullVersion() since 3.0.0 (see https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/Driver.cpp#L2010 and https://github.com/llvm/llvm-project/blob/main/clang/tools/libclang/CIndex.cpp#L9802).

This approach does not introduce additional search paths but only tries harder to use matched versions of clang and libclang.so. I hope that can minimize the backward compatibility issues that can arise.

One drawback of this approach is the performance overhead from multiple invocation to clang --version and load()/unload() of libclang.so. That may not be significant as this is only done once at initialization time for each bindgen invocation, but feel free to point out that I'm wrong.

For the static linking case, we can directly call clang_getClangVersion() and just search for a matching clang at runtime. I have not thought about that thoroughly, though.

@KyleMayes How do you think of this approach? If that looks promising, I can look into the details and submit a PR to clang-sys for your review. Thanks!

@KyleMayes
Copy link
Contributor

Any change to the existing logic for picking libclang / clang instances (that isn't just adding additional places to search after the current places to search) will break a bunch of people's builds.

I already published a version of clang-sys(v1.8.2) that did a good job of selecting the same version of libclang / clang instances when using runtime linking but I ended up yanking it the next day as I mentioned earlier in this thread: KyleMayes/clang-sys#181

So while I understand the problem and want to fix it, I'd be making some people's lives a bit easier at the cost of making other people's a little more difficult. Maybe it's worth it in this case, but I'm lazy and suffer from status quo bias.

@jschwe
Copy link
Contributor

jschwe commented Sep 6, 2024

Any change to the existing logic for picking libclang / clang instances [...] will break a bunch of people's builds.

But this only applies to the default logic. Adding a new API which e.g. is "give me a matching clang and libclang" or "give me the clang for this specific libclang" should not break any users, since it is a new opt-in API.
bindgen could then opt-in to use this new API, which would allow fixing the issue discussed here.

@KyleMayes
Copy link
Contributor

The Probably Unnecessary Prelude

My understanding is that the original concern raised in this issue was that the bindings produced by rust-bindgen could be subtly broken (or just broken in an unclear and difficult to fix way) to due a mismatch between the selected libclang and clang instances. Many users might not even be aware they are relying on rust-bindgen / clang-sys since they could be deeply buried in the transitive dependencies of their crate.

Fixing that problem would require introducing a "breaking" change to the logic for selecting libclang and clang instances in clang-sys (what I attempted to do in the now yanked v1.8.2) so that the default behavior would prevent the vast majority of people getting burned by libclang / clang mismatches.

A Possible Path Forward

I am open to the idea of an alternate behavior for selecting libclang / clang instances to maximize the chances of the selected instances matching (which might just take the form of my v1.8.2 changes but gated behind a Cargo feature and/or environment variable).

Then, if the maintainers of rust-bindgen are willing to introduce such a breaking change themselves (i.e., take responsibility for any broken builds due to the change in behavior) they could then enable such a clang-sys Cargo feature.

I'll try to look into this further this week, but no promises (I'm going on vacation starting Friday 🏝️).

@workingjubilee
Copy link
Member Author

@KyleMayes No, it actually would do more than that, it was just straight-up breaking builds.

@KyleMayes
Copy link
Contributor

KyleMayes commented Sep 11, 2024

I understand that builds will be broken either way, but KyleMayes/clang-sys#181 (someone complaining about v1.8.2 breaking their build) has 6 thumbs up (which it racked up very quickly) and this issue has zero.

Very small numbers and not very scientific, I know, but my point is that I don't want to introduce changes that negatively affect people without feeling very confident that it's worth it.

I'll try to think about this problem more, I'm coming around to the idea that libclang / clang synchronization might be an important enough feature to be worth annoying a couple dozen people.

@workingjubilee
Copy link
Member Author

workingjubilee commented Sep 11, 2024

@KyleMayes If you use incorrect headers and you don't break someone's build, the resulting compilation is incorrect, which means the bindings can be unsound. If you do break someone's build, then that's preferable to emitting unsound bindings.

I reported this issue, and added a horrible hack to work around clang-sys's problems on my own end, because I received numerous reports of this being a problem with a framework I supported the tooling for, when the C codebase we were binding against added a new release that used more of clang's headers. Probably more than just six issues, at this point. They aren't coming here to give a thumbs up because the issue was so convoluted and tangled that they don't even understand why it started breaking.

@KyleMayes
Copy link
Contributor

Sorry, I certainly didn't mean to dismiss your concerns!

I hope I haven't come off as flippant. I am interested in fixing the problem, I'm just very conflict averse...

This is very valuable feedback, and I think it provides a good motivation for fixing this issue even if it requires a bit of disruption to others.

@workingjubilee
Copy link
Member Author

@KyleMayes

And I got direct reports from multiple other people, too, over other venues that aren't public. There's other issues opened against the pgrx repo that probably are related but I have no good evidence to believe one way or another.

@workingjubilee
Copy link
Member Author

workingjubilee commented Sep 11, 2024

Now, consider this: Everyone that had an issue with the new version was on macOS.

But the majority of people running their own databases in production... relevant, here, because the pgrx framework has to do with database extensions... does it on Linux. And the previous version, the one you reverted to, was breaking Linux builds. But I assure you, it was also breaking macOS before!

So you seem to have implemented a fix that broke macOS but worked on Linux. If you actually in fact eliminated almost all problematic Linux builds, that was a strict improvement.

You probably just needed to make your fix OS-aware, not to make a wholesale revert.

@jschwe
Copy link
Contributor

jschwe commented Sep 11, 2024

I am open to the idea of an alternate behavior for selecting libclang / clang instances to maximize the chances of the selected instances matching (which might just take the form of my v1.8.2 changes but gated behind a Cargo feature and/or environment variable).

Thats great to hear!

Just to recap, bindgen uses clang_sys::support::Clang::find() only here to find the header include paths. Potential solutions to the original bindgen issue:

  • Could SharedLibrary have a function / members that return the same results as clang.cpp_search_paths and clang.c_search_paths? This would avoid needing to find a clang executable in the first place.
  • Could SharedLibrary return the location of the associated Clang executable?
  • If we call .version() on SharedLibrary, could we pass that to a potential clang_sys::support::Clang::find_specific_version(version)? That might not be sufficient though, since some places uses forks of LLVM, so e.g. clang 15.x.y installed on the system might not be equivalent in terms of headers files to a vendor specific clang 15.x.y, even though the version number matches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants