Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve symlink contents in sdists #713

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

QuLogic
Copy link
Member

@QuLogic QuLogic commented Dec 13, 2024

Wheels do not support symlinks (as they are ZIP files), but tarballs do. For consistent results however, expand those symlinks to the files they reference.

This is consistent with 0.16.0, and fixes this regression in 0.17.0.

This broke part of our latest release because we make it from an sdist: matplotlib/matplotlib#29229

In 0.16.0, the test was whether the source directory path was a file:
https://github.com/mesonbuild/meson-python/blob/0.16.0/mesonpy/__init__.py#L893
Then this file would be opened and inserted into the tarball:
https://github.com/mesonbuild/meson-python/blob/0.16.0/mesonpy/__init__.py#L909-L910
This path check returns True for symlinks, and the file opening would insert the target contents into the tarball.

As of 0.17.0 (specifically #587), the check is whether the TarInfo is a file:
https://github.com/mesonbuild/meson-python/blob/0.17.0/mesonpy/__init__.py#L882
This does not return True for a symlink, and then they get skipped.

If we just check member.issym(), then what ends up in the tarball is a symlink, so we need to copy information from the original in order to produce the contents as used to occur in 0.16.0.

Wheels do not support symlinks (as they are ZIP files), but tarballs do.
For consistent results however, expand those symlinks to the files they
reference.

This is consistent with 0.16.0, and fixes this regression in 0.17.0.
Copy link
Member

@dnicolodi dnicolodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that having the result of meson dist contain symbolic links and expect meson-python to turn them into regular files is a legitimate expectation. Why do you have symbolic links in your repository? I also have a few comments regarding the implementation.

if member.islnk() or member.issym():
# Symlinks are relative to member directory, but hard links are relative to tarball root.
path = member.name.rsplit('/', 1)[0] + '/' if member.issym() else ''
orig = meson_dist.getmember(path + member.linkname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are meson or git enforcing that symlinks do not point to location outside the tarball or outside the repository?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git does not appear to enforce any rules about committed symlinks.

# Symlinks are relative to member directory, but hard links are relative to tarball root.
path = member.name.rsplit('/', 1)[0] + '/' if member.issym() else ''
orig = meson_dist.getmember(path + member.linkname)
member = copy.copy(member)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the copy here necessary?

member.mode = orig.mode
member.mtime = orig.mtime
member.size = orig.size
member.type = tarfile.REGTYPE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't the link point to a directory, or to a special file?

# Meson 1.1.0, see https://github.com/mesonbuild/meson/pull/11432.
# Run the test anyway to ensure that meson-python can produce a
# wheel also for older versions of Meson.
if MESON_VERSION >= (1, 1, 99):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't use exlude_files or exclude_directories in your test, why this version check?

@dnicolodi
Copy link
Member

This is consistent with 0.16.0, and fixes this regression in 0.17.0.

This is not a regression. We decided to do not support symbolic links and other special files. Supporting them is not exactly trivial and I didn't see an use case for them. Why does matplotlib need symbolic links in its source repository?

@tacaswell
Copy link
Contributor

This is not a regression. We decided to do not support symbolic links and other special files.

Is this documented someplace? We looked at the release notes for 0.17 and did not see any mention of this change. As this appeared to be an undocumented behavior change that broke us, from our point of view it looked like a regression.

Why does matplotlib need symbolic links in its source repository?

We have files that we want to keep in-sync that we want to have different names (the second name is imposed on us via GTK) and while there are a bunch of ways we could solve this (just have two copies and be careful to keep them the same or add a build step that makes the second copy) using a symlink is the simplest way (trust the file system to do its job and git sorts it out on windows). I suspect reasonable people will disagree about if this is a "good enough" reason or not.

However, I am not sure that it is worth debating the merits of our use of symlinks. It is a fact that we do and I can not believe mpl is the only project that will trip on this so what can we do to save the next project? My preference would be to restore the behavior from 0.16, but if that is not on the table I think the second best option is to hard-fail the build when symlinks are encountered. The behavior in 0.17 of just dropping the files is the worst of all options and I would still argue is a bug even if not supporting symlinks is the intended behavior.

@rgommers
Copy link
Contributor

I was just typing a reply that kinda overlaps with @tacaswell's. Whether this is a regression or not is imho a semantic difference that isn't super important. This looks like a case of Hyrum's Law: we didn't explicitly promise or test that symlinks work for an sdist, but Matplotlib did rely on it and now it broke. Since matplotlib/matplotlib#29229 looks like it has a pretty high impact, let's see what we can do here to mitigate it - either temporarily or permanently.

Re symlinks in general: I still have hope for support for symlinks in wheels materializing (it's been giving a thumbs up in principle, however wheel 2.0 is a heavy lift). So if the change in this PR isn't too onerous, supporting symlinks in sdist creation doesn't seem unreasonable to me. If that sdist creation turns symlinks into copies, that means wheel builds work too, so there is a use case (and IIRC we've had questions about this before, like with nanoarrow). In most cases symlinks aren't the most healthy of the available options, but in some corner cases they may be.

@dnicolodi
Copy link
Member

I was not trying to argue that we should not support symlinks. I was just curious to see what is the motivation to have them in the repository of a cross-platform project, given that AFAIK git turns them into regular files on Windows.

The main problem with supporting symlinks is with sanitizing their destination. Including a file in the sdist outside the repository is obviously bad for reproducibility, but it is also a data exfiltration vector that I don't think we should support. Other than that, supporting symlinks to regular files within the repository by turning them into regular files is most likely a desirable feature.

I'm much less convinced that hard links should be supported. I don't think git has support for hard links anyway. Does it? Someone could be creating hard links in a dist-script, but I cannot think about any reason why someone would do that.

IIRC we've had questions about this before, like with nanoarrow

IIRC nanoarrow wanted to symlink directories. I don't think we should support that.

I think the second best option is to hard-fail the build when symlinks are encountered

Unfortunately that would be a compatibility break as bad as silently dropping support for symlinks: someone may be relying on the fact that special files are silently dropped and their build would be broken by this change. However, it is a good idea to add a warning for special files that do not make into the sdist.

@eli-schwartz
Copy link
Member

Anecdotally, meson itself has a bunch of symlinks in the testsuite, some of which are because we are in fact testing the functionality of meson functions that are documented to interact with symlinks in specific ways.

We had to delete them from git and start creating them in run_project_tests.py, because setuptools silently drops them from sdists. It is both awkwardly annoying to manage this as part of the test runner, and a deeply frustrating debug experience because they were silently dropped -- everything worked in CI, but tests failed for people building from sdist. On systems without symlink support, we skip those tests anyway so we don't care whether git/tar unpack them by extracting full file copies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants