Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve tar packaging #394

Merged
merged 3 commits into from
Feb 13, 2024
Merged

Conversation

nolange
Copy link
Contributor

@nolange nolange commented Jan 17, 2024

This introduces the normal practise of not storing personal uid/gid in the archives. tar does a different thing with them whether you unpack as root or another user.

Further the timestamp variations are useless noise, the second commit will normalize timestamps used by compiler/toolchain as well as the tar archive.

Copy link
Owner

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is much appreciated! A couple minor detail points and questions.

For the general tar cleanup, it'd be nice to apply that on the older release build packaging as well, see release.sh and release-macos.sh. There's probably no need to go all the way with fetching the source dates and all that, but it would at least be nice to apply at least the --numeric-owner --owner=0 --group=0 parts.

Secondly, as a feature request, is it possible to do something similar for the zip files?

steps:
- uses: actions/checkout@v3
with:
sparse-checkout: .
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is sparse-checkout needed here? I guess it's nice for getting a smaller checkout if just wanting to inspect the git history, but the actual files within the llvm-mingw repo are quite minimal anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It explicitly states that the files arent needed

.github/workflows/build.yml Outdated Show resolved Hide resolved
.github/workflows/build.yml Show resolved Hide resolved
.github/workflows/build.yml Outdated Show resolved Hide resolved
The generated tar archives contain the uid/gid of the builder,
if unpacked as root on another system those will denote nothing
or an unexpected user/group.

The expectation would be uid/gid = 0.

Use a simple enough tar format as well.

The Mac version of tar (bsdtar?) does not support those flags,
for CI the separate gnu-tar is installed.
@nolange nolange force-pushed the improve_tar_packaging branch 2 times, most recently from 2047e8f to 3d6c817 Compare February 8, 2024 12:48
nolange and others added 2 commits February 13, 2024 00:16
This sets SOURCE_DATE_EPOCH to the commit timestamp, toolchains
should pick up timestamps from this variable.
The same sources ideally would produce the same binary, see
https://reproducible-builds.org.

Set the timestamps within the tar archives to the start of the
CI pipeline.

Sort the archive contents.
@mstorsjo mstorsjo force-pushed the improve_tar_packaging branch from 3d6c817 to f4aadbf Compare February 13, 2024 08:01
@mstorsjo
Copy link
Owner

Thanks; I think this looks good now. I've amended your branch (changing an instance of checkout@v3 into checkout@v4, and added a commit to release-macos.sh to check whether gtar is available), and I'll merge this.

FWIW, when linking a DLL/EXE, those files get the current time stored in a timestamp field as well. I've implemented using SOURCE_DATE_EPOCH in lld now as well, in llvm/llvm-project@0df8aed, which also is backported to the next 18.x release branch, so it will be included in the next (pre)release.

For those purposes, it would kinda be nice to have a more stable source of data for SOURCE_DATE_EPOCH than the commit date - e.g. if just tweaking the CI pipeline, by making a new commit, we'd now be affecting the contents of the output files. But anything more specific also becomes quite brittle and fiddly - e.g. if we'd have it be the latest commit date for any file touching build-llvm.sh or the relevant build-libcxx.sh or similar, plus all of wrappers, etc. (IIRC it can be important that the field actually does change, when the file contents change.)

LLD does have another flag, -Brepro on the lld-link level, which replaces the timestamp with a hash (and sets a flag indicating that the timestamp isn't a real timestamp), but that flag isn't exposed on the mingw linker level, and there's no env variable one can set to enable it.

@mstorsjo mstorsjo merged commit 3ed3b66 into mstorsjo:master Feb 13, 2024
7 checks passed
@nolange
Copy link
Contributor Author

nolange commented Feb 13, 2024

For those purposes, it would kinda be nice to have a more stable source of data for SOURCE_DATE_EPOCH than the commit date

I try to clean up these sources of random noise, but fully reproducible builds are still really hard in general (if you use debinfo you need to build in the same directories, even then it sometimes fails).
It's a decent start to take out the time the build job started.

For no "productive" change your best case is ending up with identical build-ids.

@mstorsjo
Copy link
Owner

For those purposes, it would kinda be nice to have a more stable source of data for SOURCE_DATE_EPOCH than the commit date

I try to clean up these sources of random noise, but fully reproducible builds are still really hard in general (if you use debinfo you need to build in the same directories, even then it sometimes fails). It's a decent start to take out the time the build job started.

Yeah - in this case I haven't tried how it behaves if built separately elsewhere (there's a million other things that affects the output there), but having two consecutive runs of the github actions pipeline produce identical output would at least be nice. Before these patches, there were differences only in the gendef executable (one case of __DATE__, which does get handled by SOURCE_DATE_EPOCH here, but I also took it out altogether upstream, mingw-w64/mingw-w64@202e375), and the <root>/<arch>-w64-mingw32/bin/*.dll due to timestamps. With the support for SOURCE_DATE_EPOCH in upstream lld, and these changes, that shouldn't vary any longer as long as rerunning the same pipeline from the same commit. But any commit will change COMMIT_DATE_UNIX and affect that.

@@ -36,6 +42,9 @@ jobs:
TAG=$(TZ=UTC date +%Y%m%d)
fi
echo TAG=$TAG >> $GITHUB_OUTPUT
echo COMMIT_DATE_UNIX=$(git log -1 --pretty=%ct $GITHUB_SHA) >> $GITHUB_OUTPUT
echo BUILD_DATE=$(date -u '+%FT%TZ') >> $GITHUB_OUTPUT
echo BUILD_DATE_UNIX=$(date -d "${BUILD_DATE}" +%s) >> $GITHUB_OUTPUT
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I just realized, this doesn't work as intended. At this point, ${BUILD_DATE} is still not set - we've only written it into $GITHUB_OUTPUT, which is a plain file that gets added to the environment starting with the next step in the job - but does not take effect immediately within the same script. So effectively, this executes date -d "" +%s, which produces a timestamp corresponding to the start of the current day, it seems.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See fix-build-date for a fix for this - if this seems reasonable to you, I'll push that to master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fix seems sensible

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll push that then.

The failure mode here, in producing a timestamp which is quite near to the intended one, but still not quite the same, was quite surprising too :-)

@nolange nolange deleted the improve_tar_packaging branch March 12, 2024 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants