GH-47798: [CI][Packaging] Enable reproducible builds for Linux packages #47864
…porarily and skip rat check on patch
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also:
…or reproducible builds
…ile even though it is wrong. Just testing purposes
…when HOME is modified
This currently builds the artifacts three times: twice via reprotest to validate reproducibility and once for the final artifact. It is similar to how we run reprotest for our source code, but for the Linux packages it takes a really long time: 2h 30m on the debian-trixie-amd64 job that runs reprotest:
https://github.com/apache/arrow/actions/runs/18685243685/job/53276081878
Should the approach be to always run reprotest? It could be interesting to disable REPROTEST on PR checks and enable it manually when necessary.
There is still a lot of work to be done here, apart from fixing reproducible builds when the build_path variation is used and adding the requirements to the other Debian packages and to RPM.
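Conceptually, the check described above comes down to building more than once and comparing the results byte for byte. A minimal sketch of that comparison, assuming two directories that each hold the output of one build; the helper name `same_artifacts` and the directory layout are hypothetical, and this is not how reprotest itself is implemented:

```shell
# Hypothetical helper: succeed only if both directories contain the same
# set of .deb files with identical contents.
same_artifacts() {
  first_dir=$1
  second_dir=$2
  # Hash every .deb relative to its own directory, so only the file names
  # and contents matter, not where each build happened to run.
  first_sums=$(cd "$first_dir" && find . -name '*.deb' -type f | sort | xargs -r sha256sum)
  second_sums=$(cd "$second_dir" && find . -name '*.deb' -type f | sort | xargs -r sha256sum)
  [ "$first_sums" = "$second_sums" ]
}
```

Something like `same_artifacts build-1 build-2 && echo reproducible` would then report the result. Note that two empty directories also compare equal in this sketch, and `xargs -r` assumes GNU findutils.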
export DEB_BUILD_OPTIONS
df -h
if [ "${REPROTEST:-no}" = "yes" ]; then
run reprotest --verbosity 2 --vary=-kernel,-fileordering,-domain_host,-build_path -s .. ./reprotest.sh **.deb
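The idea of leaving reprotest off by default and switching it on only when needed can be sketched as follows. This assumes the same `REPROTEST` environment variable as the snippet above; the helper name `should_run_reprotest` is hypothetical, and the `echo` lines stand in for the real build commands:

```shell
# Hypothetical helper: the reproducibility check is opt-in and only runs
# when REPROTEST is set to exactly "yes".
should_run_reprotest() {
  [ "${REPROTEST:-no}" = "yes" ]
}

if should_run_reprotest; then
  # Placeholder for the reprotest invocation, which builds the package twice.
  echo "reprotest enabled: running the extra reproducibility builds"
else
  # Placeholder for the plain single build.
  echo "reprotest disabled: building the package once"
fi
```

With this shape, a PR job could leave `REPROTEST` unset, and a manually triggered job could export `REPROTEST=yes`.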
I am currently investigating why the generated binaries (**.deb) are not reproducible when build_path is exercised, that is, when applying the following diff:
-run reprotest --verbosity 2 --vary=-kernel,-fileordering,-domain_host,-build_path -s .. ./reprotest.sh **.deb
+run reprotest --verbosity 2 --vary=-kernel,-fileordering,-domain_host -s .. ./reprotest.sh **.deb
I've tried different approaches, such as:
diff --git a/dev/tasks/linux-packages/apache-arrow/debian/rules b/dev/tasks/linux-packages/apache-arrow/debian/rules
index 19dba393b1..17ef34fc4b 100755
--- a/dev/tasks/linux-packages/apache-arrow/debian/rules
+++ b/dev/tasks/linux-packages/apache-arrow/debian/rules
@@ -6,7 +6,7 @@
# This has to be exported to make some magic below work.
export DH_OPTIONS
-export DEB_BUILD_MAINT_OPTIONS=reproducible=-timeless
+export DEB_BUILD_MAINT_OPTIONS= hardening=+all reproducible=-timeless,+fixfilepath
BUILD_TYPE=relwithdebinfo
@@ -31,6 +31,7 @@ override_dh_auto_configure:
--builddirectory=cpp_build \
--buildsystem=cmake+ninja \
-- \
+ $(shell dpkg-buildflags --export=configure) \
-DARROW_AZURE=$${ARROW_AZURE} \
-DARROW_BUILD_UTILITIES=ON \
-DARROW_COMPUTE=ON \
but no luck so far. More info about build_path in reproducible builds:
https://reproducible-builds.org/docs/build-path/
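For context, my understanding is that the `fixfilepath` feature tried above works by adding `-ffile-prefix-map=BUILDPATH=.` to the compiler flags, so that absolute build paths are rewritten to a relative one in the output. A small sketch of how such a flag is formed, assuming a GCC or Clang version that supports `-ffile-prefix-map`; the helper name `prefix_map_flag` and the example directories are hypothetical:

```shell
# Hypothetical helper: print the compiler flag that rewrites the given
# absolute build directory to "." in paths recorded by the compiler.
prefix_map_flag() {
  printf -- '-ffile-prefix-map=%s=.\n' "$1"
}

# Two different build locations get different flags, but each one maps its
# own build directory to the same relative path.
prefix_map_flag /tmp/build-a
prefix_map_flag /tmp/build-b/arrow
```

The flag only helps if it actually reaches every compiler invocation, which is one thing worth checking when the build goes through CMake rather than plain `dpkg-buildflags` environment variables.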
How about running reprotest around the rake build in the workflow, as in this diff?
diff --git a/.github/workflows/package_linux.yml b/.github/workflows/package_linux.yml
index ba86389428..ead6faf525 100644
--- a/.github/workflows/package_linux.yml
+++ b/.github/workflows/package_linux.yml
@@ -229,10 +229,11 @@ jobs:
release_candidate.yml
- name: Build
run: |
- pushd dev/tasks/linux-packages
- rake docker:pull || :
- rake --trace ${TASK_NAMESPACE}:build BUILD_DIR=build
- popd
+ rake -C dev/tasks/linux-packages docker:pull || :
+ BUILD_DIR=dev/tasks/linux-packages/build \
+ reprotest \
+ -c "rake -C dev/tasks/linux-packages --trace ${TASK_NAMESPACE}:build " \
+ . \ "dev/tasks/linux-packages/*/${TASK_NAMESPACE}/repositories/**/*.*"
- name: Docker Push
continue-on-error: true
if: >-
diff --git a/dev/tasks/linux-packages/apache-arrow-apt-source/debian/rules b/dev/tasks/linux-packages/apache-arrow-apt-source/debian/rules
index 1e3be48c31..2a3c14c558 100755
--- a/dev/tasks/linux-packages/apache-arrow-apt-source/debian/rules
+++ b/dev/tasks/linux-packages/apache-arrow-apt-source/debian/rules
@@ -12,10 +12,12 @@ export DH_OPTIONS
override_dh_auto_build:
gpg \
--no-default-keyring \
+ --homedir /tmp \
--keyring ./apache-arrow-apt-source.kbx \
--import KEYS
gpg \
--no-default-keyring \
+ --homedir /tmp \
--keyring ./apache-arrow-apt-source.kbx \
--armor \
--export > apache-arrow-apt-source.asc
diff --git a/dev/tasks/linux-packages/apache-arrow/apt/debian-trixie/Dockerfile b/dev/tasks/linux-packages/apache-arrow/apt/debian-trixie/Dockerfile
index 257d005656..3c3c3a3ad9 100644
--- a/dev/tasks/linux-packages/apache-arrow/apt/debian-trixie/Dockerfile
+++ b/dev/tasks/linux-packages/apache-arrow/apt/debian-trixie/Dockerfile
@@ -39,6 +39,7 @@ RUN \
apt install -y -V ${quiet} \
base-files \
build-essential \
+ ccache \
clang \
cmake \
debhelper \
diff --git a/dev/tasks/linux-packages/apt/build.sh b/dev/tasks/linux-packages/apt/build.sh
index bc4c61e622..aa7ed976aa 100755
--- a/dev/tasks/linux-packages/apt/build.sh
+++ b/dev/tasks/linux-packages/apt/build.sh
@@ -48,8 +48,9 @@ architecture=$(dpkg-architecture -q DEB_BUILD_ARCH)
debuild_options=()
dpkg_buildpackage_options=(-us -uc)
-run mkdir -p /build
-run cd /build
+build_root_dir="/build"
+run mkdir -p "${build_root_dir}"
+run pushd "${build_root_dir}"
find . -not -path ./ccache -a -not -path "./ccache/*" -delete
if which ccache > /dev/null 2>&1; then
export CCACHE_COMPILERCHECK=content
@@ -67,6 +68,8 @@ if which ccache > /dev/null 2>&1; then
debuild_options+=(--prepend-path=/usr/lib/ccache)
fi
fi
+build_dir=$(mktemp --directory --tmpdir="${build_root_dir}" package.XXXXX)
+run pushd "${build_dir}"
run cp /host/tmp/${PACKAGE}-${VERSION}.tar.gz \
${PACKAGE}_${VERSION}.orig.tar.gz
run tar xfz ${PACKAGE}_${VERSION}.orig.tar.gz
@@ -80,7 +83,7 @@ case "${VERSION}" in
${PACKAGE}-${VERSION}
;;
esac
-run cd ${PACKAGE}-${VERSION}/
+run pushd ${PACKAGE}-${VERSION}/
platform="${distribution}-${code_name}"
if [ -d "/host/tmp/debian.${platform}-${architecture}" ]; then
run cp -rp "/host/tmp/debian.${platform}-${architecture}" debian
@@ -102,7 +105,7 @@ df -h
if which ccache > /dev/null 2>&1; then
ccache --show-stats --verbose || :
fi
-run cd -
+run popd
repositories="/host/repositories"
package_initial=$(echo "${PACKAGE}" | sed -e 's/\(.\).*/\1/')
@@ -116,3 +119,7 @@ run \
-exec cp '{}' "${pool_dir}/" ';'
run chown -R "$(stat --format "%u:%g" "${repositories}")" "${repositories}"
+run find "${repositories}"
+
+run popd
+rm -rf "${build_dir}"
diff --git a/dev/tasks/linux-packages/package-task.rb b/dev/tasks/linux-packages/package-task.rb
index 4096c89463..d964a52dd3 100644
--- a/dev/tasks/linux-packages/package-task.rb
+++ b/dev/tasks/linux-packages/package-task.rb
@@ -150,7 +150,9 @@ class PackageTask
end
pass_through_env_names = [
"DEB_BUILD_OPTIONS",
+ "HOME",
"RPM_BUILD_NCPUS",
+ "TZ",
]
pass_through_env_names.each do |name|
value = ENV[name]
@@ -188,7 +190,7 @@ class PackageTask
run_command_line << image
run_command_line << "/host/build.sh" unless console
- sh(*build_command_line)
+ sh(*build_command_line) if Dir.exist?(ENV["HOME"])
sh(*run_command_line)
end
(I'm trying this locally, but it doesn't work yet. Sorry...)
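The build.sh part of the diff above builds inside a fresh directory created under a fixed root and removes it at the end. A minimal sketch of that pattern, assuming a POSIX shell with `mktemp`; the helper name `run_in_fresh_build_dir` is hypothetical, and it uses a subshell with `cd` where the diff uses `pushd`/`popd`:

```shell
# Hypothetical helper: run a command inside a fresh directory under a fixed
# root and remove that directory afterwards, whatever the command returns.
run_in_fresh_build_dir() {
  build_root_dir=$1
  shift
  mkdir -p "$build_root_dir"
  build_dir=$(mktemp -d "$build_root_dir/package.XXXXXX") || return 1
  # The subshell keeps the caller's working directory untouched.
  ( cd "$build_dir" && "$@" )
  status=$?
  rm -rf "$build_dir"
  return "$status"
}
```

For example, `run_in_fresh_build_dir /build pwd` prints the temporary directory the command ran in, and that directory no longer exists once the call returns.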
This is fair: it would remove the need for us to patch reprotest. I think we might also want to fix (maybe separately) a couple of issues around the build_path detected when running in the Docker container. I might open a PR for those separately. Thanks for your help taking a look at this!
TBD
This is just a testing PR at the moment to validate a CI job. There's still work to be done.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)
This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)