py_binary is no longer reproducible between exec and target configs #3793

@BarrettStephen

🐞 bug report

Affected Rule

The issue is caused by the rule: py_binary

Is this a regression?

Yes, the previous version in which this bug was not present was: 1.8.5

Description

We use pkg_tar (1.2.0) to package py_binary targets for distribution. Some of these end up in Docker images that rules_oci hashes. After upgrading to 2.0.1, I observed that the same py_binary produces different image hashes depending on whether it was built in the target or the exec configuration. I narrowed this down to commit 352f405,
which now writes the config mode into the py_binary's build data via
"CONFIG_MODE": "EXEC" if _is_tool_config(ctx) else "TARGET",

so <target>.build_data.txt differs between the two configurations. This is the only difference between the built py_binaries.
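
To illustrate why a single differing file is enough to change the hash of everything downstream, here is a minimal sketch that builds two otherwise identical tar archives in memory and compares their SHA-256 digests. The file names and the `CONFIG_MODE ...` line contents are hypothetical stand-ins; the real build_data.txt layout and the real pkg_tar / rules_oci hashing may differ.

```python
import hashlib
import io
import tarfile


def tar_digest(files: dict[str, bytes]) -> str:
    """Return the SHA-256 of a tar archive built deterministically from `files`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0  # fixed timestamp so only names and contents affect the digest
            tar.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buf.getvalue()).hexdigest()


# Hypothetical contents; only the build data file differs between the two layers.
common = {"app/main.py": b"print('hello')\n"}
target_layer = {**common, "app/main.build_data.txt": b"CONFIG_MODE TARGET\n"}
exec_layer = {**common, "app/main.build_data.txt": b"CONFIG_MODE EXEC\n"}

# Same inputs give the same digest.
print(tar_digest(target_layer) == tar_digest(target_layer))  # True
# One differing file changes the digest of the whole archive.
print(tar_digest(target_layer) == tar_digest(exec_layer))  # False
```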

I can't think of a use case for knowing the build config mode at runtime in a py_binary, and it now makes the py_binary (and anything downstream of it) unreproducible across configurations and breaks path mapping as well. Can this be removed? Or, if it's needed for some reason, put behind stamp?

🔬 Minimal Reproduction

Build a py_binary and look in bazel-out for the build_data.txt.
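
One way to compare the two outputs is sketched below. It assumes you have already built the same py_binary in both configurations (for example, once directly and once as a `tools` dependency of a genrule, which Bazel builds in the exec configuration) and located both build_data.txt files under bazel-out. The script makes no assumption about the file format beyond it being text; the function name and the command-line usage are hypothetical.

```python
import difflib
import sys
from pathlib import Path


def diff_build_data(target_file: Path, exec_file: Path) -> list[str]:
    """Return a unified diff of two build data files; empty if they are identical."""
    target_lines = target_file.read_text().splitlines()
    exec_lines = exec_file.read_text().splitlines()
    return list(
        difflib.unified_diff(
            target_lines,
            exec_lines,
            fromfile=str(target_file),
            tofile=str(exec_file),
            lineterm="",
        )
    )


# Usage: python diff_build_data.py <target-config file> <exec-config file>
if __name__ == "__main__" and len(sys.argv) == 3:
    for line in diff_build_data(Path(sys.argv[1]), Path(sys.argv[2])):
        print(line)
```

If the only reported difference is the CONFIG_MODE entry, that matches the behavior described above.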

🌍 Your Environment

Operating System:

CentOS 9
5.14.0-687.el9.x86_64

Output of bazel version:

bazel 8.7.0

Rules_python version:

2.0.1

Anything else relevant?

I made this patch to fix it for us:

Subject: [PATCH] Remove CONFIG_MODE from build data generation

Commit 352f405 added CONFIG_MODE to the build data generation environment,
which causes the build data file to differ between exec and target
configurations. This breaks build reproducibility and causes cache misses
when the same py_binary is built in both configurations.

Since the build data file is part of the output, having CONFIG_MODE in the
action environment means the same py_binary target produces different
outputs depending on whether it's in exec or target config, defeating Bazel's
caching and reproducibility guarantees.

This removes CONFIG_MODE from the environment to restore reproducibility.
---
 python/private/py_executable.bzl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/python/private/py_executable.bzl b/python/private/py_executable.bzl
index c2a4a8f8..c03fb2bf 100644
--- a/python/private/py_executable.bzl
+++ b/python/private/py_executable.bzl
@@ -1565,7 +1565,8 @@ def _write_build_data(ctx):
         env = {
             # Include config mode so that binaries can detect if they're
             # being used as a build tool or not, allowing for runtime optimizations.
-            "CONFIG_MODE": "EXEC" if _is_tool_config(ctx) else "TARGET",
+            # NOTE: Disabled to avoid cache misses between exec and target configs
+            # "CONFIG_MODE": "EXEC" if _is_tool_config(ctx) else "TARGET",
             "INFO_FILE": info_file.path if info_file else "",
             "OUTPUT": build_data.path,
             # Include this so it's explicit, otherwise, one has to detect
--
2.52.0
