Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the experimental_use_remote_cache_for_cache_unaware_spawns flag #18944

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jmmv
Copy link
Contributor

@jmmv jmmv commented Jul 14, 2023

This new flag is similar in spirit to --remote_accept_cached but allows being more selective about what's accepted and what's not.

The specific problem I face is the following: we have a setup where we want to use dynamic execution for performance reasons. However, we know some actions in our build (those run by rules_foreign_cc) are not deterministic. To mitigate this, we force the actions that we know are not deterministic to run remotely, without dynamic execution, as this will prevent exposing the non-determinism for as long as they are cached and until we can fix their problems.

However, we still observe non-deterministic actions in the build and we need to diagnose what those are. To do this, I need to run two builds and compare their execlogs. And I need these builds to continue to reuse the non-deterministic artifacts we already know about from the cache, but to rerun other local actions from scratch.

Unfortunately, the fact that "remote-cache" is not a strategy (see #18245) makes this very difficult to do because, even if I configure certain actions to run locally unconditionally, the spawn strategy insists on checking the remote cache for them.

With this new flag, I can run a build where the remote actions remain remote but where I disable the dynamic scheduler and force the remaining actions to re-run locally. I'm marking the flag as experimental because this feels like a huge kludge to paper over the fact that the remote cache should really be a strategy, but isn't. In other words: this flag should go away with a better rearchitecting of the remote caching interface.

@jmmv jmmv force-pushed the allow-remote-cache branch 3 times, most recently from 5cb25b0 to 678f97c Compare July 14, 2023 18:00
This new flag is similar in spirit to --remote_accept_cached but allows
being more selective about what's accepted and what's not.

The specific problem I face is the following: we have a setup where we
want to use dynamic execution for performance reasons.  However, we know
some actions in our build (those run by rules_foreign_cc) are not
deterministic.  To mitigate this, we force the actions that we know are
not deterministic to run remotely, without dynamic execution, as this
will prevent exposing the non-determinism for as long as they are cached
and until we can fix their problems.

However, we still observe non-deterministic actions in the build and we
need to diagnose what those are.  To do this, I need to run two builds
and compare their execlogs.  And I need these builds to continue to
reuse the non-deterministic artifacts we _already_ know about from the
cache, but to rerun other local actions from scratch.

Unfortunately, the fact that "remote-cache" is not a strategy (see
bazelbuild#18245) makes this very
difficult to do because, even if I configure certain actions to run
locally unconditionally, the spawn strategy insists on checking the
remote cache for them.

With this new flag, I can run a build where the remote actions remain
remote but where I disable the dynamic scheduler and force the remaining
actions to re-run locally.  I'm marking the flag as experimental because
this feels like a huge kludge to paper over the fact that the remote
cache should really be a strategy, but isn't.  In other words: this
flag should go away with a better rearchitecting of the remote caching
interface.
@jmmv jmmv marked this pull request as ready for review July 14, 2023 21:43
@jmmv jmmv requested a review from a team as a code owner July 14, 2023 21:43
@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Local-Exec Issues and PRs for the Execution (Local) team team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Jul 14, 2023
sfc-gh-mgalindo added a commit to Snowflake-Labs/bazel that referenced this pull request Jun 10, 2024
    This new flag is similar in spirit to --remote_accept_cached but allows
    being more selective about what's accepted and what's not.

    The specific problem I face is the following: we have a setup where we
    want to use dynamic execution for performance reasons.  However, we know
    some actions in our build (those run by rules_foreign_cc) are not
    deterministic.  To mitigate this, we force the actions that we know are
    not deterministic to run remotely, without dynamic execution, as this
    will prevent exposing the non-determinism for as long as they are cached
    and until we can fix their problems.

    However, we still observe non-deterministic actions in the build and we
    need to diagnose what those are.  To do this, I need to run two builds
    and compare their execlogs.  And I need these builds to continue to
    reuse the non-deterministic artifacts we _already_ know about from the
    cache, but to rerun other local actions from scratch.

    Unfortunately, the fact that "remote-cache" is not a strategy (see
    bazelbuild#18245) makes this very
    difficult to do because, even if I configure certain actions to run
    locally unconditionally, the spawn strategy insists on checking the
    remote cache for them.

    With this new flag, I can run a build where the remote actions remain
    remote but where I disable the dynamic scheduler and force the remaining
    actions to re-run locally.  I'm marking the flag as experimental because
    this feels like a huge kludge to paper over the fact that the remote
    cache should really be a strategy, but isn't.  In other words: this
    flag should go away with a better rearchitecting of the remote caching
    interface.

Upstream PR: bazelbuild#18944
Author: Julio Merino <[email protected]>
Date:   Fri Jul 14 10:32:41 2023 -0700

Description

Testing
sfc-gh-mgalindo added a commit to Snowflake-Labs/bazel that referenced this pull request Jun 20, 2024
    This new flag is similar in spirit to --remote_accept_cached but allows
    being more selective about what's accepted and what's not.

    The specific problem I face is the following: we have a setup where we
    want to use dynamic execution for performance reasons.  However, we know
    some actions in our build (those run by rules_foreign_cc) are not
    deterministic.  To mitigate this, we force the actions that we know are
    not deterministic to run remotely, without dynamic execution, as this
    will prevent exposing the non-determinism for as long as they are cached
    and until we can fix their problems.

    However, we still observe non-deterministic actions in the build and we
    need to diagnose what those are.  To do this, I need to run two builds
    and compare their execlogs.  And I need these builds to continue to
    reuse the non-deterministic artifacts we _already_ know about from the
    cache, but to rerun other local actions from scratch.

    Unfortunately, the fact that "remote-cache" is not a strategy (see
    bazelbuild#18245) makes this very
    difficult to do because, even if I configure certain actions to run
    locally unconditionally, the spawn strategy insists on checking the
    remote cache for them.

    With this new flag, I can run a build where the remote actions remain
    remote but where I disable the dynamic scheduler and force the remaining
    actions to re-run locally.  I'm marking the flag as experimental because
    this feels like a huge kludge to paper over the fact that the remote
    cache should really be a strategy, but isn't.  In other words: this
    flag should go away with a better rearchitecting of the remote caching
    interface.

Upstream PR: bazelbuild#18944
Author: Julio Merino <[email protected]>
Date:   Fri Jul 14 10:32:41 2023 -0700

Description

Testing
sfc-gh-mgalindo added a commit to Snowflake-Labs/bazel that referenced this pull request Jun 21, 2024
    This new flag is similar in spirit to --remote_accept_cached but allows
    being more selective about what's accepted and what's not.

    The specific problem I face is the following: we have a setup where we
    want to use dynamic execution for performance reasons.  However, we know
    some actions in our build (those run by rules_foreign_cc) are not
    deterministic.  To mitigate this, we force the actions that we know are
    not deterministic to run remotely, without dynamic execution, as this
    will prevent exposing the non-determinism for as long as they are cached
    and until we can fix their problems.

    However, we still observe non-deterministic actions in the build and we
    need to diagnose what those are.  To do this, I need to run two builds
    and compare their execlogs.  And I need these builds to continue to
    reuse the non-deterministic artifacts we _already_ know about from the
    cache, but to rerun other local actions from scratch.

    Unfortunately, the fact that "remote-cache" is not a strategy (see
    bazelbuild#18245) makes this very
    difficult to do because, even if I configure certain actions to run
    locally unconditionally, the spawn strategy insists on checking the
    remote cache for them.

    With this new flag, I can run a build where the remote actions remain
    remote but where I disable the dynamic scheduler and force the remaining
    actions to re-run locally.  I'm marking the flag as experimental because
    this feels like a huge kludge to paper over the fact that the remote
    cache should really be a strategy, but isn't.  In other words: this
    flag should go away with a better rearchitecting of the remote caching
    interface.

Upstream PR: bazelbuild#18944
Author: Julio Merino <[email protected]>
Date:   Fri Jul 14 10:32:41 2023 -0700

Description

Testing
sfc-gh-mgalindo added a commit to Snowflake-Labs/bazel that referenced this pull request Jun 26, 2024
    This new flag is similar in spirit to --remote_accept_cached but allows
    being more selective about what's accepted and what's not.

    The specific problem I face is the following: we have a setup where we
    want to use dynamic execution for performance reasons.  However, we know
    some actions in our build (those run by rules_foreign_cc) are not
    deterministic.  To mitigate this, we force the actions that we know are
    not deterministic to run remotely, without dynamic execution, as this
    will prevent exposing the non-determinism for as long as they are cached
    and until we can fix their problems.

    However, we still observe non-deterministic actions in the build and we
    need to diagnose what those are.  To do this, I need to run two builds
    and compare their execlogs.  And I need these builds to continue to
    reuse the non-deterministic artifacts we _already_ know about from the
    cache, but to rerun other local actions from scratch.

    Unfortunately, the fact that "remote-cache" is not a strategy (see
    bazelbuild#18245) makes this very
    difficult to do because, even if I configure certain actions to run
    locally unconditionally, the spawn strategy insists on checking the
    remote cache for them.

    With this new flag, I can run a build where the remote actions remain
    remote but where I disable the dynamic scheduler and force the remaining
    actions to re-run locally.  I'm marking the flag as experimental because
    this feels like a huge kludge to paper over the fact that the remote
    cache should really be a strategy, but isn't.  In other words: this
    flag should go away with a better rearchitecting of the remote caching
    interface.

Upstream PR: bazelbuild#18944
Author: Julio Merino <[email protected]>
Date:   Fri Jul 14 10:32:41 2023 -0700

Description

Testing
sfc-gh-mgalindo added a commit to Snowflake-Labs/bazel that referenced this pull request Jun 28, 2024
    This new flag is similar in spirit to --remote_accept_cached but allows
    being more selective about what's accepted and what's not.

    The specific problem I face is the following: we have a setup where we
    want to use dynamic execution for performance reasons.  However, we know
    some actions in our build (those run by rules_foreign_cc) are not
    deterministic.  To mitigate this, we force the actions that we know are
    not deterministic to run remotely, without dynamic execution, as this
    will prevent exposing the non-determinism for as long as they are cached
    and until we can fix their problems.

    However, we still observe non-deterministic actions in the build and we
    need to diagnose what those are.  To do this, I need to run two builds
    and compare their execlogs.  And I need these builds to continue to
    reuse the non-deterministic artifacts we _already_ know about from the
    cache, but to rerun other local actions from scratch.

    Unfortunately, the fact that "remote-cache" is not a strategy (see
    bazelbuild#18245) makes this very
    difficult to do because, even if I configure certain actions to run
    locally unconditionally, the spawn strategy insists on checking the
    remote cache for them.

    With this new flag, I can run a build where the remote actions remain
    remote but where I disable the dynamic scheduler and force the remaining
    actions to re-run locally.  I'm marking the flag as experimental because
    this feels like a huge kludge to paper over the fact that the remote
    cache should really be a strategy, but isn't.  In other words: this
    flag should go away with a better rearchitecting of the remote caching
    interface.

Upstream PR: bazelbuild#18944
Author: Julio Merino <[email protected]>
Date:   Fri Jul 14 10:32:41 2023 -0700

Description

Testing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Local-Exec Issues and PRs for the Execution (Local) team team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant