[GOBBLIN-2144] Prevent GenerateWorkUnitsImpl from inadvertently cleaning up temp/staging dirs #4039

Merged (1 commit) on Aug 27, 2024

Conversation

phet
Contributor

@phet phet commented Aug 27, 2024

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Whereas the AbstractJobLauncher, used by Gobblin-on-MR, runs the entire job end-to-end, the Gobblin-on-Temporal GenerateWorkUnitsImpl activity only generates work units (WUs), without processing them. It is therefore far too early to clean up anything in the temp/staging dirs (e.g. task-staging and task-output, used by writers and the commit step).

Worse, depending on config, cleaning up at this point could destabilize other job executions that share the same temp/staging area. Such cross-job "action at a distance" is unpredictable and especially hard to diagnose.
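To make the hazard concrete, here is a minimal, self-contained sketch. It is not Gobblin code: `StagingCleanupSketch`, `generateWorkUnits`, `otherJobOutputSurvives`, the `job-A` directory, and the `part-0.tmp` file are hypothetical names chosen for illustration, and the local filesystem stands in for whatever store backs the staging area. It only shows how a generate-only step that cleans a shared staging dir removes another execution's in-flight files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical illustration only; none of these names are Gobblin APIs. */
public class StagingCleanupSketch {

  /** Recursively deletes a directory tree, children before parents. */
  static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    List<Path> toDelete;
    try (Stream<Path> paths = Files.walk(root)) {
      // Reverse lexicographic order puts every child before its parent.
      toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : toDelete) {
      Files.delete(p);
    }
  }

  /**
   * Stand-in for a generate-only step: it plans work units but processes none,
   * so anything already under the staging dir belongs to some other execution.
   */
  static int generateWorkUnits(Path sharedStagingDir, boolean cleanUpStaging) throws IOException {
    int numWorkUnits = 3; // placeholder for real planning logic
    if (cleanUpStaging) {
      deleteRecursively(sharedStagingDir); // the premature cleanup this PR argues against
    }
    return numWorkUnits;
  }

  /** Returns whether another job's staged file survives this job's work-unit generation. */
  static boolean otherJobOutputSurvives(boolean cleanUpStaging) {
    try {
      Path base = Files.createTempDirectory("staging-sketch");
      Path shared = base.resolve("task-staging");
      Path otherJobFile = shared.resolve("job-A").resolve("part-0.tmp");
      Files.createDirectories(otherJobFile.getParent());
      Files.write(otherJobFile, "in-flight".getBytes(StandardCharsets.UTF_8));

      generateWorkUnits(shared, cleanUpStaging);

      boolean survived = Files.exists(otherJobFile);
      deleteRecursively(base); // tidy up the sketch's own temp dir
      return survived;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("cleanup during generation: survives=" + otherJobOutputSurvives(true));
    System.out.println("no cleanup during generation: survives=" + otherJobOutputSurvives(false));
  }
}
```

In this sketch the other job's file is gone when generation cleans up and intact when it does not, which is the cross-job interference described above.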

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

./gradlew build successful

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@@ -110,6 +110,7 @@ public DagManagementTaskStreamImpl(Config config, Optional<DagActionStore> dagAc
     this.dagProcEngineMetrics = dagProcEngineMetrics;
   }

+  @Override
Contributor Author

NBD, just noticed this to be missing

@Will-Lo Will-Lo merged commit 444f266 into apache:master Aug 27, 2024
6 checks passed