[JENKINS-73824] Wait for Pipeline builds to complete before allowing their jobs to be deleted #9790

dwnusbaum · 2024-09-26T18:15:46Z

See JENKINS-73824. This might qualify as a major bug, although it has existed for a long time and is not a regression as far as I can see.

While investigating an issue with branch indexing for multibranch projects leaving build directories behind when deleting projects, I think I found a rather severe issue with job deletion in general for Pipelines. Since its introduction in #2789, the logic for cancelling ongoing builds and waiting for them to complete when deleting their parent job has checked Thread.isAlive rather than Executor.isActive, which is not correct for asynchronous tasks such as the main Pipeline execution. See the Javadoc here. Oleg actually suggested changing to isActive in the original PR here for other reasons.
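To illustrate the distinction, here is a minimal, self-contained sketch in plain Java. It does not use the Jenkins API: `SketchExecutor`, `completeAsyncWork`, and this `isActive` are hypothetical stand-ins, and the sketch only assumes that an asynchronous task lets the executor thread exit while the build is still running elsewhere.

```java
public class AsyncExecutorSketch {

    /** Hypothetical stand-in for an executor whose work can outlive its thread. */
    static class SketchExecutor extends Thread {
        private volatile boolean asyncRunning;

        @Override
        public void run() {
            // Hand the real work off to "somewhere else" and let this thread end.
            asyncRunning = true;
        }

        /** Called by the asynchronous work once it has fully finished. */
        void completeAsyncWork() {
            asyncRunning = false;
        }

        /** Active if the thread is alive OR asynchronous work is still pending. */
        boolean isActive() {
            return asyncRunning || isAlive();
        }
    }

    /** Returns {isAlive during async work, isActive during async work, isActive afterwards}. */
    public static boolean[] demo() {
        SketchExecutor executor = new SketchExecutor();
        executor.start();
        try {
            executor.join(); // the thread itself finishes almost immediately
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        boolean aliveDuring = executor.isAlive();
        boolean activeDuring = executor.isActive();
        executor.completeAsyncWork();
        boolean activeAfter = executor.isActive();
        return new boolean[] {aliveDuring, activeDuring, activeAfter};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("isAlive while async work runs:  " + r[0]); // false
        System.out.println("isActive while async work runs: " + r[1]); // true
        System.out.println("isActive after async work ends: " + r[2]); // false
    }
}
```

A wait loop keyed on `isAlive` would see `false` right away and stop waiting, even though the asynchronous work is still in progress.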

The result is that although ongoing Pipeline builds are interrupted, ItemDeletion.cancelBuildsInProgress does not wait for those builds to fully complete, which can at least lead to files being written back into the job directory that was just deleted. There are probably other more exotic issues possible for Pipelines that do not shut down quickly when interrupted.

Testing done

Tested against jenkinsci/workflow-job-plugin#468 to check that it fixes the issue. I added a test here in 3e27ac6, but it required a lot of boilerplate and was not particularly realistic, so I removed it after discussion in #9790 (comment).

Proposed changelog entries

Wait for ongoing Pipeline builds to fully complete before allowing their parent job to be deleted.

Proposed upgrade guidelines

N/A

Submitter checklist

The Jira issue, if it exists, is well-described.
The changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developers, depending on the change) and are in the imperative mood (see examples). Fill in the Proposed upgrade guidelines section only if there are breaking changes or changes that may require extra steps from users during upgrade.
There is automated testing or an explanation as to why this change has no tests.
New public classes, fields, and methods are annotated with @Restricted or have @since TODO Javadocs, as appropriate.
New deprecations are annotated with @Deprecated(since = "TODO") or @Deprecated(forRemoval = true, since = "TODO"), if applicable.
New or substantially changed JavaScript is not defined inline and does not call eval to ease future introduction of Content Security Policy (CSP) directives (see documentation).
For dependency updates, there are links to external changelogs and, if possible, full differentials.
For new APIs and extension points, there is a link to at least one consumer.

Desired reviewers

@mention

Before the changes are marked as ready-for-merge:

Maintainer checklist

There are at least two (2) approvals for the pull request and no outstanding requests for change.
Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
Proper changelog labels are set so that the changelog can be generated automatically.
If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

dwnusbaum · 2024-09-26T22:17:49Z

Moving back to draft until I have time to check 8cebb0f more thoroughly.

dwnusbaum · 2024-09-26T22:22:57Z

core/src/main/java/jenkins/model/queue/ItemDeletion.java

-                    // I don't know why, but we have to keep interrupting
-                    entry.getKey().interrupt(Result.ABORTED);


I removed this after some discussion in jenkinsci/workflow-job-plugin#468 (comment). The fact that we were repeatedly interrupting things here was a bit unusual. Other things in Jenkins that interrupt builds just call interrupt a single time.

The extra interruptions were added in 047e849, but the test in that commit passes even without the extra interruptions. For Pipelines, which are the only jobs I know of where interrupt may not kill the job promptly, repeatedly calling interrupt is generally not going to help (barring things like poorly written try/catch blocks). You have to move to more severe methods such as doTerm and doKill (perhaps we should consider making Pipeline do this on its own) to guarantee interruption. Now that Pipelines are handled correctly and we don't always remove them from the iterator on the first iteration, the repeated interruptions every 50ms could produce a large number of timer tasks here for Pipelines that take a while to complete, whereas previously a single Pipeline would have been interrupted at most twice.

For Pipelines, the main issue is that users are very easily able to inadvertently swallow interruptions just by adding a try/catch block to their Pipeline script. FlowInterruptedException.actualInterruption makes it possible to avoid swallowing interruptions in trusted scripts and from Pipeline steps, but I would guess there are many steps that do not handle this correctly, and sandboxed scripts do not have access to this field by default.
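The same effect can be shown with plain Java threads. This is only a sketch of the general mechanism, not of Pipeline itself: the two `Thread.sleep` calls stand in for long-running steps, the class name is hypothetical, and nothing here models `FlowInterruptedException`.

```java
import java.util.concurrent.CountDownLatch;

public class SwallowedInterruptSketch {

    /** Returns {alive after first interrupt, alive after second interrupt}. */
    public static boolean[] demo() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch firstInterruptSwallowed = new CountDownLatch(1);

        Thread build = new Thread(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // stand-in for a long-running step
            } catch (InterruptedException e) {
                // Swallowed: the handler neither rethrows nor stops the work.
                firstInterruptSwallowed.countDown();
            }
            try {
                Thread.sleep(60_000); // the "build" simply carries on
            } catch (InterruptedException e) {
                // Nothing follows this handler, so the thread ends here.
            }
        });
        build.setDaemon(true);
        build.start();

        try {
            started.await();
            build.interrupt();               // first interruption
            firstInterruptSwallowed.await(); // wait until it has been swallowed
            boolean aliveAfterFirst = build.isAlive();
            build.interrupt();               // second interruption
            build.join();
            return new boolean[] {aliveAfterFirst, build.isAlive()};
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("alive after first interrupt:  " + r[0]); // true
        System.out.println("alive after second interrupt: " + r[1]); // false
    }
}
```

A single interruption is consumed by the first `catch` block, so the work only stops once something interrupts it again.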

For non-Pipelines, poorly written Builder and Publisher implementations may also swallow interruptions, but I have no idea how common this is.

What do other people think about this? I see 3 main options:

1. Interrupt builds only once (as in this PR now). If the build doesn't respond to interruption, someone will have to kill it some other way or retry deletion until the build finally dies. This is how interruption works if you try to cancel a build in the Jenkins UI.

2. Interrupt builds repeatedly in a loop every 50ms until they die (as before this PR). This gives the best chance of interrupting the build, but it is not aligned with other interruption mechanisms in Jenkins and is still not guaranteed to cancel builds in all cases. In combination with the main fix here, this can lead to a large number of timer tasks being created when interrupting Pipeline builds, which seems undesirable. For example, if a build takes the full 15s to die, at 50ms per interrupt we will try to interrupt the build 300 times.

3. Interrupt builds repeatedly, but at a much lower frequency, say once per second. This would make this code able to cancel builds that swallow a single interruption due to something simple like a try/catch block in a Pipeline, but would avoid overloading the Pipeline interruption mechanism.
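For a rough sense of scale, the arithmetic behind the repeated-interrupt options can be sketched as follows. The 15s wait and the 50ms and 1s intervals come from the discussion above. The class and method names are hypothetical, and the count assumes one interrupt per elapsed interval for a build that never responds.

```java
public class InterruptCountSketch {

    /** Number of interrupt attempts if one is issued per interval for the whole wait. */
    public static long interruptAttempts(long waitMillis, long intervalMillis) {
        return waitMillis / intervalMillis;
    }

    public static void main(String[] args) {
        long waitMillis = 15_000; // build takes the full 15s to die

        // Every 50ms, as before this PR
        System.out.println(interruptAttempts(waitMillis, 50));    // 300
        // Once per second, the lower-frequency alternative
        System.out.println(interruptAttempts(waitMillis, 1_000)); // 15
    }
}
```

Interrupting only once corresponds to a single attempt regardless of how long the build takes to stop.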

First option looks fine to me. Users will try to interrupt if they see nothing happens.

The first option seems appropriate for purposes of this PR; probably WorkflowRun (and/or CpsFlowExecution) should be independently improved to enforce an exit from an actualInterruption in a timely manner.

MarkEWaite · 2024-10-05T19:23:45Z

This PR is now ready for merge. We will merge it after approximately 24 hours if there is no negative feedback.

/label ready-for-merge

…their jobs to be deleted (jenkinsci#9790)

* Wait for Pipelines to complete before allowing their jobs to be deleted
* Create mock Job/Run classes that use AsynchronousExecution to be able to add a regression test in core
* [JENKINS-73824] Do not repeatedly interrupt executables in ItemDeletion.cancelBuildsInProgress
* [JENKINS-73824] Delete ItemDeletionTest based on jenkinsci#9790 (comment)

(cherry picked from commit 4d7b993)

Wait for Pipelines to complete before allowing their jobs to be deleted

9eab23f

dwnusbaum changed the title ~~Wait for Pipelines to complete before allowing their jobs to be deleted~~ Wait for Pipeline builds to complete before allowing their jobs to be deleted Sep 26, 2024

Create mock Job/Run classes that use AsynchronousExecution to be able to add a regression test in core

3e27ac6

dwnusbaum added the bug (For changelog: Minor bug. Will be listed after features) label Sep 26, 2024

dwnusbaum changed the title ~~Wait for Pipeline builds to complete before allowing their jobs to be deleted~~ [JENKINS-73824] Wait for Pipeline builds to complete before allowing their jobs to be deleted Sep 26, 2024

dwnusbaum marked this pull request as ready for review September 26, 2024 20:05

dwnusbaum requested a review from jglick September 26, 2024 20:05

dwnusbaum mentioned this pull request Sep 26, 2024

[JENKINS-73824] Add test for deleting a Pipeline job while one of its builds is running only on a OneOffExecutor jenkinsci/workflow-job-plugin#468

Closed

6 tasks

jglick approved these changes Sep 26, 2024

dwnusbaum added 2 commits September 26, 2024 18:15

[JENKINS-73824] Do not repeatedly interrupt executables in ItemDeletion.cancelBuildsInProgress

8cebb0f

[JENKINS-73824] Delete ItemDeletionTest based on jenkinsci#9790 (comment)

6ec50c7

dwnusbaum marked this pull request as draft September 26, 2024 22:17

dwnusbaum marked this pull request as ready for review September 27, 2024 20:24

dwnusbaum requested a review from jglick September 27, 2024 20:31

jgreffe approved these changes Sep 30, 2024

jglick approved these changes Sep 30, 2024

MarkEWaite self-assigned this Oct 5, 2024

comment-ops-bot bot added the ready-for-merge (The PR is ready to go, and it will be merged soon if there is no negative feedback) label Oct 5, 2024

MarkEWaite merged commit 4d7b993 into jenkinsci:master Oct 6, 2024
16 checks passed

dwnusbaum deleted the pipeline-job-deletion branch October 8, 2024 13:25
