[SPARK-57482][CORE][TESTS] Fix flaky SparkLauncherSuite.testInProcessLauncher under CI load #56529

Closed

dbtsai wants to merge 1 commit into apache:master from dbtsai:dbtsai/fix-flaky-inprocess-launcher
dbtsai commented Jun 15, 2026

### What changes were proposed in this pull request?

`SparkLauncherSuite.testInProcessLauncher` waits for the connection between the in-process app and the launcher to be established by polling the app handle state with `eventually(Duration.ofSeconds(5), Duration.ofMillis(10))`. Under heavy CI load this 5-second window is too short: the handle can remain in `UNKNOWN` for longer, causing the test to fail with:

```
java.lang.IllegalStateException: Failed check after 476 tries: expected: not equal but was: <UNKNOWN>.
  at org.apache.spark.launcher.BaseSuite.eventually(BaseSuite.java:88)
  at org.apache.spark.launcher.SparkLauncherSuite.inProcessLauncherTestImpl(SparkLauncherSuite.java:162)
  at org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher(SparkLauncherSuite.java:130)
```
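The retry loop behind this failure can be illustrated with a minimal, self-contained sketch. It assumes a helper of the shape `eventually(Duration timeout, Duration period, Runnable check)` that reports the message format seen in the trace. It is not the actual `BaseSuite.eventually` source, and the `State` enum is a hypothetical stand-in for the app handle state.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Minimal sketch of a poll-until-pass helper. The signature and message
 * format are modeled on the stack trace above; this is NOT the real
 * BaseSuite.eventually implementation.
 */
public class EventuallySketch {

  // Hypothetical stand-in for the launcher's app handle state.
  enum State { UNKNOWN, CONNECTED }

  /**
   * Runs {@code check} every {@code period} until it stops throwing or
   * {@code timeout} elapses. Returns the number of tries on success.
   */
  static int eventually(Duration timeout, Duration period, Runnable check) {
    if (timeout.compareTo(period) <= 0) {
      throw new IllegalArgumentException("Timeout needs to be larger than period.");
    }
    long deadline = System.nanoTime() + timeout.toNanos();
    int tries = 0;
    while (true) {
      tries++;
      try {
        check.run();
        return tries;
      } catch (RuntimeException | AssertionError e) {
        // Give up only once the deadline has passed.
        if (System.nanoTime() >= deadline) {
          throw new IllegalStateException(
              String.format("Failed check after %d tries: %s.", tries, e.getMessage()), e);
        }
        try {
          Thread.sleep(period.toMillis());
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting", ie);
        }
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicReference<State> state = new AtomicReference<>(State.UNKNOWN);
    // Simulate a slow connection: the state only flips after about 300 ms.
    Thread connector = new Thread(() -> {
      try {
        Thread.sleep(300);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      state.set(State.CONNECTED);
    });
    connector.start();

    Runnable notUnknown = () -> {
      if (state.get() == State.UNKNOWN) {
        throw new AssertionError("expected: not equal but was: <UNKNOWN>");
      }
    };

    // A 100 ms window is too short for a ~300 ms connection...
    try {
      eventually(Duration.ofMillis(100), Duration.ofMillis(10), notUnknown);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    // ...while a longer window with a coarser period succeeds.
    int tries = eventually(Duration.ofSeconds(5), Duration.ofMillis(50), notUnknown);
    System.out.println("Connected after " + tries + " tries");
    connector.join();
  }
}
```

The short window fails with the same kind of message as in the trace. The longer window tolerates the delay, which is the effect this PR relies on.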

This change increases the timeout to 30 seconds and the poll interval to 100ms. The new values match `waitForSparkContextShutdown` (30s/100ms) and are closer to the other `eventually` calls in this suite, which use 60s/1000ms.
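As a rough sense of the polling budget, the maximum number of tries is about `timeout / period`. The sketch below uses a hypothetical `PollBudget` class to compute that bound for the old, new, and suite-wide settings. It ignores the time spent inside each check and any sleep overshoot, so a real loop typically manages somewhat fewer tries than the bound (the trace above shows 476).

```java
import java.time.Duration;

/** Rough upper bound on how many times a poll loop can run: timeout / period. */
public class PollBudget {

  static long maxTries(Duration timeout, Duration period) {
    // Approximation only: time spent in the check and sleep overshoot are ignored.
    return timeout.toMillis() / period.toMillis();
  }

  public static void main(String[] args) {
    System.out.println("old (5s/10ms):   " + maxTries(Duration.ofSeconds(5), Duration.ofMillis(10)));
    System.out.println("new (30s/100ms): " + maxTries(Duration.ofSeconds(30), Duration.ofMillis(100)));
    System.out.println("suite (60s/1s):  " + maxTries(Duration.ofSeconds(60), Duration.ofMillis(1000)));
  }
}
```

This prints 500, 300, and 60. With the new settings the loop wakes up less often than before (about 300 versus 500 times) while waiting six times longer, so a slow connection gets more time without busier polling.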

### Why are the changes needed?

`SparkLauncherSuite.testInProcessLauncher` is flaky under CI load. The change only relaxes a test timeout; it does not change production behavior.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test `SparkLauncherSuite.testInProcessLauncher`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

…Launcher under CI load

### What changes were proposed in this pull request?

`SparkLauncherSuite.testInProcessLauncher` waits for the connection between
the in-process app and the launcher to be established by polling the app
handle state with `eventually(Duration.ofSeconds(5), Duration.ofMillis(10))`.
Under heavy CI load this 5-second window is too short: the handle can remain
in `UNKNOWN` for longer, causing the test to fail with:

```
java.lang.IllegalStateException: Failed check after 476 tries:
    expected: not equal but was: <UNKNOWN>.
  at org.apache.spark.launcher.BaseSuite.eventually(BaseSuite.java:88)
  at org.apache.spark.launcher.SparkLauncherSuite.inProcessLauncherTestImpl(...)
```

This change increases the timeout to 30 seconds with a 100ms poll interval,
consistent with `waitForSparkContextShutdown` (30s/100ms) and the other
`eventually` calls in this suite (60s/1000ms).

### Why are the changes needed?

`SparkLauncherSuite.testInProcessLauncher` is flaky under CI load. The fix
only relaxes a test timeout; it does not change production behavior.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test `SparkLauncherSuite.testInProcessLauncher`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

Co-authored-by: Isaac

dbtsai closed this in 1cf08e6 Jun 16, 2026
dbtsai added a commit that referenced this pull request Jun 16, 2026
…auncher under CI load

### What changes were proposed in this pull request?

`SparkLauncherSuite.testInProcessLauncher` waits for the connection between the in-process app and the launcher to be established by polling the app handle state with `eventually(Duration.ofSeconds(5), Duration.ofMillis(10))`. Under heavy CI load this 5-second window is too short: the handle can remain in `UNKNOWN` for longer, causing the test to fail with:

```
java.lang.IllegalStateException: Failed check after 476 tries: expected: not equal but was: <UNKNOWN>.
  at org.apache.spark.launcher.BaseSuite.eventually(BaseSuite.java:88)
  at org.apache.spark.launcher.SparkLauncherSuite.inProcessLauncherTestImpl(SparkLauncherSuite.java:162)
  at org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher(SparkLauncherSuite.java:130)
```

This change increases the timeout to 30 seconds with a 100ms poll interval, consistent with `waitForSparkContextShutdown` (30s/100ms) and the other `eventually` calls in this suite (60s/1000ms).

### Why are the changes needed?

`SparkLauncherSuite.testInProcessLauncher` is flaky under CI load. The change only relaxes a test timeout; it does not change production behavior.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test `SparkLauncherSuite.testInProcessLauncher`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8

Closes #56529 from dbtsai/dbtsai/fix-flaky-inprocess-launcher.

Authored-by: DB Tsai <dbtsai@dbtsai.com>
Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
(cherry picked from commit 1cf08e6)
Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
dbtsai added a commit that referenced this pull request Jun 16, 2026