
[SPARK-56403] Refactor kafka test so it's skipped when dependency is not available #55266

Closed
gaogaotiantian wants to merge 2 commits into apache:master from gaogaotiantian:refactor-kafka-tests

Conversation

@gaogaotiantian
Contributor

@gaogaotiantian gaogaotiantian commented Apr 8, 2026

What changes were proposed in this pull request?

  • Add a check for kafka and testcontainers to the testing utils package
  • Move all module-level code of the kafka test inside the test class
  • Skip the kafka test when the dependency is not available, instead of raising an error
  • Restore os.environ after the test
  • Use the testing utility in __main__ instead of the old entry point
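The availability check and skip described above can be sketched as follows. This is an illustrative pattern, not the actual `pyspark.testing.utils` helpers; the names `have_package`, `have_kafka`, and `KafkaStreamingTests` are assumptions for the example.

```python
# Sketch of an optional-dependency check plus class-level skip
# (illustrative names, not pyspark's real helpers).
import importlib.util
import unittest


def have_package(name: str) -> bool:
    """Return True if the top-level package can be imported."""
    return importlib.util.find_spec(name) is not None


have_kafka = have_package("kafka")
have_testcontainers = have_package("testcontainers")


@unittest.skipUnless(
    have_kafka and have_testcontainers,
    "kafka-python and testcontainers are required for this test",
)
class KafkaStreamingTests(unittest.TestCase):
    def test_placeholder(self) -> None:
        # Real setup (starting a Kafka container, producing records,
        # etc.) would go here; by this point both deps are importable.
        self.assertTrue(have_kafka and have_testcontainers)
```

With this shape, test discovery always succeeds; the whole class is simply reported as skipped when either package is missing.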

Why are the changes needed?

We don't want a test to fail if an optional dependency is not available. It's breaking our CIs - https://github.com/apache/spark/actions/runs/24128422095 . The test itself should not have too much module-level code, and it should have minimal side effects on the environment.
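The "minimal side effects on the environment" point (and the os.environ restoration in the bullet list above) can be sketched like this. The class name and the `EXAMPLE_BROKER_URL` key are illustrative, not the PR's actual code.

```python
# Sketch of snapshotting and restoring os.environ around a test class
# (illustrative key and class name, not the PR's actual code).
import os
import unittest


class EnvRestoringTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        # Snapshot the environment before mutating it.
        cls._saved_environ = dict(os.environ)
        os.environ["EXAMPLE_BROKER_URL"] = "localhost:9092"

    @classmethod
    def tearDownClass(cls) -> None:
        # Restore the original environment so later tests in the same
        # process are unaffected.
        os.environ.clear()
        os.environ.update(cls._saved_environ)

    def test_env_is_set(self) -> None:
        self.assertEqual(os.environ["EXAMPLE_BROKER_URL"], "localhost:9092")
```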

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Confirmed it was correctly skipped when dependency was not available. CI should confirm whether test is still working properly.

Was this patch authored or co-authored using generative AI tooling?

No.

@gaogaotiantian
Contributor Author

@jerrypeng as the original author.

Member

@viirya viirya left a comment


I wonder whether we should install the dependency and let the pipeline run the tests instead of skipping them?

@jerrypeng
Contributor

jerrypeng commented Apr 9, 2026

@gaogaotiantian I see the test is passing without problems for other PRs. Why is this run (https://github.com/apache/spark/actions/runs/24128422095) special, such that it fails due to missing dependencies?

As @viirya mentioned, can we just install the dependencies instead of skipping the tests?

@jerrypeng
Contributor

What was the point of the other PR, https://github.com/apache/spark/pull/55270/changes ? Didn't it solve the issue?

@zhengruifeng
Contributor

@jerrypeng @viirya
there are special testing envs for different purposes, e.g.

you can add the new dependencies to them if it makes sense and CI still passes

@viirya
Member

viirya commented Apr 9, 2026

I'm okay with skipping them on test pipelines with special purposes, such as pinning dependencies to old versions, etc.

@jerrypeng
Contributor

@zhengruifeng thank you for the explanation. I'm ok with skipping the test for those workflows.

@gaogaotiantian
Contributor Author

Yeah, as @zhengruifeng mentioned, we have some CIs to make sure pyspark works with minimal dependencies. In general, we need the test to work (pass or skip) without optional requirements. A developer without kafka installed should not get a test error when they run a big test suite that includes the kafka test.

More importantly, if you are using unittest or pytest, rather than the run-tests script (which executes the module directly via python -m test_module), the testing framework will discover tests by importing the test modules - and failing module-level code will just raise an error and stop the discovery.

So, we either make these two testing modules "required", which is very rare in the pyspark case, or we skip the tests if they don't exist.

@zhengruifeng
Contributor

thanks all, merged to master


5 participants