
[SPARK-56403] Refactor kafka test so it's skipped when dependency is not available #55266

Closed
gaogaotiantian wants to merge 2 commits into apache:master from gaogaotiantian:refactor-kafka-tests

Conversation

@gaogaotiantian
Contributor

@gaogaotiantian gaogaotiantian commented Apr 8, 2026

What changes were proposed in this pull request?

  • Add a check for kafka and testcontainers to the testing utils package
  • Move all module-level code of the kafka test inside the test class
  • Skip the kafka test when the dependency is not available, instead of raising an error
  • Restore os.environ after the test
  • Use the testing utility in __main__ instead of the old entry point
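The availability check and skip described above can be sketched as follows. This is an illustrative pattern, not the actual `pyspark.testing.utils` helpers; the names `have_package`, `have_kafka`, and `KafkaStreamingTests` are assumptions for the example.

```python
# Sketch of an optional-dependency check plus class-level skip
# (illustrative names, not pyspark's real helpers).
import importlib.util
import unittest


def have_package(name: str) -> bool:
    """Return True if the top-level package can be imported."""
    return importlib.util.find_spec(name) is not None


have_kafka = have_package("kafka")
have_testcontainers = have_package("testcontainers")


@unittest.skipUnless(
    have_kafka and have_testcontainers,
    "kafka-python and testcontainers are required for this test",
)
class KafkaStreamingTests(unittest.TestCase):
    def test_placeholder(self) -> None:
        # Real setup (starting a Kafka container, producing records,
        # etc.) would go here; by this point both deps are importable.
        self.assertTrue(have_kafka and have_testcontainers)
```

With this shape, test discovery always succeeds; the whole class is simply reported as skipped when either package is missing.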

Why are the changes needed?

We don't want a test to fail if an optional dependency is not available. It's breaking our CIs - https://github.com/apache/spark/actions/runs/24128422095 . The test itself should not have too much module-level code, and it should have minimal side effects on the environment.
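The "minimal side effects on the environment" point (and the os.environ restoration in the bullet list above) can be sketched like this. The class name and the `EXAMPLE_BROKER_URL` key are illustrative, not the PR's actual code.

```python
# Sketch of snapshotting and restoring os.environ around a test class
# (illustrative key and class name, not the PR's actual code).
import os
import unittest


class EnvRestoringTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        # Snapshot the environment before mutating it.
        cls._saved_environ = dict(os.environ)
        os.environ["EXAMPLE_BROKER_URL"] = "localhost:9092"

    @classmethod
    def tearDownClass(cls) -> None:
        # Restore the original environment so later tests in the same
        # process are unaffected.
        os.environ.clear()
        os.environ.update(cls._saved_environ)

    def test_env_is_set(self) -> None:
        self.assertEqual(os.environ["EXAMPLE_BROKER_URL"], "localhost:9092")
```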

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Confirmed it was correctly skipped when dependency was not available. CI should confirm whether test is still working properly.

Was this patch authored or co-authored using generative AI tooling?

No.

@gaogaotiantian
Contributor Author

@jerrypeng as the original author.

Member

@viirya viirya left a comment


I wonder whether we should install the dependency and let the pipeline run the tests instead of skipping them?

@jerrypeng
Contributor

jerrypeng commented Apr 9, 2026

@gaogaotiantian I see the test is passing without problems for other PRs. Why is this run (https://github.com/apache/spark/actions/runs/24128422095) special, such that it fails due to missing dependencies?

As @viirya mentioned, can we just install the dependencies instead of skipping the tests?

@jerrypeng
Contributor

What was the point of the other PR, https://github.com/apache/spark/pull/55270/changes ? Didn't it solve the issue?

@zhengruifeng
Contributor

@jerrypeng @viirya
there are special testing envs for different purposes, e.g.

you can add the new dependencies to them if it makes sense and CI still passes

@viirya
Member

viirya commented Apr 9, 2026

I'm okay with skipping them on test pipelines with special purposes, such as pinning dependencies to old versions, etc.

@jerrypeng
Contributor

@zhengruifeng thank you for the explanation. I'm ok with skipping the test for those workflows.

@gaogaotiantian
Contributor Author

Yeah, as @zhengruifeng mentioned, we have some CIs to make sure pyspark works with minimal dependencies. In general, we need the test to work (pass or skip) without optional requirements. A developer without kafka installed should not get a test error when they run a big test suite that includes the kafka test.

More importantly, if you are using unittest or pytest, rather than the run-tests script (which executes the module directly via python -m test_module), the testing framework will discover tests by importing the test modules - and failing module-level code will just raise an error and stop the discovery.

So, we either make these two testing modules "required", which is very rare in the pyspark case, or we skip the tests if they don't exist.

@zhengruifeng
Contributor

thanks all, merged to master


5 participants