@zhengruifeng zhengruifeng commented Oct 10, 2025

What changes were proposed in this pull request?

Factor out streaming tests from pyspark-sql and pyspark-connect
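
As a rough illustration of what "factoring out" means here, the sketch below splits one module's test goals into two modules. It is not the actual change to dev/sparktestsupport/modules.py: `CiModule`, `factor_out`, the new module name, and the `test_dataframe` goal are all hypothetical stand-ins; only the two `transform_with_state` goal names come from the CI log in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class CiModule:
    """Hypothetical stand-in for a CI test module: a name plus its test goals."""
    name: str
    test_goals: list = field(default_factory=list)


def factor_out(module, goals_to_move, new_name):
    """Split `module` into (remaining, extracted); every goal lands in exactly one."""
    to_move = set(goals_to_move)
    remaining = [g for g in module.test_goals if g not in to_move]
    extracted = [g for g in module.test_goals if g in to_move]
    return CiModule(module.name, remaining), CiModule(new_name, extracted)


pyspark_sql = CiModule(
    "pyspark-sql",
    [
        "pyspark.sql.tests.test_dataframe",  # illustrative non-streaming goal
        "pyspark.sql.tests.pandas.test_pandas_transform_with_state",
        "pyspark.sql.tests.pandas.test_pandas_transform_with_state_checkpoint_v2",
    ],
)

# Assumption for this sketch: streaming goals are selected by name.
streaming_goals = [g for g in pyspark_sql.test_goals if "transform_with_state" in g]

sql_module, streaming_module = factor_out(
    pyspark_sql, streaming_goals, "pyspark-structured-streaming"  # hypothetical name
)
```

With the slow goals in their own module, CI can schedule them in a separate job instead of letting them eat into the pyspark-sql time budget.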

Why are the changes needed?

pyspark-sql and pyspark-connect are still prone to timeouts, even after we increased timeout-minutes to 150; see https://github.com/apache/spark/actions/runs/18389137953/workflow

The streaming tests are large, so move them to dedicated testing modules to speed up CI:

Starting test(python3.11): pyspark.sql.tests.pandas.test_pandas_transform_with_state (temp output: /Users/runner/work/spark/spark/python/target/92efa305-098c-4839-8bb4-d13c9b60a405/python3.11__pyspark.sql.tests.pandas.test_pandas_transform_with_state__o7ragjh2.log)
Finished test(python3.11): pyspark.sql.tests.pandas.test_pandas_transform_with_state (1509s) ... 2 tests were skipped
Starting test(python3.11): pyspark.sql.tests.pandas.test_pandas_transform_with_state_checkpoint_v2 (temp output: /Users/runner/work/spark/spark/python/target/929d4ad3-5518-4011-85a9-b14355974ead/python3.11__pyspark.sql.tests.pandas.test_pandas_transform_with_state_checkpoint_v2__m41avtp4.log)
Finished test(python3.11): pyspark.sql.tests.pandas.test_pandas_transform_with_state_checkpoint_v2 (1537s) ... 2 tests were skipped
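
To put a number on the two runs above, here is a minimal sketch that pulls the durations out of the `Finished test` lines and sums them. The regular expression and the `parse_durations` helper are assumptions written for this log format only; they are not part of Spark's test runner.

```python
import re

# The two "Finished test" lines quoted above, verbatim.
LOG = """\
Finished test(python3.11): pyspark.sql.tests.pandas.test_pandas_transform_with_state (1509s) ... 2 tests were skipped
Finished test(python3.11): pyspark.sql.tests.pandas.test_pandas_transform_with_state_checkpoint_v2 (1537s) ... 2 tests were skipped
"""

# Captures: interpreter, test goal, duration in whole seconds.
FINISHED = re.compile(r"^Finished test\(([^)]+)\): (\S+) \((\d+)s\)")


def parse_durations(log_text):
    """Map each finished test goal to its duration in seconds."""
    durations = {}
    for line in log_text.splitlines():
        match = FINISHED.match(line)
        if match:
            durations[match.group(2)] = int(match.group(3))
    return durations


durations = parse_durations(LOG)
total_seconds = sum(durations.values())
print(f"{total_seconds}s = {total_seconds / 60:.1f} min")  # 3046s = 50.8 min
```

Together these two goals account for roughly 51 minutes, which is about a third of the 150-minute budget mentioned above.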

Does this PR introduce any user-facing change?

no, infra-only

How was this patch tested?

Verified in the PR builder after this change.

Was this patch authored or co-authored using generative AI tooling?

no

# Conflicts:
#	dev/sparktestsupport/modules.py

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM (Pending CIs).

@zhengruifeng zhengruifeng changed the title [SPARK-53861][PYTHON][INFRA] Factor out streaming tests from spark-sql and spark-connect [SPARK-53861][PYTHON][INFRA] Factor out streaming tests from pyspark-sql and pyspark-connect Oct 10, 2025
@dongjoon-hyun

Thank you, @zhengruifeng and all. Merged to master.

@zhengruifeng zhengruifeng deleted the infra_ss_module branch October 10, 2025 08:53
zhengruifeng added a commit that referenced this pull request Oct 10, 2025
### What changes were proposed in this pull request?
Restore pyspark execution timeout to 2 hours

### Why are the changes needed?
After #52564, 2 hours should be enough for PySpark execution.

### Does this PR introduce _any_ user-facing change?
No, infra-only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #52570 from zhengruifeng/restore_120.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>