Add test coverage for DSv2 table refresh and pinning design doc by longvu-db · Pull Request #55033 · apache/spark

longvu-db · 2026-03-26T12:34:16Z

33 new tests covering gaps identified in the "Refreshing and pinning tables in Spark" design doc across all 5 sections:

Section 1: Temp views with stored plans (drop+add column same/different type, type widening, external changes, multiple column additions, subquery refresh, filter pushdown)
Section 2: Repeated table access regression tests (external data writes, schema changes, drop/recreate)
Section 3: Incrementally constructed queries (join with drop+add column, three-way join with version drift)
Section 4: Dataset show/collect consistency (QE reuse behavior, schema changes via external catalog API, interleaved actions)
Section 5: CACHE TABLE scenarios (external schema pinning, session write re-pinning, REFRESH TABLE, external drop/recreate)
Edge cases: nested views, self-join version alignment, REFRESH no-op for DataFrame temp views, cached temp view invalidation
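The "self-join version alignment" edge case can be illustrated with a small toy model. This is only a sketch, assuming a query resolves each table once and reuses that resolution for every later reference. `ToyCatalog`, `ToyQuery`, and the integer versions are hypothetical names for illustration and do not mirror Spark internals.

```python
# Toy model only: every name here is hypothetical and does not mirror Spark internals.

class ToyCatalog:
    """Maps a table name to a monotonically increasing version number."""

    def __init__(self):
        self._versions = {}

    def commit(self, table):
        # Each commit (data write or schema change) produces a new version.
        self._versions[table] = self._versions.get(table, 0) + 1

    def current_version(self, table):
        return self._versions[table]


class ToyQuery:
    """Resolves each table at most once, so every reference shares one version."""

    def __init__(self, catalog):
        self._catalog = catalog
        self._relation_cache = {}

    def resolve(self, table):
        # Only the first reference asks the catalog; later ones hit the cache.
        if table not in self._relation_cache:
            self._relation_cache[table] = self._catalog.current_version(table)
        return self._relation_cache[table]


catalog = ToyCatalog()
catalog.commit("t")                  # table "t" is now at version 1

query = ToyQuery(catalog)
left = query.resolve("t")            # first reference in the self-join
catalog.commit("t")                  # an external write lands in between
right = query.resolve("t")           # second reference reuses the cached version

# A fresh query has an empty cache and therefore sees the newer version.
next_query_version = ToyQuery(catalog).resolve("t")
```

In this model both sides of the self-join observe the same version even though a write happened between the two references, while the next query picks up the newer version.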

Co-authored-by: Isaac

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

dongjoon-hyun

Please file a JIRA issue to get a proper JIRA ID before converting this back from Draft status, @longvu-db.

Add two new test suites covering all scenarios from the DSv2 table refresh and pinning design doc, plus beyond-doc real-world patterns:

DataSourceV2ConcurrencyRefreshSuite (299 tests, classic mode):
- 8 modification types x 8 access patterns (parameterized)
- True multi-threaded concurrency tests (2-thread, multi-reader, phase-locked, stress with 8+ threads)
- All incremental query patterns: union, except, intersect, self-union, chained transformations, cross-table joins, cross-join, left/anti join, subqueries
- Cache pinning: external (catalog API) vs session (SQL)
- Compound modifications, edge cases, coverage gap tests
- Beyond-doc scenarios: spark.read.table(), spark.catalog.refreshTable(), cached derived queries, same-name-different-namespace, nested views, partitioned tables, scalar/EXISTS subqueries, table properties, nullability changes, EXPLAIN on stale DF

DataSourceV2RefreshConnectSuite (111 tests, Spark Connect mode):
- Full parameterized coverage matching classic suite structure
- Verifies Connect-specific behaviors: no stale QE, count/collect consistency, type widening and column rename succeed via re-analysis, all set operations re-analyze both sides

Co-authored-by: Isaac
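The "8 modification types x 8 access patterns (parameterized)" item describes a cross-product of test cases. A minimal sketch of how such a matrix could be generated follows. The commit message gives only the counts, so the labels below are placeholders and `build_test_matrix` is a hypothetical helper, not code from the suites.

```python
from itertools import product

# Placeholder labels: the commit message states the counts (8 x 8), not these names.
MODIFICATIONS = [f"modification_{i}" for i in range(1, 9)]
ACCESS_PATTERNS = [f"access_pattern_{i}" for i in range(1, 9)]


def build_test_matrix(modifications, access_patterns):
    """Return one (test_name, modification, access_pattern) tuple per combination."""
    return [
        (f"{mod} x {access}", mod, access)
        for mod, access in product(modifications, access_patterns)
    ]


TEST_MATRIX = build_test_matrix(MODIFICATIONS, ACCESS_PATTERNS)
TEST_NAMES = [name for name, _, _ in TEST_MATRIX]
```

Under these assumptions the matrix holds 64 uniquely named cases, one for each pairing of a modification with an access pattern.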

17 new tests derived from design doc review comments by Bart Samwel, Julek Sompolski, Ryan Johnson, and Daniel Weeks:
- DF temp view vs SQL temp view behavioral differences
- Write transactions never use cache (CTAS reads fresh data)
- Read vs write mode: query allows new fields, command fails
- Refresh validates ALL tables in plan (not just mismatched)
- Session writes immediately visible in next read
- Monotonic version advancement in sequential reads
- SQL self-join gets consistent version via relation cache
- SQL with 3 refs to same table: all consistent
- Same table via view + direct + subquery in one query
- Join with column addition: schema preservation in classic
- show/count/head/take create new QE vs collect reuses stale
- Session DROP + CREATE: next read sees new table
- Cached table: session write visible, external pinned

Co-authored-by: Isaac
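The last item, "Cached table: session write visible, external pinned", can be pictured with a toy model. This is a sketch under stated assumptions: caching pins the version seen at cache time, an external write leaves that pin untouched, and a write from the same session moves the pin forward. `ToyTable`, `ToySession`, and `external_write` are hypothetical names and not Spark APIs.

```python
# Toy model only: all names are hypothetical and do not correspond to Spark APIs.

class ToyTable:
    """A table whose state is summarized by a single version counter."""

    def __init__(self):
        self.version = 1


class ToySession:
    """A session that can pin a table version by caching the table."""

    def __init__(self, table):
        self._table = table
        self._pinned = None          # None means the table is not cached

    def cache_table(self):
        # Pin whatever version is current at cache time.
        self._pinned = self._table.version

    def write(self):
        # A write from this session advances the table and re-pins the cache.
        self._table.version += 1
        if self._pinned is not None:
            self._pinned = self._table.version

    def read_version(self):
        # Cached reads return the pinned version; uncached reads see the latest.
        return self._pinned if self._pinned is not None else self._table.version


def external_write(table):
    # A change made outside the session, for example through a catalog API.
    table.version += 1


table = ToyTable()
session = ToySession(table)
session.cache_table()

external_write(table)
after_external = session.read_version()        # still the pinned version

session.write()
after_session_write = session.read_version()   # pin moved to the latest version
```

In this model the external write is invisible to the cached read, while the session's own write becomes visible immediately because it re-pins the cache to the latest version.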

longvu-db force-pushed the dsv2-refresh-pinning-test-coverage branch from 62e3e76 to ed536cf on March 26, 2026 13:06

dongjoon-hyun marked this pull request as draft March 26, 2026 17:08

dongjoon-hyun reviewed Mar 26, 2026

longvu-db added 3 commits April 9, 2026 11:49

Add tests

433667a

longvu-db wants to merge 4 commits into apache:master from longvu-db:dsv2-refresh-pinning-test-coverage

