
Add test coverage for DSv2 table refresh and pinning design doc #55033

Draft
longvu-db wants to merge 4 commits into apache:master from longvu-db:dsv2-refresh-pinning-test-coverage


@longvu-db (Contributor)

33 new tests covering gaps identified in the "Refreshing and pinning tables in Spark" design doc across all 5 sections:

  • Section 1: Temp views with stored plans (drop+add column same/different type, type widening, external changes, multiple column additions, subquery refresh, filter pushdown)
  • Section 2: Repeated table access regression tests (external data writes, schema changes, drop/recreate)
  • Section 3: Incrementally constructed queries (join with drop+add column, three-way join with version drift)
  • Section 4: Dataset show/collect consistency (QueryExecution (QE) reuse behavior, schema changes via external catalog API, interleaved actions)
  • Section 5: CACHE TABLE scenarios (external schema pinning, session write re-pinning, REFRESH TABLE, external drop/recreate)
  • Edge cases: nested views, self-join version alignment, REFRESH no-op for DataFrame temp views, cached temp view invalidation

Co-authored-by: Isaac

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

@longvu-db force-pushed the dsv2-refresh-pinning-test-coverage branch from 62e3e76 to ed536cf on March 26, 2026 13:06
@dongjoon-hyun marked this pull request as draft on March 26, 2026 17:08

@dongjoon-hyun (Member) left a comment:

Please file a JIRA issue to have a proper JIRA ID before converting this back from Draft status, @longvu-db.

Add two new test suites covering all scenarios from the DSv2 table
refresh and pinning design doc, plus beyond-doc real-world patterns:

DataSourceV2ConcurrencyRefreshSuite (299 tests, classic mode):
- 8 modification types x 8 access patterns (parameterized)
- True multi-threaded concurrency tests (2-thread, multi-reader,
  phase-locked, stress with 8+ threads)
- All incremental query patterns: union, except, intersect,
  self-union, chained transformations, cross-table joins,
  cross-join, left/anti join, subqueries
- Cache pinning: external (catalog API) vs session (SQL)
- Compound modifications, edge cases, coverage gap tests
- Beyond-doc scenarios: spark.read.table(), spark.catalog.refreshTable(),
  cached derived queries, same-name-different-namespace, nested views,
  partitioned tables, scalar/EXISTS subqueries, table properties,
  nullability changes, EXPLAIN on stale DF
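
The commit message gives the shape of the parameterized matrix but not its members. The plain-Scala sketch below shows how an "8 modification types x 8 access patterns" matrix expands into individually named cases. The sixteen labels are an assumption. They are borrowed from scenarios named elsewhere in this PR and may not match the suite's real ones. `RefreshMatrixSketch` and `testName` are hypothetical names.

```scala
object RefreshMatrixSketch {
  // ASSUMPTION: hypothetical labels taken from scenarios mentioned in this PR;
  // the suite's actual eight modification types are not listed in the PR text.
  val modifications: Seq[String] = Seq(
    "external data write",
    "add column",
    "multiple column additions",
    "drop+add column (same type)",
    "drop+add column (different type)",
    "type widening",
    "column rename",
    "drop/recreate table")

  // ASSUMPTION: likewise hypothetical labels for the eight access patterns.
  val accessPatterns: Seq[String] = Seq(
    "SQL temp view",
    "DataFrame temp view",
    "repeated spark.table()",
    "spark.read.table()",
    "incremental join",
    "Dataset collect",
    "Dataset show",
    "CACHE TABLE")

  // One generated case per (modification, access pattern) pair: 8 x 8 = 64.
  val cases: Seq[(String, String)] =
    for {
      m <- modifications
      a <- accessPatterns
    } yield (m, a)

  // Builds a unique, readable name for each generated case.
  def testName(m: String, a: String): String = s"$m / $a"

  def main(args: Array[String]): Unit = {
    cases.foreach { case (m, a) => println(testName(m, a)) }
    println(s"${cases.size} cases")
  }
}
```

In a ScalaTest-based suite (Spark's test suites build on `SparkFunSuite`), the loop body would typically call `test(testName(m, a)) { ... }` instead of `println`. Each pair would then be registered and reported as its own test. Unique names matter here because ScalaTest rejects duplicate test names at registration time.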

DataSourceV2RefreshConnectSuite (111 tests, Spark Connect mode):
- Full parameterized coverage matching classic suite structure
- Verifies Connect-specific behaviors: no stale QE, count/collect
  consistency, type widening and column rename succeed via
  re-analysis, all set operations re-analyze both sides

Co-authored-by: Isaac

17 new tests derived from design doc review comments by
Bart Samwel, Julek Sompolski, Ryan Johnson, and Daniel Weeks:

- DF temp view vs SQL temp view behavioral differences
- Write transactions never use cache (CTAS reads fresh data)
- Read vs write mode: query allows new fields, command fails
- Refresh validates ALL tables in the plan (not just the mismatched ones)
- Session writes immediately visible in next read
- Monotonic version advancement in sequential reads
- SQL self-join gets consistent version via relation cache
- SQL with 3 refs to same table: all consistent
- Same table via view + direct + subquery in one query
- Join with column addition: schema preservation in classic mode
- show/count/head/take create a new QE, while collect reuses the stale one
- Session DROP + CREATE: next read sees new table
- Cached table: session write visible, external changes pinned

Co-authored-by: Isaac
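
The PR does not include the test code itself. The minimal plain-Scala sketch below illustrates what a "phase-locked" two-thread test combined with the "monotonic version advancement" check could look like. It models only the thread coordination. An `AtomicLong` stands in for the table version, and `PhaseLockedSketch`, `runPhases` and `isMonotonic` are hypothetical helpers, not Spark or PR APIs. A real test would replace the counter with external catalog commits and Spark reads.

```scala
import java.util.concurrent.{CyclicBarrier, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable.ArrayBuffer

object PhaseLockedSketch {
  // Runs `phases` rounds. In each round a writer thread bumps a version counter
  // (a hypothetical stand-in for an external table commit, not a Spark API) and
  // a reader thread records the version it sees after that write completes.
  def runPhases(phases: Int): Seq[Long] = {
    val tableVersion = new AtomicLong(0L)
    val barrier = new CyclicBarrier(2)
    val observed = ArrayBuffer.empty[Long] // written only by the reader thread
    val pool = Executors.newFixedThreadPool(2)
    try {
      val writer = pool.submit(new Runnable {
        def run(): Unit = (1 to phases).foreach { _ =>
          tableVersion.incrementAndGet() // "external" write for this phase
          barrier.await()                // phase boundary: write is done
          barrier.await()                // wait until the reader has read
        }
      })
      val reader = pool.submit(new Runnable {
        def run(): Unit = (1 to phases).foreach { _ =>
          barrier.await()                // wait for this phase's write
          observed += tableVersion.get() // read strictly after the write
          barrier.await()                // release the writer into the next phase
        }
      })
      // Future.get rethrows failures and makes `observed` safely visible here.
      writer.get(10, TimeUnit.SECONDS)
      reader.get(10, TimeUnit.SECONDS)
    } finally {
      pool.shutdownNow()
    }
    observed.toList
  }

  // Sequential reads should never see the version go backwards.
  def isMonotonic(versions: Seq[Long]): Boolean =
    versions.zip(versions.drop(1)).forall { case (a, b) => a <= b }

  def main(args: Array[String]): Unit = {
    val seen = runPhases(5)
    println(seen.mkString(", "))
    println(s"monotonic = ${isMonotonic(seen)}")
  }
}
```

The two `await()` calls per phase are what make the test deterministic. The reader cannot read before the phase's write, and the writer cannot start the next write until the reader has recorded its value. As a result, each run observes exactly one version per phase, in order.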