Add test coverage for DSv2 table refresh and pinning design doc #55033
Draft
longvu-db wants to merge 4 commits into apache:master from
33 new tests covering gaps identified in the "Refreshing and pinning tables in Spark" design doc across all 5 sections:

- Section 1: Temp views with stored plans (drop+add column same/different type, type widening, external changes, multiple column additions, subquery refresh, filter pushdown)
- Section 2: Repeated table access regression tests (external data writes, schema changes, drop/recreate)
- Section 3: Incrementally constructed queries (join with drop+add column, three-way join with version drift)
- Section 4: Dataset show/collect consistency (QE reuse behavior, schema changes via external catalog API, interleaved actions)
- Section 5: CACHE TABLE scenarios (external schema pinning, session write re-pinning, REFRESH TABLE, external drop/recreate)
- Edge cases: nested views, self-join version alignment, REFRESH no-op for DataFrame temp views, cached temp view invalidation

Co-authored-by: Isaac
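The Section 5 pinning scenarios can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyCatalog` and `ToySession` are hypothetical names, not Spark classes, and the refresh behavior shown (an explicit refresh re-pins to the catalog's current state) is an assumption rather than something this description spells out. It models a cached table staying pinned across an external write, re-pinning on a session write, and picking up external changes after a refresh.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for an external catalog: name -> (version, rows)."""
    tables: dict = field(default_factory=dict)

    def write(self, name, rows):
        # Every write replaces the rows and bumps the version by one.
        version, _ = self.tables.get(name, (0, []))
        self.tables[name] = (version + 1, list(rows))

    def load(self, name):
        version, rows = self.tables[name]
        return version, list(rows)


class ToySession:
    """Hypothetical session that pins a table snapshot on cache_table()."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.pinned = {}  # name -> (version, rows) captured at cache time

    def cache_table(self, name):
        self.pinned[name] = self.catalog.load(name)

    def refresh_table(self, name):
        # Assumption: refresh re-pins a cached table to the current catalog state.
        if name in self.pinned:
            self.pinned[name] = self.catalog.load(name)

    def write(self, name, rows):
        # A write issued through the session also re-pins the cached snapshot.
        self.catalog.write(name, rows)
        self.refresh_table(name)

    def read(self, name):
        if name in self.pinned:
            return list(self.pinned[name][1])
        return self.catalog.load(name)[1]


catalog = ToyCatalog()
session = ToySession(catalog)
catalog.write("t", [1])
session.cache_table("t")

catalog.write("t", [1, 2])        # external write: the cached read stays pinned
pinned_read = session.read("t")

session.write("t", [1, 2, 3])     # session write: the cached read re-pins
after_session_write = session.read("t")

catalog.write("t", [9])           # another external write, still pinned...
session.refresh_table("t")        # ...until an explicit refresh
after_refresh = session.read("t")
```

The toy keeps the pin as a plain `(version, rows)` tuple so that "pinned" versus "fresh" is visible from the returned rows alone.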
62e3e76 to
ed536cf
dongjoon-hyun (Member) left a comment:
Please file a JIRA issue to have a proper JIRA ID before converting this back from Draft status, @longvu-db.
Add two new test suites covering all scenarios from the DSv2 table refresh and pinning design doc, plus beyond-doc real-world patterns:

DataSourceV2ConcurrencyRefreshSuite (299 tests, classic mode):
- 8 modification types x 8 access patterns (parameterized)
- True multi-threaded concurrency tests (2-thread, multi-reader, phase-locked, stress with 8+ threads)
- All incremental query patterns: union, except, intersect, self-union, chained transformations, cross-table joins, cross-join, left/anti join, subqueries
- Cache pinning: external (catalog API) vs session (SQL)
- Compound modifications, edge cases, coverage gap tests
- Beyond-doc scenarios: spark.read.table(), spark.catalog.refreshTable(), cached derived queries, same-name-different-namespace, nested views, partitioned tables, scalar/EXISTS subqueries, table properties, nullability changes, EXPLAIN on stale DF

DataSourceV2RefreshConnectSuite (111 tests, Spark Connect mode):
- Full parameterized coverage matching classic suite structure
- Verifies Connect-specific behaviors: no stale QE, count/collect consistency, type widening and column rename succeed via re-analysis, all set operations re-analyze both sides

Co-authored-by: Isaac
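The "8 modification types x 8 access patterns" matrix can be sketched as a cross product. The labels below are illustrative assumptions drawn from scenarios mentioned in this PR; the commit message gives only the 8 x 8 shape, not the exact entries, and `build_cases` is a hypothetical helper rather than anything in the suites.

```python
from itertools import product

# Illustrative labels only: the commit names the 8 x 8 shape, not these entries.
MODIFICATIONS = [
    "add_column", "drop_column", "rename_column", "widen_type",
    "change_nullability", "set_table_property", "write_data", "drop_and_recreate",
]
ACCESS_PATTERNS = [
    "show", "collect", "count", "temp_view",
    "join", "union", "subquery", "cached_table",
]


def build_cases(modifications, access_patterns):
    """Return one (test_name, modification, access_pattern) tuple per combination."""
    return [
        (f"{mod} then {access}", mod, access)
        for mod, access in product(modifications, access_patterns)
    ]


cases = build_cases(MODIFICATIONS, ACCESS_PATTERNS)
names = [name for name, _, _ in cases]
```

Generating the names from the two lists keeps every combination covered and makes a missing pairing show up as a wrong case count.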
17 new tests derived from design doc review comments by Bart Samwel, Julek Sompolski, Ryan Johnson, and Daniel Weeks:

- DF temp view vs SQL temp view behavioral differences
- Write transactions never use cache (CTAS reads fresh data)
- Read vs write mode: query allows new fields, command fails
- Refresh validates ALL tables in plan (not just mismatched)
- Session writes immediately visible in next read
- Monotonic version advancement in sequential reads
- SQL self-join gets consistent version via relation cache
- SQL with 3 refs to same table: all consistent
- Same table via view + direct + subquery in one query
- Join with column addition: schema preservation in classic
- show/count/head/take create new QE vs collect reuses stale
- Session DROP + CREATE: next read sees new table
- Cached table: session write visible, external pinned

Co-authored-by: Isaac
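The self-join and sequential-read items can be pictured with a toy per-query relation cache. This is a sketch under assumptions, not Spark's analyzer: `VersionedCatalog` and `ToyAnalyzer` are hypothetical names, and a table is reduced to a bare version number. Within one query, every reference to a table resolves to the version loaded first; a later query starts with an empty cache and sees the newer version.

```python
class VersionedCatalog:
    """Hypothetical catalog that bumps a table's version on every write."""

    def __init__(self):
        self.versions = {}

    def write(self, name):
        self.versions[name] = self.versions.get(name, 0) + 1

    def load(self, name):
        return self.versions[name]


class ToyAnalyzer:
    """Hypothetical per-query resolver with a relation cache keyed by table name."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.relation_cache = {}

    def resolve(self, name):
        # The first reference loads from the catalog; later references reuse it.
        if name not in self.relation_cache:
            self.relation_cache[name] = self.catalog.load(name)
        return self.relation_cache[name]


catalog = VersionedCatalog()
catalog.write("t")                  # table "t" is now at version 1

query1 = ToyAnalyzer(catalog)
left = query1.resolve("t")
catalog.write("t")                  # external write lands between the two references
right = query1.resolve("t")         # same query: reuses the cached relation

query2 = ToyAnalyzer(catalog)       # a later query starts with an empty cache
later = query2.resolve("t")
```

Scoping the cache to one analyzer instance is what gives both properties at once: both sides of the self-join agree, and versions only move forward from one query to the next.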