
[SPARK-55441][SQL] Types Framework - Phase 1c - Client Integration#54905

Open
davidm-db wants to merge 19 commits into apache:master from
davidm-db:davidm-db/types_framework_1c

Conversation

@davidm-db
Contributor

@davidm-db davidm-db commented Mar 19, 2026

What changes were proposed in this pull request?

Extends the Types Framework (Phase 1a: #54223) with optional traits for client-facing infrastructure dispatch. Adds 4 new traits and integrates 12 files so that new framework-managed types get client support automatically — zero edits to infrastructure files.

New traits:

  • CatalystTypeOps (catalyst) — serializer/deserializer expression building, Arrow field writers
  • ClientTypeOps (sql/api) — Arrow conversion, Python interop, Hive formatting, Thrift type ID
  • ProtoTypeOps (connect/common) — Spark Connect proto DataType and Literal conversions
  • ConnectArrowTypeOps (connect/common) — Connect Arrow serde (no feature flag — client must handle whatever the server sends)

TimeType serves as the reference implementation. One new per-type class (TimeTypeConnectOps) handles both Connect traits; the existing TimeTypeApiOps and TimeTypeOps gain mix-ins for ClientTypeOps and CatalystTypeOps respectively.

Integration pattern: Every integration point uses XxxTypeOps(dt).map(_.method).getOrElse(methodDefault(...)) — framework dispatch first, original code extracted to a private *Default helper as fallback. This minimizes diff on existing code (match bodies stay at their original indentation in the helper).
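A minimal, self-contained sketch of this dispatch pattern (stand-in types and names for illustration, not the PR's actual classes):

```scala
// Hypothetical stand-ins illustrating the XxxTypeOps(dt).map(...).getOrElse(...)
// pattern; the real ClientTypeOps/TimeTypeApiOps live in sql/api.
object DispatchSketch {
  sealed trait DataType
  case object TimeType extends DataType
  case object IntegerType extends DataType

  trait ClientTypeOps { def formatExternal(value: Any): String }

  object ClientTypeOps {
    // Framework lookup: Some(ops) only for framework-managed types.
    def apply(dt: DataType): Option[ClientTypeOps] = dt match {
      case TimeType =>
        Some(new ClientTypeOps {
          def formatExternal(value: Any): String = s"TIME '$value'"
        })
      case _ => None
    }
  }

  // Original match body, extracted into a *Default helper as the fallback.
  private def formatDefault(value: Any, dt: DataType): String = dt match {
    case IntegerType => value.toString
    case _           => String.valueOf(value)
  }

  // Framework dispatch first, original code as fallback.
  def format(value: Any, dt: DataType): String =
    ClientTypeOps(dt).map(_.formatExternal(value)).getOrElse(formatDefault(value, dt))
}
```

Here `format(lt, TimeType)` takes the framework path, while `format(42, IntegerType)` falls through to the extracted default.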

Trait method design: Methods that may need extra context for future complex types use overloading — a simple abstract version (most types override this) and a concrete extended version with additional parameters that delegates to the simple one by default. For example, formatExternal(value) for most types, with formatExternal(value, nested) available for types that need nesting-aware formatting.
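In trait form, the overload pair could look like this (names illustrative):

```scala
// Sketch of the overload design: types override the simple abstract method;
// the extended overload is concrete and delegates by default, so only types
// that need the extra context (here, `nested`) override it.
trait FormattingOps {
  /** Simple version: most types implement only this. */
  def formatExternal(value: Any): String

  /** Extended version: default ignores the extra context and delegates. */
  def formatExternal(value: Any, nested: Boolean): String = formatExternal(value)
}
```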

Visibility changes:

  • ArrowFieldWriter and TimeWriter widened from private[arrow] to private[sql] so CatalystTypeOps can declare/instantiate them.
  • ArrowSerializer.FieldSerializer, ArrowVectorReader, and TimeVectorReader widened from private[arrow] to private[connect] so ConnectArrowTypeOps can use proper return types instead of Any.

Why are the changes needed?

Without this, adding a new data type requires manually editing 13+ client integration files with diverse patterns. With these traits, a new type implements the relevant ops methods and all client infrastructure dispatches through the framework automatically.

JDBC Data Source is intentionally excluded from this PR. JDBC type mapping (JdbcUtils) requires types from sql/core that ClientTypeOps (in sql/api) can't reference, and partial support (DDL without read/write) would be worse than no support. A separate JdbcTypeOps trait in sql/core is planned to cover all four JDBC methods (getCommonJDBCType, getCatalystType, makeGetter, makeSetter) as a complete unit.

Does this PR introduce any user-facing change?

No. The framework is gated behind spark.sql.types.framework.enabled (default false in production, true in tests). Existing behavior is unchanged — all original TimeType branches remain as fallback in the *Default methods.

How was this patch tested?

The feature flag is enabled by default in test environments (Utils.isTesting), so the entire existing test suite validates the framework code path. No new tests are added in this PR because the framework delegates to the same underlying logic that existing tests already cover.

In subsequent phases, the testing focus will be on:

  • Testing the framework itself (Ops interface contracts, roundtrip correctness, edge cases)
  • Designing a generalized testing mechanism that enforces proper test coverage for each type added through the framework

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.6 (claude-opus-4-6)

@davidm-db davidm-db changed the title supporting various clients thorugh types framework [SPARK-55441][SQL] Types Framework - Phase 1c - Client Integration Mar 19, 2026
Map("typeInfo" -> s"${literalCase.name}(${literalCase.getNumber})"))
}
builder.build()
ProtoTypeOps
Contributor Author

@davidm-db davidm-db Mar 20, 2026


there doesn't seem to be a way to nicely/cleanly reduce the diff here

@davidm-db davidm-db force-pushed the davidm-db/types_framework_1c branch 2 times, most recently from 8da3e67 to db8b270 Compare March 30, 2026 12:16
@davidm-db davidm-db marked this pull request as ready for review March 30, 2026 12:16
@davidm-db davidm-db force-pushed the davidm-db/types_framework_1c branch from db8b270 to 38aef26 Compare March 30, 2026 12:24
Contributor

@cloud-fan cloud-fan left a comment


Review

Prior state and problem: Adding a new data type required manually editing 13+ client-side infrastructure files with diverse dispatch patterns.

Design approach: Phase 1c extends the Types Framework with 4 new optional traits (ClientTypeOps, CatalystTypeOps, ProtoTypeOps, ConnectArrowTypeOps) that centralize client-facing operations per type. Each integration file wraps its original method body in a *Default helper and adds a framework dispatch. TimeType serves as the reference implementation.

Key design decisions:

  1. Feature flag gating: server-side ops check typesFrameworkEnabled; Connect client Arrow ops skip the flag (client must handle whatever server sends).
  2. Visibility strategy: CatalystTypeOps uses type-safe ArrowFieldWriter (widened to private[sql]), while ConnectArrowTypeOps uses Any for all parameters/returns due to private[arrow] visibility of concrete types.
  3. Reverse lookup registration: forward dispatch goes through a single ops instance, but reverse lookups (Arrow→DataType, proto→DataType, value→proto) each need separate factory methods with per-type matching.

Implementation sketch: 13 integration points consistently use XxxTypeOps(dt).map(_.method).getOrElse(default(...)). Per-type ops classes: TimeTypeApiOps (sql/api), TimeTypeOps (catalyst), TimeTypeConnectOps (connect).

*/
object ProtoTypeOps {

def apply(dt: DataType): Option[ProtoTypeOps] = {
Contributor


Adding a second framework-managed type requires 5 separate match-clause additions in this file (apply, toLiteralProtoForValue, toCatalystType, getScalaConverterForKind, getProtoDataTypeFromLiteral). Three of those dispatch by proto enum cases and can be consolidated.

Concrete suggestion — extract a shared opsForKindCase lookup and add a toCatalystTypeFromProto method to the trait:

// Add to ProtoTypeOps trait:
def toCatalystTypeFromProto(t: proto.DataType): DataType

// In companion object — single KindCase → ops mapping:
private def opsForKindCase(
    kindCase: proto.DataType.KindCase): Option[ProtoTypeOps] = kindCase match {
  case proto.DataType.KindCase.TIME => Some(new TimeTypeConnectOps(TimeType()))
  // Add new framework types here — single registration for all KindCase lookups
  case _ => None
}

// Then toCatalystType, getScalaConverterForKind, getProtoDataTypeFromLiteral all delegate:
def toCatalystType(t: proto.DataType): Option[DataType] = {
  if (!SqlApiConf.get.typesFrameworkEnabled) return None
  opsForKindCase(t.getKindCase).map(_.toCatalystTypeFromProto(t))
}

def getScalaConverterForKind(
    kindCase: proto.DataType.KindCase): Option[proto.Expression.Literal => Any] = {
  if (!SqlApiConf.get.typesFrameworkEnabled) return None
  opsForKindCase(kindCase).map(_.getScalaConverter)
}

def getProtoDataTypeFromLiteral(literal: proto.Expression.Literal): Option[proto.DataType] = {
  if (!SqlApiConf.get.typesFrameworkEnabled) return None
  // LiteralTypeCase maps 1:1 to KindCase for framework types
  opsForKindCase(literalCaseToKindCase(literal.getLiteralTypeCase))
    .map(_.getProtoDataTypeFromLiteral(literal))
}

This reduces per-type registration in this file from 5 match points to 3 (apply, opsForKindCase, toLiteralProtoForValue).

Contributor Author


Applied your suggestion. Extracted a shared opsForKindCase lookup and added toCatalystTypeFromProto to the trait. toCatalystType, getScalaConverterForKind, and getProtoDataTypeFromLiteral all delegate through the single opsForKindCase registration point now. Per-type registration in this file reduced from 5 match blocks to 3 (apply, opsForKindCase, toLiteralProtoForValue). Added literalCaseToKindCase helper for the LiteralTypeCase-to-KindCase mapping.

// the arrow package cast to the expected types.

/** Creates an Arrow serializer for writing values to a vector. Returns a Serializer. */
def createArrowSerializer(vector: Any): Any
Contributor


All three methods use Any for both parameters and return types — type mismatches between the ops implementation and the call site (e.g., casting to the wrong vector type) are only caught at runtime as ClassCastException. CatalystTypeOps avoided this by widening ArrowFieldWriter/TimeWriter to private[sql]. Could the Connect Arrow types (Serializer, Deserializer, ArrowVectorReader) similarly be widened to private[connect]?

Contributor Author

@davidm-db davidm-db Apr 2, 2026


Widened FieldSerializer to private[connect], ArrowVectorReader and TimeVectorReader to private[connect]. All three return types are now proper: ArrowSerializer.Serializer, ArrowDeserializers.Deserializer[Any], ArrowVectorReader. Call sites no longer need .asInstanceOf casts on the return value.

For parameters, 1 of 3 is also proper (createArrowVectorReader takes FieldVector). The other 2 (createArrowSerializer, createArrowDeserializer) take AnyRef because the existing caller APIs (serializerFor(encoder, v: AnyRef), deserializerFor(encoder, data: AnyRef)) use AnyRef — these signatures predate our work and use pattern matching to determine the concrete vector type. Using AnyRef avoids an unnecessary intermediate cast.

Net result: 6 Any positions reduced to 2 AnyRef parameters. The remaining 2 are constrained by the existing caller APIs, not by visibility.

binaryFormatter: BinaryFormatter): String = a match {
case (null, _) => if (nested) "null" else "NULL"
case (value, dt) =>
ClientTypeOps(dt).map(_.formatExternal(value, nested)).getOrElse {
Contributor


Two concerns with this dispatch:

  1. Bypasses session formatters. formatExternal uses the ops instance's own FractionTimeFormatter, ignoring the formatters: TimeFormatters parameter. Currently both use the same new FractionTimeFormatter() so output matches, but if session-level time formatting becomes configurable the framework path would silently ignore it.

  2. Per-value allocation. ClientTypeOps(dt) creates a new TimeTypeApiOps (including a new @transient lazy val timeFormatter) on every call. For a result set with N TimeType values, that's N instances + N formatters vs. the original code, which reuses a single formatters.time for all values.

Consider passing the session formatter through the ops interface, or caching the ops instance per DataType.

Contributor Author


  1. TimeType currently has two parallel implementations — the original hardcoded branches and the framework ops. The framework implementation is how a new type would be built from scratch (formatter owned by the ops class, no dependency on external formatter pipelines). This should be the correct design for new types going forward.

    • The concern about bypassing session formatters doesn't apply here — FractionTimeFormatter is fully stateless (no session configuration, no timezone, no legacy parser policy). Unlike TimestampFormatter (which takes sessionLocalTimeZone) or DateFormatter (which checks legacyTimeParserPolicy), the time formatter has zero session state. Both paths — the original formatters.time.format(lt) and the framework's formatExternal — create the same stateless FractionTimeFormatter and produce identical output.
    • For future types that DO need session-configurable formatting, the ops can read session state inside formatExternal at call time via SqlApiConf.get, matching how HiveResult.getTimeFormatters reads SQLConf.get.sessionLocalTimeZone. The framework design handles this without API changes.
  2. Re allocation: tracked for Phase 1d — singleton caching of ops instances, applied uniformly across all factory companions. This was already noted down during work on the other phases, and I'll fix it all together in the upcoming phase.

def getCommonJDBCType(dt: DataType): Option[JdbcType] = {
// Uses .orElse (not .getOrElse) because this method returns Option[JdbcType].
def getCommonJDBCType(dt: DataType): Option[JdbcType] =
ClientTypeOps(dt).map(ops => JdbcType(ops.jdbcTypeName, ops.getJdbcType))
Contributor


When the framework is enabled, getCommonJDBCType(TimeType) now returns Some(JdbcType("TIME", java.sql.Types.TIME)). Before this PR, the default path had no TimeType case and returned None. This is a new capability (not a regression), but the PR description's "Does this PR introduce any user-facing change?" says No. Since the flag defaults to true in tests, this changes test-environment behavior.

Contributor Author


Removed JDBC entirely from this PR. getCommonJDBCType reverted to original code (no framework dispatch), JDBC methods removed from ClientTypeOps. The framework should be a transparent refactoring with no behavior change. JDBC requires types from sql/core that ClientTypeOps (in sql/api) can't reference, and partial support (DDL without getter/setter) is worse than no support. A separate JdbcTypeOps trait in sql/core is planned for when full JDBC support is ready.

@dejankrak-db
Contributor

Surface Areas Detected

  • Types Framework — 4 new traits (ClientTypeOps, CatalystTypeOps, ProtoTypeOps, ConnectArrowTypeOps) + 1 implementation (TimeTypeConnectOps)
  • Spark Connect — Proto converters, Arrow serde (7 files)
  • sql/core — HiveResult, JdbcUtils, EvaluatePython, SparkExecuteStatementOperation (4 files)
  • sql/catalyst — SerializerBuildHelper, DeserializerBuildHelper, ArrowWriter (3 files)
  • sql/api — ArrowUtils, TimeTypeApiOps (2 files)
  • DBR Analyzer / Parser / Reyden (0 files)

Findings

Finding 1: Arrow NANOSECOND unit maps to MICROS_PRECISION — semantic mismatch

  • Severity: High
  • Category: Behavioral Correctness
  • File: ClientTypeOps.scala:196 (fromArrowType)
  • Issue: fromArrowType matches ArrowType.Time with TimeUnit.NANOSECOND and 64-bit width but returns TimeType(TimeType.MICROS_PRECISION). Meanwhile, toArrowType in TimeTypeApiOps always emits ArrowType.Time(TimeUnit.NANOSECOND, 64) regardless of the
    TimeType's actual precision. A roundtrip TimeType(NANOS_PRECISION) -> Arrow -> TimeType silently loses precision metadata.
  • Impact: Precision metadata is silently discarded on the Arrow deserialization path. If a TimeType with non-default precision is sent through Arrow (e.g., via PySpark or Connect), it will come back as MICROS_PRECISION.
  • Suggested fix: Either encode precision in Arrow field metadata, derive precision from the Arrow time unit, or document as a known lossy conversion with a TODO. toArrowType should arguably vary its TimeUnit based on t.precision.
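One possible shape for the "derive precision from the Arrow time unit" option — a hedged sketch only, with the TimeType precision constants taken from this PR's description and Arrow's standard `ArrowType.Time` accessors:

```scala
// Maps the Arrow time unit back to a TimeType precision instead of
// hardcoding MICROS_PRECISION for all 64-bit time vectors.
def fromArrowTime(t: ArrowType.Time): Option[TimeType] =
  if (t.getBitWidth != 64) None
  else t.getUnit match {
    case TimeUnit.MICROSECOND => Some(TimeType(TimeType.MICROS_PRECISION))
    case TimeUnit.NANOSECOND  => Some(TimeType(TimeType.NANOS_PRECISION))
    case _                    => None
  }
```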

Finding 2: Per-row object allocation in HiveResult hot path

  • Severity: High
  • Category: Performance
  • File: HiveResult.scala:113-123
  • Issue: ClientTypeOps(dt) is called on every (value, dt) pair in the result set, creating a new TimeTypeApiOps instance per call. Additionally, it's called for ALL non-null values of ALL types (not just TimeType), returning None for standard types —
    adding overhead to every row of every query.
  • Impact: Significant performance regression on large result sets, especially for non-TimeType columns.
  • Suggested fix: Pre-compute the formatting function per column at the schema level (once per query), not per row. At minimum, restructure the match so the catch-all only triggers after all built-in types fail.
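A sketch of the per-column precompute (ClientTypeOps is the PR's name; formatValueDefault and the surrounding shape are assumed):

```scala
// Resolve each column's formatter once from the schema, then reuse it per row.
// Avoids the ClientTypeOps lookup (and TimeTypeApiOps allocation) per value.
def columnFormatters(schema: Seq[DataType]): Seq[Any => String] =
  schema.map { dt =>
    ClientTypeOps(dt) match {
      case Some(ops) => v => ops.formatExternal(v)      // resolved once per column
      case None      => v => formatValueDefault(v, dt)  // original per-type branch
    }
  }
```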

Finding 3: toLiteralProto hardcodes DEFAULT_PRECISION instead of using instance precision

  • Severity: High
  • Category: Behavioral Correctness
  • File: TimeTypeConnectOps.scala:763
  • Issue: toLiteralProto (the "generic, no DataType context" version) sets .setPrecision(TimeType.DEFAULT_PRECISION) instead of t.precision. The instance is constructed with a specific TimeType(precision), but this method ignores it. Meanwhile,
    toLiteralProtoWithType correctly uses dt.asInstanceOf[TimeType].precision.
  • Impact: Literal precision metadata is lost when going through the "untyped" literal conversion path.
  • Suggested fix: Use t.precision instead of TimeType.DEFAULT_PRECISION.

Finding 4: ConnectArrowTypeOps uses Any throughout — type safety completely lost

  • Severity: Medium
  • Category: API and Interface Design
  • File: ConnectArrowTypeOps.scala:50-56
  • Issue: All three methods (createArrowSerializer, createArrowDeserializer, createArrowVectorReader) accept Any and return Any. Implementation does unchecked asInstanceOf casts — a wrong vector type produces a ClassCastException at runtime.
  • Impact: No compile-time protection for implementers or callers.
  • Suggested fix: Widen ArrowSerializer.Serializer, ArrowDeserializers.Deserializer, and ArrowVectorReader from private[arrow] to private[connect] to enable proper type signatures.

Finding 5: Null safety gap in HiveResult framework dispatch

  • Severity: Medium
  • Category: Behavioral Correctness
  • File: HiveResult.scala:113-129
  • Issue: The outer match handles (null, _) first, then dispatches to the framework for all other values. If ClientTypeOps(dt) matches, formatExternal calls value.asInstanceOf[LocalTime]. The null guard lives solely in the outer match — if the dispatch
    order is ever refactored, an NPE will occur.
  • Impact: Fragile null safety that depends on match ordering.
  • Suggested fix: Add a null guard in the ClientTypeOps(dt).map(...) call or in formatExternal.

Finding 6: Redundant TimeType branches in *Default methods — no verification of equivalence

  • Severity: Medium
  • Category: Rollout Safety
  • Files: EvaluatePython.scala (_: TimeType in needConversionInPythonDefault), LiteralValueProtoConverter.scala (TIME case in getProtoDataTypeDefault)
  • Issue: When framework is ON, TimeType hits the framework path and default branches are dead code. When OFF, default branches handle it. This is intentional for rollout safety, but there is no test verifying both paths produce identical results.
  • Impact: Silent behavioral divergence between framework-ON and framework-OFF would be undetectable until production rollout.
  • Suggested fix: Add a parameterized test that toggles spark.sql.types.framework.enabled and asserts identical results.
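Such a test could be sketched like this (withSQLConf is Spark's standard test helper; formatValue stands in for whichever code path is under test):

```scala
test("TimeType: framework ON/OFF equivalence") {
  val value = java.time.LocalTime.of(12, 34, 56)
  val results = Seq("true", "false").map { enabled =>
    withSQLConf("spark.sql.types.framework.enabled" -> enabled) {
      formatValue(value, TimeType())  // assumed helper for the path under test
    }
  }
  assert(results.distinct.size == 1, "framework and default paths diverged")
}
```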

Finding 7: SerializerBuildHelper / DeserializerBuildHelper dispatch by enc.dataType ignoring encoder type

  • Severity: Medium
  • Category: Behavioral Correctness
  • Files: SerializerBuildHelper.scala:361, DeserializerBuildHelper.scala:336
  • Issue: Framework dispatch uses CatalystTypeOps(enc.dataType) before checking the encoder. The original code dispatches on the encoder — the same data type could have different encoders (e.g., AgnosticExpressionPathEncoder is handled first in
    createSerializerDefault). The framework could intercept an encoder that should have been handled differently.
  • Impact: If a TimeType encoder appears as an AgnosticExpressionPathEncoder in the future, the framework path would incorrectly override it.
  • Suggested fix: Move framework dispatch after the AgnosticExpressionPathEncoder and isNativeEncoder checks, or have the framework dispatch also check encoder type.

Finding 8: Undocumented JDBC behavior change for TimeType

  • Severity: Medium
  • Category: Rollout Safety
  • File: JdbcUtils.scala:1411-1413
  • Issue: getCommonJDBCType(TimeType) previously returned None. With the framework enabled, it now returns Some(JdbcType("TIME", java.sql.Types.TIME)). The PR description says "No user-facing change."
  • Impact: Functional behavior change for JDBC operations with TimeType columns.
  • Suggested fix: Acknowledge this as a new capability in the PR description. Verify JDBC TIME handling across common drivers.

Finding 9: Repeated new TimeTypeConnectOps(TimeType()) allocations across 5+ sites

  • Severity: Medium
  • Category: Performance
  • File: ProtoTypeOps.scala (5 factory methods), ConnectArrowTypeOps.scala (2 factory methods)
  • Issue: Each factory method creates a new TimeTypeConnectOps instance per invocation. These are stateless and could be cached.
  • Impact: Unnecessary GC pressure during result deserialization.
  • Suggested fix: Cache a singleton TimeTypeConnectOps instance and reuse across all factory methods.
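The caching could be as simple as the following (class names from the PR; the companion's shape is assumed):

```scala
// Cache one stateless ops instance and reuse it from every factory method,
// instead of `new TimeTypeConnectOps(TimeType())` per invocation.
object ProtoTypeOps {
  private lazy val timeTypeOps = new TimeTypeConnectOps(TimeType())

  private def opsForKindCase(kindCase: proto.DataType.KindCase): Option[ProtoTypeOps] =
    kindCase match {
      case proto.DataType.KindCase.TIME => Some(timeTypeOps) // reused singleton
      case _                            => None
    }
}
```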

Finding 10: No new tests despite 966 lines of new code

  • Severity: Medium
  • Category: Test Quality
  • Issue: 5 new files and 15 modified files but zero new tests. The framework path includes new dispatch logic, new asInstanceOf casts, and new Arrow/proto conversion code. There are no tests verifying framework-ON/OFF equivalence.
  • Impact: Subtle bugs (wrong precision, wrong cast, wrong null handling) would go undetected until production rollout.
  • Suggested fix: Add: (1) roundtrip tests for each trait, (2) framework flag toggle tests, (3) fromArrowType reverse lookup test.

Finding 11: ClientTypeOps.apply does not check feature flag — inconsistent gating

  • Severity: Medium
  • Category: Rollout Safety
  • File: ClientTypeOps.scala:177
  • Issue: ClientTypeOps.apply(dt) delegates to TypeApiOps(dt) which presumably checks the flag. But ClientTypeOps.fromArrowType(at) explicitly checks the flag. This inconsistency makes the gating contract implicit and fragile.
  • Suggested fix: Add an explicit flag check in ClientTypeOps.apply or document that TypeApiOps is the gating layer.

Finding 12: createFieldWriterDefault unnecessarily private[sql]

  • Severity: Low
  • Category: Code Style
  • File: ArrowWriter.scala:586
  • Issue: createFieldWriterDefault is private[sql] but only called from within ArrowWriter. Should be private.

Finding 13: ProtoTypeOps companion has 5 separate registration points per type

  • Severity: Low
  • Category: API Design
  • File: ProtoTypeOps.scala
  • Issue: Adding a second framework type requires modifying 5 separate match clauses. Three dispatch by proto enum case and could be consolidated.
  • Suggested fix: Extract a shared opsForKindCase lookup.

Summary

┌──────────┬───────┐
│ Severity │ Count │
├──────────┼───────┤
│ Critical │ 0     │
├──────────┼───────┤
│ High     │ 3     │
├──────────┼───────┤
│ Medium   │ 8     │
├──────────┼───────┤
│ Low      │ 2     │
└──────────┴───────┘

Total findings: 13

Recommended Actions

  1. Fix Arrow precision mismatch — fromArrowType should not hardcode MICROS_PRECISION (Finding 1)
  2. Fix toLiteralProto precision — use t.precision not DEFAULT_PRECISION (Finding 3)
  3. Move HiveResult dispatch out of per-row loop — pre-compute per column (Finding 2)
  4. Add framework flag toggle tests — verify ON/OFF equivalence for TimeType (Findings 6, 10)
  5. Fix serializer dispatch order — framework should not intercept AgnosticExpressionPathEncoder (Finding 7)

@davidm-db
Contributor Author

@dejankrak-db All findings should be either addressed or replied to now.

Finding 1 (Arrow precision mismatch): Pre-existing behavior — the original ArrowUtils.fromArrowType also returns TimeType(MICROS_PRECISION) for nanosecond Arrow time vectors. Not introduced by this PR. It is tracked among the issues with TIME we discovered during the work on the framework and will be addressed separately.

Finding 2 (Per-row allocation in HiveResult): Valid concern. Singleton caching of ops instances across all factory companions is tracked for Phase 1d (Validation & Performance).

Finding 3 (toLiteralProto DEFAULT_PRECISION): Not a bug. The "generic" literal path receives a LocalTime value with no DataType context. LocalTime doesn't carry precision metadata — precision is a property of the column (DataType), not the value. DEFAULT_PRECISION is the correct choice here, same as the original code. The typed overload (toLiteralProtoWithType) correctly uses the DataType's precision.

Finding 4 (ConnectArrowTypeOps Any types): Already fixed in a previous iteration — widened FieldSerializer, ArrowVectorReader, and TimeVectorReader to private[connect]. The trait now uses proper return types and all asInstanceOf casts are removed from call sites.

Finding 5 (Null safety in HiveResult): The null case is the first match arm and always runs before framework dispatch. This ordering is structural (same as all other type cases in the match) — not something that can be accidentally refactored away without changing the entire method signature.

Finding 6 (Redundant TimeType in defaults): Intentional — project guideline: "Don't remove existing TimeType code. Its explicit case branches remain as safety net and fallback when the feature flag is off." Equivalence testing is tracked for Phase 1d/Phase 2.

Finding 7 (Serializer dispatch order vs. encoder type): AgnosticExpressionPathEncoder is a migration shim (its own Scaladoc says so), and isNativeEncoder handles primitives (Boolean, Int, Long, etc.). Framework types use dedicated leaf encoders (e.g., LocalTimeEncoder), never migration shims or native primitives. Added a comment at both dispatch sites documenting this.

Finding 8 (JDBC behavior change): Removed JDBC from this PR entirely — getCommonJDBCType reverted to original, JDBC methods removed from ClientTypeOps. A separate JdbcTypeOps trait in sql/core is planned for when full JDBC support (DDL + getter + setter) is ready.

Finding 9 (Repeated allocations): Tracked for Phase 1d — singleton caching across all factory companions uniformly.

Finding 10 (No new tests): Tracked for Phase 2 (Testing Framework). The feature flag is enabled by default in tests, so the entire existing test suite validates the framework path.

Finding 11 (ClientTypeOps.apply flag check): By design — ClientTypeOps(dt) delegates to TypeApiOps(dt) which checks the flag. The inline comment explains: "Delegates to TypeApiOps and narrows: a type must implement TypeApiOps AND mix in ClientTypeOps to be found here." fromArrowType checks explicitly because it's a reverse lookup that doesn't go through TypeApiOps.

Finding 12 (createFieldWriterDefault visibility): Fixed — changed from private[sql] to private.

Finding 13 (ProtoTypeOps 5 registration points): Already fixed while addressing one of Wenchen's comments — consolidated to 3 via a shared opsForKindCase lookup. toCatalystType, getScalaConverterForKind, and getProtoDataTypeFromLiteral all delegate through a single KindCase registration point.

@davidm-db davidm-db force-pushed the davidm-db/types_framework_1c branch from 0f0049d to 7c65800 Compare April 3, 2026 19:42
@cloud-fan
Contributor

if the final design is to have one interface per module, then shall we merge ProtoTypeOps and ConnectArrowTypeOps as they are both in connect/common?

@davidm-db davidm-db force-pushed the davidm-db/types_framework_1c branch from 7c65800 to bc61b04 Compare April 9, 2026 09:45
@davidm-db
Contributor Author

davidm-db commented Apr 9, 2026

if the final design is to have one interface per module, then shall we merge ProtoTypeOps and ConnectArrowTypeOps as they are both in connect/common?

@cloud-fan Merged the two ops classes into a single one. All other refactors we talked about (optional functionalities, etc.) will be investigated and done as follow-ups, since they don't really fit the scope of this pull request.

* TimeTypeApiOps for a reference implementation
* @since 4.2.0
*/
trait ClientTypeOps { self: TypeApiOps =>
Contributor


TypeApiOps and ClientTypeOps are both for client side? shall we merge them into one? I'm ok to leave additional work for followup PRs, but not commit a draft version and rewrite it in followup.

Contributor Author


That's true. It's also true for CatalystTypeOps and TypeOps, which are both on the catalyst side. However, in both cases one is optional and the other is mandatory. I structured the merge this way to enable the functionality first, planning a follow-up refactor that introduces the concept of optional functions (returning Options) in the ops classes, since that is a framework design question rather than an enablement one.
If you think we should do it here, however, I can; I was just explaining my reasoning.
