perf(models): validate only the matched variant for discriminated unions (#1649) #1695

Open
Echolonius wants to merge 1 commit into anthropics:main from Echolonius:perf/construct-type-discriminated-union-fast-path

@Echolonius

Summary

construct_type validated the whole RawMessageStreamEvent union on every streaming event, even though each wire event already names its variant via the type discriminator. Validating a union forces pydantic to consider every member; resolving the variant up-front and validating just that one returns an identical object for roughly half the CPU.

This addresses the union-decode leg of #1649, the largest cost in that issue's profile (construct_type → validate_python, ~45s of the ~100s). It is complementary to #1663, which covers the build_events / accumulate_event legs in _messages.py and does not touch the union decode in _models.py.

Change

In the is_union branch of construct_type, when the union is discriminated and the value carries a known discriminator string, validate just the matched variant before falling back to the whole-union validate:

variant_type = discriminator.mapping.get(variant_value)
if variant_type is not None:
    try:
        return validate_type(type_=variant_type, value=value)   # validate one variant, not the union
    except Exception:
        pass
# ... existing whole-union validate + fallbacks, unchanged

The discriminator metadata is already cached in DISCRIMINATOR_CACHE, so variant resolution is a dict lookup after warm-up. The existing post-failure discriminator block is reused (the variant_type is now resolved once at the top), so the diff is small.
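
As a rough illustration of why variant resolution is cheap after warm-up, the sketch below builds a discriminator-value-to-class mapping once per union and caches it. It uses only the standard library and plain dataclasses. `build_mapping`, `TextDelta`, `MessageStop`, and `Event` are hypothetical names chosen for this example, and the SDK's actual DISCRIMINATOR_CACHE may well be structured differently.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Literal, Union, get_args, get_type_hints


@dataclass
class TextDelta:  # hypothetical variant, not an SDK type
    type: Literal["content_block_delta"]
    text: str


@dataclass
class MessageStop:  # hypothetical variant, not an SDK type
    type: Literal["message_stop"]


Event = Union[TextDelta, MessageStop]


@lru_cache(maxsize=None)
def build_mapping(union_type, field="type"):
    """Map each Literal discriminator value to its variant class.

    The annotations are walked once per union; later calls hit the cache.
    """
    mapping = {}
    for variant in get_args(union_type):
        literal = get_type_hints(variant)[field]
        for value in get_args(literal):
            mapping[value] = variant
    return mapping


mapping = build_mapping(Event)           # first call inspects the annotations
assert build_mapping(Event) is mapping   # second call returns the cached dict
variant = mapping.get("message_stop")    # per-event cost is one dict lookup
```

An unknown discriminator value simply yields `None` from `mapping.get`, which is the signal the fast path uses to step aside.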

Why it's safe (behavior preserved)

  • Valid discriminated data → validates the same variant the whole union would have selected, yielding an identical object (verified: new == old == construct_type for every event in the benchmark).
  • Invalid data → variant validation raises, we pass, and fall through to the exact existing path (whole-union validate → unvalidated .construct() of the matched variant). The pre-existing test_discriminated_unions_invalid_data* tests pin this and still pass.
  • Unknown variant / non-discriminated union → variant_type stays None, the fast path is skipped, and the whole-union path runs unchanged.
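
The three cases above can be sketched with a toy model of the control flow. This is a simplified stand-in, not the SDK's code: `validate_variant`, `validate_union`, `construct`, and the two dataclasses are hypothetical, validation is reduced to key and type checks, and returning the raw value for an unknown variant is an assumption made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class TextDelta:  # hypothetical variant
    type: str
    text: str


@dataclass
class MessageStop:  # hypothetical variant
    type: str


MAPPING = {"content_block_delta": TextDelta, "message_stop": MessageStop}


def validate_variant(cls, value):
    """Strictly build one variant; raise if the tag, keys, or value types are wrong."""
    ok = (
        MAPPING.get(value.get("type")) is cls
        and set(value) == {f.name for f in fields(cls)}
        and all(isinstance(v, str) for v in value.values())
    )
    if not ok:
        raise ValueError(f"invalid {cls.__name__}")
    return cls(**value)


def validate_union(value):
    """Slow path: consider every member of the union in turn."""
    for cls in MAPPING.values():
        try:
            return validate_variant(cls, value)
        except ValueError:
            continue
    raise ValueError("no union member matched")


def construct(value):
    variant = MAPPING.get(value.get("type"))
    if variant is not None:
        try:
            return validate_variant(variant, value)  # fast path: one variant only
        except ValueError:
            pass                                     # fall through to the old path
    try:
        return validate_union(value)                 # whole-union validate
    except ValueError:
        if variant is None:
            return value                             # assumption: hand back raw data
        obj = variant.__new__(variant)               # unvalidated construction
        obj.__dict__.update(value)
        return obj


valid = construct({"type": "content_block_delta", "text": "hi"})
invalid = construct({"type": "content_block_delta", "text": 5})
unknown = construct({"type": "ping"})
```

Here `valid` comes from the fast path, `invalid` fails both validations and ends up as an unvalidated `TextDelta`, and `unknown` never enters the fast path at all.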

Benchmark

Realistic stream (4000 content_block_delta events + start/stop), pydantic 2.12.5 / CPython 3.9, best of 7:

| path | µs/event | total |
|---|---|---|
| whole-union validate (before) | 16.1 | 64.3 ms |
| matched-variant validate (after) | 8.8 | 35.3 ms |
| speedup | ~1.8× | ~45% less CPU |
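
For anyone wanting to reproduce a measurement of this shape, a minimal best-of-7 harness might look like the sketch below. It is not the script used for the numbers above: `best_of` and `decode` are hypothetical, the payloads of the start/stop events are assumed, and `decode` is a placeholder to be swapped for the call under test.

```python
import timeit


def best_of(fn, events, repeat=7):
    """Return (µs per event, total ms) for the fastest of `repeat` full passes."""

    def run():
        for event in events:
            fn(event)

    best = min(timeit.repeat(run, number=1, repeat=repeat))
    return best / len(events) * 1e6, best * 1e3


# Assumed stream shape: one start event, 4000 deltas, one stop event.
events = (
    [{"type": "message_start"}]
    + [{"type": "content_block_delta", "text": "x"} for _ in range(4000)]
    + [{"type": "message_stop"}]
)


def decode(event):
    """Placeholder for the decode call being measured."""
    return dict(event)


per_event_us, total_ms = best_of(decode, events)
print(f"{per_event_us:.2f} µs/event, {total_ms:.2f} ms total")
```

Taking the minimum over several passes, rather than the mean, reduces the influence of scheduler noise and one-off warm-up costs such as populating a cache.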

Every streaming consumer pays this cost, whether it iterates over events or only calls get_final_message(), and the same fast path benefits all discriminated-union decoding across the SDK, not just streaming.

Tests

tests/test_models.py — added 4 tests:

  • fast path validates only the matched variant (never the whole union)
  • fast-path result is identical to the whole-union result for clean data
  • invalid data falls back to the existing .construct() path
  • non-discriminated unions are unaffected
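
One way to assert that the whole union is never validated is to inject mocks and check call counts. The sketch below shows that idea against a hypothetical `construct` helper that takes its validators as arguments. It is not the test code added in tests/test_models.py, and the real tests may patch different functions.

```python
from unittest import mock


def construct(value, mapping, validate_one, validate_union):
    """Hypothetical stand-in for the fast path: try the matched variant first."""
    variant = mapping.get(value.get("type"))
    if variant is not None:
        try:
            return validate_one(variant, value)
        except Exception:
            pass  # fall through to the whole-union path
    return validate_union(value)


def test_fast_path_skips_whole_union():
    validate_one = mock.Mock(return_value="parsed")
    validate_union = mock.Mock()
    result = construct({"type": "a"}, {"a": dict}, validate_one, validate_union)
    assert result == "parsed"
    validate_one.assert_called_once_with(dict, {"type": "a"})
    validate_union.assert_not_called()


def test_invalid_data_falls_back():
    validate_one = mock.Mock(side_effect=ValueError("bad"))
    validate_union = mock.Mock(return_value="fallback")
    result = construct({"type": "a"}, {"a": dict}, validate_one, validate_union)
    assert result == "fallback"
    validate_union.assert_called_once_with({"type": "a"})


def test_unknown_variant_uses_whole_union():
    validate_one = mock.Mock()
    validate_union = mock.Mock(return_value="union")
    result = construct({"type": "zzz"}, {"a": dict}, validate_one, validate_union)
    assert result == "union"
    validate_one.assert_not_called()
```

Passing the validators in as parameters keeps the sketch free of patching. Against real module-level functions, `mock.patch.object` would serve the same purpose.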

All existing test_models.py + streaming tests pass (62 model tests, 280 across models/streaming/client). No new pyright/ruff findings.

construct_type validated the whole RawMessageStreamEvent union on every
streaming event, even though the wire data already names its variant via
the `type` discriminator. Validating a union forces pydantic to consider
every member; resolving the variant first and validating just that one
returns an identical object for roughly half the CPU.

The variant is resolved from the (already-cached) discriminator metadata
before the full-union validate. On a 4k-delta stream this is ~1.8x faster
on the decode leg (~16.1 -> 8.8 us/event, ~45% less CPU, pydantic 2.12 /
py3.9). Invalid or non-discriminated data falls through to the existing
whole-union path unchanged, so behavior is identical in every other case.

Addresses the union-decode leg of anthropics#1649 (complementary to anthropics#1663, which
covers the build_events / accumulate_event legs).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>