PEP 747: Fix rules related to UnionType (T1 | T2). Contrast TypeExpr with TypeAlias. Apply other feedback. #3856

Merged on Jul 9, 2024 (11 commits)

Contributor

@davidfstr commented Jul 4, 2024

  • Change is either:
    • To a Draft PEP
    • To an Accepted or Final PEP, with Steering Council approval
    • To fix an editorial issue (markup, typo, link, header, etc)
  • PR title prefixed with PEP number (e.g. PEP 123: Summary of changes)

In particular:

  • Implicit Annotation Expression Values: Delete section
  • Alter unparameterized TypeExpr to mean TypeExpr[Any]

📚 Documentation preview 📚: https://pep-previews--3856.org.readthedocs.build/

peps/pep-0747.rst (outdated)
Relationship with UnionType
'''''''''''''''''''''''''''

``TypeExpr[U]`` is a subtype of ``UnionType`` iff ``U`` is a non-empty union type:
Member

Why does this need to be specified? It feels like asking type checkers to understand details of runtime implementation.

Contributor Author

From §"Backward Compatibility":

As a value expression, X | Y previously had type UnionType (via PEP 604)
but this PEP gives it the more-precise static type TypeExpr[X | Y]
(a subtype of UnionType) while continuing to return a UnionType instance at runtime.
Preserving compatibility with UnionType is important because UnionType
supports isinstance checks, unlike TypeExpr, and existing code relies
on being able to perform those checks.

Rephrasing:

  • type.__or__ (and other methods) now have return type TypeExpr[X | Y] rather than UnionType
  • Static type checkers need to treat TypeExpr[X | Y] as assignable to UnionType so that existing methods like isinstance which expect a UnionType continue to pass type checking when given a X | Y expression.
    • For example isinstance('words', int | str) needs to still pass type checking even though int | str is now a TypeExpr[int | str] and isinstance expects a UnionType as its second argument.


- ``TypeExpr[X | Y | ...]`` is a subtype of ``UnionType``.
- ``TypeExpr[Union[X, Y, ...]]`` is a subtype of ``UnionType``.
- ``TypeExpr[Optional[X]]`` is a subtype of ``UnionType``.
Member

It isn't at runtime (currently)

Contributor Author

@davidfstr Jul 5, 2024

Indeed technically I don't think it is, but I don't think the lack of a runtime subtype relationship is observable. §"Interactions with isinstance() and issubclass()" says:

The TypeExpr special form cannot be used as any argument to
issubclass:

So I'd expect the following behavior:

issubclass(TypeExpr[int | str], UnionType)
TypeError: issubclass() arg 1 must be a class

Edit: And indeed I see that behavior with the current implementation of TypeExpr in typing_extensions.

Are there other ways you can think of that a lack of a runtime subtype relationship could be observable?

Member

from types import UnionType
from typing_extensions import Optional, TypeExpr

def f(x: UnionType):
    assert isinstance(x, UnionType)

def g(x: TypeExpr[Optional[int]]):
    f(x)

g(Optional[int])  # boom

I'm not sure there is much reason for TypeExpr to ever be a subtype of UnionType; I don't think it will significantly help users of TypeExpr.

Contributor Author

@davidfstr Jul 6, 2024

Clever example. OK, it seems that if TypeExpr is to (sometimes) be a subtype of UnionType then there are ways to observe it at runtime.

I'm not sure there is much reason for TypeExpr to ever be a subtype of UnionType; I don't think it will significantly help users of TypeExpr.

I also don't think it will help a ton, but I think it's necessary for backward compatibility for existing functions that accept UnionType so long as there's no other way to spell "a TypeExpr that is a non-empty union type". Currently §["Rejected Ideas" > "Support pattern matching on type expressions"] does not provide a spelling that can replace existing usage of UnionType.

Do you have an alternative suggestion in mind that both:

  1. gives a TypeExpr[int | str] result for the value expression int | str and
  2. continues to allow isinstance('words', int | str) to pass a type checker?

I believe it should be possible to make TypeExpr[U] be conditionally considered a subtype of UnionType at runtime (via an isinstance check) by overriding __instancecheck__ on the metaclass of UnionType.
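
A toy sketch of the `__instancecheck__` mechanism mentioned above. `UnionLikeMeta`, `UnionLike`, and `FakeTypeExpr` are hypothetical stand-ins invented for illustration; this is not how `UnionType` or `TypeExpr` is implemented, and it says nothing about whether the real `UnionType` could be adapted this way.

```python
class UnionLikeMeta(type):
    """Hypothetical metaclass: decides isinstance() results per object."""

    def __instancecheck__(cls, instance: object) -> bool:
        # Treat an object as an instance only if it declares itself union-shaped.
        return bool(getattr(instance, "is_union", False))


class UnionLike(metaclass=UnionLikeMeta):
    """Hypothetical stand-in for a class such as UnionType."""


class FakeTypeExpr:
    """Hypothetical stand-in for a runtime type-expression object."""

    def __init__(self, spelled: str, is_union: bool) -> None:
        self.spelled = spelled
        self.is_union = is_union


union_expr = FakeTypeExpr("int | str", is_union=True)
plain_expr = FakeTypeExpr("int", is_union=False)

print(isinstance(union_expr, UnionLike), isinstance(plain_expr, UnionLike))
```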


Aside: The current implementation of int | str gives a UnionType, but neither Union[int, str] nor Optional[int] give UnionTypes. If we wanted to more-strictly preserve the existing behavior, I'd be open to narrowing the rules in this section to only make TypeExpr[X | Y | ...] a subtype of UnionType:

  • TypeExpr[X | Y | ...] is a subtype of UnionType.
  • TypeExpr[Union[X, Y, ...]] is not a subtype of UnionType.
  • TypeExpr[Optional[X]] is not a subtype of UnionType.
  • TypeExpr[Never] is not a subtype of UnionType.
  • TypeExpr[NoReturn] is not a subtype of UnionType.

Edit: However I will note that (X | None) == Union[X, None] == Optional[X] at runtime so it could be confusing to users if Union and Optional couldn't be used in the same place as an X | Y expression.

Member

I think the backwards compatibility problem is an issue of type checker inference that we don't have to specify exactly. It's also a very limited problem, mostly applying to isinstance() which is necessarily special-cased by type checkers anyway.

Contributor Author

The following code can be written by users today:

from types import UnionType

def accept_union(u: UnionType):
    pass

accept_union(int | str)

Are you saying we shouldn’t worry about breaking this code so long as we avoid breaking code related to isinstance?

Member

I think the way to avoid breaking that code can be up to type checkers. We don't need to prescribe it exactly; different type checkers can use different approaches, and adapt it to changes in how the runtime works. For example, type checkers could store something in their internal representation of a TypeExpr type to indicate the runtime construct that was used to create it.

Contributor Author

type checkers could store something in their internal representation of a TypeExpr type to indicate the runtime construct that was used to create it.

Yes, this description is consistent with the implementation approach I had in mypy: A single bit like is_uniontype.
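
The "single bit" idea can be sketched roughly as follows. This is a hypothetical illustration only: `TypeExprType` and `assignable_to_uniontype` are invented names, and the code does not reflect mypy's actual internal data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeExprType:
    """Hypothetical checker-internal record for a static TypeExpr[...] type."""

    spelled_type: str
    # Set only when the value came from an `X | Y` value expression.
    is_uniontype: bool = False


def assignable_to_uniontype(source: TypeExprType) -> bool:
    """Hypothetical rule: only pipe-constructed unions are assignable to UnionType."""
    return source.is_uniontype


from_pipe = TypeExprType("int | str", is_uniontype=True)
from_optional = TypeExprType("Optional[int]")

print(assignable_to_uniontype(from_pipe), assignable_to_uniontype(from_optional))
```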

But the overall specification rule being implemented is still:

TypeExpr[X | Y | ...] is a subtype of UnionType

I'll update the diff to include only this rule, and not the extraneous ones for Union and Optional.

Contributor Author

With only the remaining rule, the code you mentioned before fails (correctly) at type checking time and at runtime:

from types import UnionType
from typing_extensions import Optional, TypeExpr

def f(x: UnionType):
    assert isinstance(x, UnionType)  # AssertionError

def g(x: TypeExpr[Optional[int]]):
    f(x)  # ERROR: TypeExpr[Optional[int]] is not a UnionType

g(Optional[int])

And similar code involving TypeExpr[X | Y | ...] passes (correctly) both at type checking time and at runtime:

from types import UnionType
from typing_extensions import Optional, TypeExpr

def f(x: UnionType):
    assert isinstance(x, UnionType)  # OK (runtime)

def g(x: TypeExpr[int | None]):
    f(x)  # OK (type checking time)

g(int | None)

Member

@JelleZijlstra left a comment

I still feel this may specify too many details, but let's leave that for the PEP discussion.

@JelleZijlstra JelleZijlstra merged commit 1ad2288 into python:main Jul 9, 2024
6 checks passed
- As a value expression, ``x | y`` has type equal to the return type of ``type(x).__or__``
if ``type(x)`` overrides the ``__or__`` method.

- When ``x`` has type ``builtins.type``, ``types.GenericAlias``, or the
Contributor

I'm not sure how to interpret this statement. The phrase "has type" isn't clear. Are you talking about type equivalence? Assignability?

Also, are these static types or runtime types? I presume it's static types, but if that's the case, then I don't know why GenericAlias is mentioned because that's not a type a static type checker would ever evaluate. It's a runtime implementation detail.

What if the static type of x is a union, and some of the subtypes have a custom __or__ override and some do not? Presumably, this formulation assumes that an expansion of the types of x and y has already been performed, and x and y are not union types?

What if the __or__ method is present, but evaluating it generates a type error (e.g. because y's type is incompatible with the signature)?

We can try to hammer out all of these details, but this is getting really complex. One option is to say that unions never evaluate to TypeExpr unless you use a TypeExpr constructor (i.e. TypeExpr(x | y)). This would also avoid the issue with UnionType.

and ``y`` has type ``TypeExpr[t2]`` (or ``type[t2]``).
- As a value expression, ``x | y`` has type ``int``
if ``x`` has type ``int`` and ``y`` has type ``int``
- As a value expression, ``x | y`` has type ``UnionType``
Contributor

This rule says "in all other situations". What other situations are not covered in the above rules? I think they cover everything, right? Can you give an example of types x and y where UnionType would be evaluated?

To simplify static type checking, a ``Literal[...]`` value is *not*
considered assignable to a ``TypeExpr`` variable even if all of its members
spell valid types:
A value of ``Literal[...]`` type is *not* considered assignable to
Contributor

The way this is phrased, it still sounds like you're talking about the expression Literal[...] (where ... is some legal literal value like 1 or "hi"). I think what you mean here is "a value expression whose evaluated type is a literal string expression". If I'm interpreting this correctly, then I agree with the rule, but I think it needs to be reworded because that's not what it currently says.

considered assignable to a ``TypeExpr`` variable even if all of its members
spell valid types:
A value of ``Literal[...]`` type is *not* considered assignable to
a ``TypeExpr`` variable even if all of its members spell valid types because
Contributor

What is a "TypeExpr variable"? Is it a variable whose type is declared to be TypeExpr[T] (or a union that includes such a subtype)? If so, what does it mean for a variable to have "members"?

Relationship with UnionType
'''''''''''''''''''''''''''

``TypeExpr[U]`` is a subtype of ``UnionType`` iff ``U`` is
Contributor

Hmm. I don't think this is a good solution. There are many ways that union types can be formed in static analysis. For example, they arise from "joins" in code flow. This doesn't necessarily mean that at runtime the value is implemented with UnionType.

Consider the following:

from random import random
from types import UnionType
from typing_extensions import TypeExpr

x: TypeExpr[int | str]
if random() > 0.5:
    x = int
    reveal_type(x) # type[int]
else:
    x = str
    reveal_type(x) # type[str]

reveal_type(x) # TypeExpr[int | str]
print(x) # Will print either "<class 'int'>" or "<class 'str'>", never "UnionType" or "int | str"

y: UnionType = x # This would be problematic!

As another example, Literal[1, 2] and Literal[1] | Literal[2] are equivalent types. They are completely interchangeable from the perspective of a static type checker, but they have very different runtime representations. One is implemented as an instance of typing._LiteralGenericAlias, and the other is a typing._UnionGenericAlias.

@@ -711,12 +735,38 @@ assigned to variables and manipulated like any other data in a program:
``TypeExpr[]`` is how you spell the type of a variable containing a
type annotation object describing a type.

``TypeExpr[]`` is similar to ``type[]``, but ``type[]`` can only used to
``TypeExpr[]`` is similar to ``type[]``, but ``type[]`` can only
Contributor

I understand the point you're trying to make here, but it's a little misleading because type (when used in a type expression) works with many of the examples in this list, including type[list[int]] or type[int | None].

spell simple **class objects** like ``int``, ``str``, ``list``, or ``MyClass``.
``TypeExpr[]`` by contrast can additionally spell more complex types,
including those with brackets (like ``list[int]``) or pipes (like ``int | None``),
and including special types like ``Any``, ``LiteralString``, or ``Never``.

A ``TypeExpr`` variable looks similar to a ``TypeAlias`` definition, but
Contributor

What is a "TypeExpr variable"? I think what you mean is that a variable (or parameter) can be statically evaluated to have a type of TypeExpr[T]?

By "TypeAlias definition", are you talking about a statement of the form <name>: TypeAlias = <type expression>, as defined in PEP 613? I'm not sure how this is related to TypeExpr.

Perhaps you're talking about PEP 484 type aliases that have the syntactic form <name> = <expression> and numerous (undocumented) semantic rules and heuristics that distinguish it from a regular variable assignment? If that's the case, then I agree there's potential overlap with the TypeExpr concept. In particular, I was thinking that we could leverage the definitions in this PEP to (at last!) formalize the rules for PEP 484 type aliases. I'm now less sure of this given some of the other limitations we've needed to add to this PEP, such as the requirement that certain ambiguous forms must use an explicit TypeExpr constructor call.

spell simple **class objects** like ``int``, ``str``, ``list``, or ``MyClass``.
``TypeExpr[]`` by contrast can additionally spell more complex types,
including those with brackets (like ``list[int]``) or pipes (like ``int | None``),
and including special types like ``Any``, ``LiteralString``, or ``Never``.

A ``TypeExpr`` variable looks similar to a ``TypeAlias`` definition, but
can only be used where a dynamic value is expected.
Contributor

I don't understand what "only be used where a dynamic value is expected" means. What is a "dynamic value" in this context?

A ``TypeExpr`` variable looks similar to a ``TypeAlias`` definition, but
can only be used where a dynamic value is expected.
``TypeAlias`` (and the ``type`` statement) by contrast define a name that can
be used where a fixed type is expected:
Contributor

The term "fixed type" isn't defined anywhere. I think what you mean is that a type alias can be used in a type expression whereas variables cannot?


::

maybe_float: TypeExpr = float | None
Contributor

Ah, I think I now understand what you were trying to say above. This is all a (confusing) way to reiterate that a variable cannot be used in a type expression. That rule is already spelled out clearly in the "Type Annotations" section of the spec, so I don't think it needs to be repeated in this PEP.

3 participants