Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 749: Add section on metaclasses #3847

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

JelleZijlstra
Copy link
Member

@JelleZijlstra JelleZijlstra commented Jun 19, 2024

Deriving from https://discuss.python.org/t/pep-749-implementing-pep-649/54974/28 and offline discussion with Alex.


📚 Documentation preview 📚: https://pep-previews--3847.org.readthedocs.build/

peps/pep-0749.rst Outdated Show resolved Hide resolved
Co-authored-by: Carl Meyer <[email protected]>
Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM over all, but a couple of questions about implementation details inline.

and replaces itself with the result. The descriptor also behaves like a mapping,
so that code that uses ``cls.__dict__["__annotations__"]`` will still usually
work: treating the object as a mapping will evaluate the annotations and behave
as if the descriptor itself was the annotations dictionary. (Code that assumes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the implicit mapping access also update the cls.__dict__ entry? (thanks to the owner information passed to __set_name__, it could).

If it doesn't, the specification should be explicit that the underlying dict will be cached on the descriptor instance, so it can be added to cls.__dict__ later and any changes made via the descriptor's mapping API will still be visible.

Copy link
Member

@AlexWaygood AlexWaygood Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a pure-Python proof of concept for the idea here: https://gist.github.com/AlexWaygood/29e386e092377fb2e288620df1765ed5

The PoC does not currently update the __dict__ entry -- it just caches the materialized annotations internally in the descriptor instance -- but I think you're right that it could, potentially.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a sentence that makes my proposal more explicit.

retrieve annotations.

Alex Waygood has suggested an approach that avoids these problems. On class
creation, ``cls.__dict__["__annotations__"]`` is set to a special descriptor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do type.__annotate__ and type.__annotations__ still exist in this approach? (I don't believe they're needed, but best to be explicit)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that I don't think they'd be needed with this approach

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe they may be needed for non-heap types. I will experiment with this when I implement the idea.

class object returns ``None`` if the class has no annotations, and an annotation
function that returns the class's annotations if it does have annotations.

Classes always contain ``__annotations__`` and ``__annotate__`` keys in their
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "classes" here mean all type instances? Or specifically heap types created with a class statement? What about the equivalent Python and C APIs?

I suspect only types created via a class statement getting these descriptors implicitly will make the most sense, with static types and types created via the dynamic Python and C APIs getting __annotate__ = None and __annotations__ = {} if nothing else is specified in the supplied class dicts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All classes that have >=1 class with annotations in their MRO must have "__annotations__" and "__annotate__" keys in their class dictionaries under this proposal. Theoretically I think it would be fine to omit these keys from the class dictionary if you could reliably determine that no classes in the MRO have any annotations. I don't know if that's worth the complexity, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All classes that have >=1 class with annotations in their MRO

Or in the MRO of a metaclass.

Does "classes" here mean all type instances? Or specifically heap types created with a class statement? What about the equivalent Python and C APIs?

I think you're right and this should apply to classes created through the class statement, and classes created in ways that may inherit from Python classes should simply set the fields to None and {}.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All classes that have >=1 class with annotations in their MRO

Or in the MRO of a metaclass.

right, yeah

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

peps/pep-0749.rst Outdated Show resolved Hide resolved
peps/pep-0749.rst Outdated Show resolved Hide resolved
peps/pep-0749.rst Outdated Show resolved Hide resolved
class object returns ``None`` if the class has no annotations, and an annotation
function that returns the class's annotations if it does have annotations.

Classes always contain ``__annotations__`` and ``__annotate__`` keys in their
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All classes that have >=1 class with annotations in their MRO must have "__annotations__" and "__annotate__" keys in their class dictionaries under this proposal. Theoretically I think it would be fine to omit these keys from the class dictionary if you could reliably determine that no classes in the MRO have any annotations. I don't know if that's worth the complexity, though.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@carljm
Copy link
Member

carljm commented Jun 19, 2024

After sleeping on this and looking at the proof of concept in pure Python, I'm a bit concerned about the performance implications. Classes are already large, but adding a new object instance with several fields to every class that exists could still have serious memory consequences for large Python codebases. I think we need to consider whether the bugs that are being fixed here are serious enough in practice to justify this overhead, or whether there are simpler solutions that don't fix everything perfectly but address the biggest problems, and don't have this overhead.

@JelleZijlstra
Copy link
Member Author

@carljm it would be a small proportion of the overall size of a class object. In the prototype I'm currently working on, the size of the descriptor object is just 48 bytes, only 3% of the size of the class object:

>>> class X: a: int
... 
>>> ann = X.__dict__["__annotations__"]
>>> ann.get("a")
<class 'int'>
>>> ann.get("b")
>>> import sys
>>> sys.getsizeof(X)
1712
>>> sys.getsizeof(ann)
48

@JelleZijlstra
Copy link
Member Author

python/cpython#120719 has a very partial prototype now (I haven't gotten around to testing it yet or integrating descriptor behavior).

@carljm
Copy link
Member

carljm commented Jun 19, 2024

What if we never set __annotations__ in the dictionary of any subclass of type? This would mean that introspecting annotations on a metaclass would always have to re-run __annotate__, but I think it would solve the problems described here?

Or alternatively, if we used the "cache annotations under a different name than __annotations__" strategy, but only for subclasses of type.

@ncoghlan
Copy link
Contributor

The concern with the "cache under a different name strategy" is that it isn't fully backwards compatible: retrieving the annotations directly from the class dict wouldn't work anymore.

@JelleZijlstra
Copy link
Member Author

The concern with the "cache under a different name strategy" is that it isn't fully backwards compatible: retrieving the annotations directly from the class dict wouldn't work anymore.

That's also true for the original text of PEP 649. It says:

Class and module objects must cache the annotations dict in their dict, using the key annotations. This is required for backwards compatibility reasons.

But that's not really true under PEP 649 as written: __dict__["__annotations__"] only exists if you previously warmed the cache by accessing the .__annotations__ descriptor. Arguably that's worse than removing __annotations__ completely from the dict: now whether or not __dict__["__annotations__"] works is path-dependent.

This makes me think we should either use a strategy where __dict__["__annotations__"] continues to work reliably (Alex's proposal, now in this PR), or we should go for a clean break and never set __annotations__ in the dict. If so, we should probably do the same for modules.

It also occurred to me that we may want to avoid using the class dict as storage for __annotate__. In the future, we may want to make optimizations so that __annotate__ is not stored as a function, but as just a code object that gets lazily materialized into a function. If we guarantee that cls.__dict__["__annotate__"] is a function, that optimization becomes harder to achieve.

Or alternatively, if we used the "cache annotations under a different name than __annotations__" strategy, but only for subclasses of type.

We could do something like this, but I'd rather have all classes work similarly, instead of special-casing some categories of classes.

@AlexWaygood
Copy link
Member

AlexWaygood commented Jun 20, 2024

In the proof-of-concept gist currently, @carljm is correct that every class -- even if it has no annotations -- gets a unique descriptor instance in its class dictionary:

class Object:
    """Pretend this is how builtins.object would work"""

    __annotate__ = None
    annotations = {}

    def __init_subclass__(cls):
        cls.annotations = _AnnotationsDescriptor(cls)
        if "__annotate__" not in cls.__dict__:
            cls.__annotate__ = None

But this does indeed seem unnecessary for classes that have no annotations. Instead, we could have a single descriptor instance in the class dictionary for builtins.object that is reused for all classes with no annotations:

class Object:
    """Pretend this is how builtins.object would work"""

    __annotate__ = None

    def __init_subclass__(cls):
        if "__annotate__" in cls.__dict__:
            cls.annotations = _AnnotationsDescriptor(cls)
        else:
            cls.__annotate__ = None
            cls.annotations = Object.__dict__["annotations"]


Object.annotations = _AnnotationsDescriptor(Object)

This should result in dramatically fewer descriptor instances being produced at class creation time. We would still create a fresh descriptor instance for every class that does have annotations and insert that instance into the class dictionary; but this is not so different to the way that an __annotations__ dictionary is inserted into the class dictionary for these classes in Python 3.13 anyway.

@JelleZijlstra
Copy link
Member Author

@AlexWaygood that becomes problematic if the __annotations__ dict is modified. I can see a few options:

  • There is a single shared dict used by all unannotated classes. But now modifying that dict on one class affects all other unannotated classes.
  • Every time the descriptor is accessed, it returns a fresh dict. But now if you modify a class's __annotations__, the changes won't be preserved if you access __annotations__ again.
  • There is a lazily-created annotations dict for every class. But that means we need a per-class place to put it, and we're back where we started.
  • Instead of a mutable dict we return an immutable mappingproxy. Probably the best option, but still breaks use cases that rely on modifying the annotations dict.

@carljm
Copy link
Member

carljm commented Jun 20, 2024

It also occurred to me that we may want to avoid using the class dict as storage for __annotate__. In the future, we may want to make optimizations so that __annotate__ is not stored as a function, but as just a code object that gets lazily materialized into a function. If we guarantee that cls.__dict__["__annotate__"] is a function, that optimization becomes harder to achieve.

This is a really good point. That optimization was already implemented in the prototype implementation of PEP 649 that was available at the time of its acceptance, and although the optimization wasn't specified in the text of PEP 649, it was discussed publicly in advocating for the PEP prior to its acceptance, and I think it may turn out to be a valuable optimization. So it would be great if we can avoid closing the door to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants