Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stubgen & pybind11: can we make them play nice(r)? #16306

Open
rwgk opened this issue Oct 21, 2023 · 24 comments
Open

stubgen & pybind11: can we make them play nice(r)? #16306

rwgk opened this issue Oct 21, 2023 · 24 comments
Labels
bug mypy got something wrong topic-stubgen

Comments

@rwgk
Copy link

rwgk commented Oct 21, 2023

Bug Report
We're experimenting under pybind/pybind11#4888 with a pybind11 behavior change, to generate

Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]

instead of a bare cpp_namespace::UserType.

It seems like mypy does handle outer-level Annotated, but not nested ones, e.g.

Iterator[str, Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]

The results from stubgen (1.5.1, Google-internal toolchain) are like this:

def values(self, *args, **kwargs) -> Any: ...

Desired is:

def values(self) -> Iterator[str, Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]: ...

Does this ring any bells?

If you need more details, please let me know. (Note though that I don't know a lot about typing and mypy. This fell into my lap as Google-internal pybind11 owner & pybind11 maintainer on github.)

A more general question: What are your thought about changing the pybind11 behavior in this way? Is there a better way?

(A clear and concise description of what the bug is.)

To Reproduce

# Ideally, a small sample program that demonstrates the problem.
# Or even better, a reproducible playground link https://mypy-play.net/ (use the "Gist" button)

Expected Behavior

Actual Behavior

Your Environment

  • Mypy version used: 1.5.1, Google-internal toolchain
  • Mypy command-line flags:
  • Mypy configuration options from mypy.ini (and other config files):
  • Python version used: 3.10
@rwgk rwgk added the bug mypy got something wrong label Oct 21, 2023
@rwgk rwgk changed the title stubgen & Annotated[Any, CppTypePybind11("cpp_namespace::UserType")] stubgen & pybind11: can we make them play nice(r)? Oct 27, 2023
@sizmailov
Copy link
Contributor

@JelleZijlstra @chadrik

Could you please take a look at the issue?

@chadrik
Copy link
Contributor

chadrik commented Nov 11, 2023

Happy to help.

I saw that my new changes to stubgen were released in mypy 1.7, and those changes included some improvements to parsing recursive types.

@rwgk If possible, it would be great if you could test mypy 1.7 as there is a chance this problem is fixed. If it's difficult to get updated tooling installed inside Google, that's ok, my suggestion below should be fine on its own:

The fastest way to resolve this will be for you to modify the mypy pybind11 demo source code to demonstrate your problem and make a PR with the changes. I'll then be able to much more easily improve the stubgen code to resolve the issue.

@rwgk
Copy link
Author

rwgk commented Nov 13, 2023

@rwgk If possible, it would be great if you could test mypy 1.7 as there is a chance this problem is fixed.

Trying. (It's not easy; I have to run global testing and fix anything that breaks. Might take a while.)

The fastest way to resolve this will be for you to modify the mypy pybind11 demo source code to demonstrate your problem and make a PR with the changes.

I could add to that file, but asking first: Are you set up to test in combination with an open pybind11 PR (pybind/pybind11#4888)?

Or alternatively, maybe better: Could you point me to the commands I'd need to run to test myself, in my fork of mypy?

@chadrik
Copy link
Contributor

chadrik commented Nov 13, 2023

I could add to that file, but asking first: Are you set up to test in combination with an open pybind11 PR (pybind/pybind11#4888)?

No. But maybe there's an even simpler way to cooperate on this. Can you provide here the exact docstring that would be inspected from a function with this new capability? I can write a simpler unit test just focused on the string parsing.

@chadrik
Copy link
Contributor

chadrik commented Nov 13, 2023

I should add that if the annotation in the docstring parser could be tripped up by use of parentheses in the types: Iterator[str, Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]]. Is this something that you expect mypy or other type analyzers to understand? In order for this to work we would also need to generate an import statement for CppTypePybind11, I assume.

@rwgk
Copy link
Author

rwgk commented Nov 13, 2023

I tried to get a docstring for bindings in the wild, but no luck in the time I can give it right now. I'll try again asap.

I don't know a lot about typing. My main idea is that we change pybind11 so that mypy is not tripped up by a raw C++ name (e.g. cpp_namespace::UserType). I don't know (at all) what exactly is best. I'll go with your advice.

Would this be better?

Iterator[str, Annotated[Any, "cpp_namespace::UserType"]]

@bluenote10
Copy link
Contributor

It seems like mypy does handle outer-level Annotated, but not nested ones, e.g.

Why is it nested to begin with? Iterator only takes one generic argument, so Iterator[str, Annotated[Any, "cpp_namespace::UserType"]] doesn't seem to make sense as a type. Annotated[Iterator[str], "cpp_namespace::UserType"] would.

@chadrik
Copy link
Contributor

chadrik commented Nov 13, 2023

Annotated[Any, "cpp_namespace::UserType"] is preferable to Annotated[Any, CppTypePybind11("cpp_namespace::UserType")] unless there is an actual importable python type called CppTypePybind11, in which case you should include the module name and the new version of stubgen in mypy 1.7 should create the import statement. e.g. Annotated[Any, pybind11.CppTypePybind11("cpp_namespace::UserType")]

@chadrik
Copy link
Contributor

chadrik commented Nov 13, 2023

With regard to nesting, Annotated should wrap the type that represents cpp_namespace::UserType. @bluenote10 is correct that Iterator only takes one argument.

So, if the Any represents the UserType, then this would be correct:

Iterator[Annotated[Any, "cpp_namespace::UserType"]]

Or if the iterator of strings represents the UserType, this would be correct:

Annotated[Iterator[str], "cpp_namespace::UserType"]]

Or if the intent was that it's an iterator of a tuple of (str, Any), where the Any is the UserType, then this would be correct:

Iterator[tuple[str, Annotated[Any, "cpp_namespace::UserType"]]]

@rwgk
Copy link
Author

rwgk commented Nov 13, 2023

Or if the intent was that it's an iterator of a tuple of (str, Any), where the Any is the UserType, then this would be correct:

Iterator[tuple[str, Annotated[Any, "cpp_namespace::UserType"]]]

Thanks! That's exactly what I needed to know.

I'll work on constructing pybind11 unit tests (in my pybind11 PR), then share the docstrings here.

@sizmailov
Copy link
Contributor

I should add that if the annotation in the docstring parser could be tripped up by use of parentheses in the types

It would be a bug in the parser, then. It's not uncommon to have arbitrary expressions in Annotated[...]

Note that pybind11 already uses Annotated for fixed-sized arrays (pybind/pybind11#4679):
std::array<int, 2> is rendered as Annotated[List[int], FixedSize(2)]. (not sure if stubgen 1.7 is ok with that)

Annotated[Any, "cpp_namespace::UserType"] is preferable to Annotated[Any, CppTypePybind11("cpp_namespace::UserType")] unless there is an actual importable python type called CppTypePybind11,

Bare-string annotations are less clear and always would be troublesome for a third-party tool that works with annotations. I agree, that having a valid importable class is even better.

The following annotation for a raw C++ type should be perfect.

Annotated[Any, pybind11.CppType("cpp_namespace::UserType")]

@bluenote10
Copy link
Contributor

bluenote10 commented Nov 14, 2023

not sure if stubgen 1.7 is ok with that

Unfortunately it looks like it isn't. I noticed that by updating from pybind11 2.9.2 to 2.11.1 which has introduced the wrapping into Annotated.

Given the following pybind11 example:

#include <array>
#include <vector>

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

namespace py = pybind11;

std::array<float, 4> funcReturningArray()
{
  return std::array<float, 4>{1.0, 2.0, 3.0, 4.0};
}

std::vector<float> funcReturningVector()
{
  return std::vector<float>{1.0, 2.0, 3.0, 4.0};
}

PYBIND11_MODULE(my_native_module, m)
{
  m.def("func_returning_array", &funcReturningArray);
  m.def("func_returning_vector", &funcReturningVector);
}

I've analyzed the stub generator output for the following matrix of pybind11 and mypy versions:

Output from pybind11==2.9.2 and mypy==1.6.1

from typing import List

def func_returning_array() -> List[float[4]]: ...
def func_returning_vector() -> List[float]: ...

The output List[float[4]] isn't really a valid type, for so far we could fix this issue on our side via a post-processing (just dropping the inner [...]) resulting in usable stubs.

Output from pybind11==2.9.2 and mypy==1.7.0

def func_returning_array() -> List[float[4]]: ...
def func_returning_vector() -> List[float]: ...

Similar function signatures, but note that the import for List is missing, which makes the resulting .pyi invalid by itself. This seems to be a rather fundamental regression of mypy 1.7.0. We actually observed a few more regressions in 1.7.0's stub gen, now tracked in #16486.

Output from pybind11==2.11.1 and mypy==1.6.1

from typing import Any, List

def func_returning_array(*args, **kwargs) -> Any: ...
def func_returning_vector() -> List[float]: ...

Unfortunately the stub generator gets confused by the underlying Annotated wrapping and messes up the signature of func_returning_array -- somehow not only on the output side, but also for the input types.

Output from pybind11==2.11.1 and mypy==1.7.0

def func_returning_array(*args, **kwargs): ...
def func_returning_vector() -> List[float]: ...

Unfortunately the output with the latest mypy is similarly broken (the missing -> Any is also unfortunate for users who want to run the their stub output against mypy with disallow_untyped_defs=True, but the signature is broken to begin with).


For the record, if I inspect my_native_module.func_returning_array.__doc__ at runtime, it returns 'func_returning_array() -> Annotated[List[float], FixedSize(4)]\n' as expected from the pybind11 perspective.

rwgk pushed a commit to rwgk/pybind11 that referenced this issue Nov 14, 2023
@rwgk
Copy link
Author

rwgk commented Nov 15, 2023

@chadrik I worked quite a bit more on pybind/pybind11#4888, with this file as the intended interface for collaborating:

https://github.com/pybind/pybind11/pull/4888/files#diff-947424b07ca441b29fe35dba0d43fbe29f235bade6d522895eed4cb899ead01e

The idea is that you can pretty easily extract the pybind11 docstrings from there for testing stubgen. Does that work for you?

I don't have opinions about the exact rendering details. I mostly care about stability and loss-free stubgen processing. I'll happily change my PR to go with preferences from the stubgen team.

Some simple ideas on my mind:

  • You don't necessarily have to build and run pybind11 yourself, but maybe just harvest test_cases_for_stubgen.py on a regular basis.
  • pybind11 behavior changes will be obvious, from the git history of that file.
  • We can add more tests over time as we run into and solve issues.

I copied the pybind11 demo source code into my PR, with link and copyright notice. Is that OK?

@bluenote10
Copy link
Contributor

The idea is that you can pretty easily extract the pybind11 docstrings from there for testing stubgen. Does that work for you?

I'd say it is almost easier to add more test cases to these pybind11 test cases, because mypy's CI already verifies that the "pybind11 => mypy stubgen" step results in the expected output (the two subfolders containing .pyi there -- we could perhaps rename them to output_with_docs and output_without_docs to make that more obvious). The more test cases we have there, the more confidence there is that the two play together nicely, and we could prevent regressions.

You can even open a dummy PR on the mypy repo, and temporarily switch pybind11 from an official version to some git-based hash within the pybind11 here (at least I'd expect replacing pybind11==2.9.2 with some git+ssh://... should work):

By running the CI you'd see how the pybind11 change would affect the stub generator test outputs.

In general it would make sense to bump the version there to 2.11.1.

@chadrik
Copy link
Contributor

chadrik commented Nov 15, 2023

def func_returning_array() -> List[float[4]]: ...
def func_returning_vector() -> List[float]: ...

Similar function signatures, but note that the import for List is missing, which makes the resulting .pyi invalid by itself. This seems to be a rather fundamental regression of mypy 1.7.0. We actually observed a few more regressions in 1.7.0's stub gen, now tracked in #16486.

The MR that was merged prior to 1.7 unified the code and many of the behaviors of the C-extension and pure-python stubgen processes. In the pure-python code path, imports are only added for modules that have been imported: there is not specialize behavior to detect and import types whose names match those in the typing or typing_extensions modules. There are real-world projects that I use stubgen on (PySide2) which have classes that match the names of those in the typing module, and the automatic generation of imports broke my stubs. The simple solution to the problem is for docstrings to use native types (list, dict, set, etc), or specify the module (typing.List, typing.Dict), but I do think that it makes sense to add a new option to stubgen to automatically add typing imports for known types.

def func_returning_array(*args, **kwargs): ...
def func_returning_vector() -> List[float]: ...

Unfortunately the output with the latest mypy is similarly broken (the missing -> Any is also unfortunate for users who want to run the their stub output against mypy with disallow_untyped_defs=True, but the signature is broken to begin with).

This is another example of the C-extension behavior being adjusted to match the pure-python behavior (by sharing code paths). The C-extension behavior now correctly distinguishes types that are known to be Any (via annotations or docstrings), from those which are not known. This is a feature that ensures that stub authors can find and correct unknown types. In the case of unknown types, stubgen now consistently omits the type from the signature, or it uses the placeholder type _typeshed.Incomplete if the type can only be partially inferred, for example Iterator[_typeshed.Incomplete] in the case of a function that has a yield statement. Iterator[Any] would be over-specific in scenarios where the contained type cannot be inferred.

It would be reasonable to add a flag to use Any instead of Incomplete and generate them in cases that would otherwise be omitted, and with the new unified behavior, this flag would work for both C and pure-python generators. Such a hypothetical flag would likely produce output like this:

def func_returning_array(*args: Any, **kwargs: Any) -> Any: ...

Unfortunately the stub generator gets confused by the underlying Annotated wrapping and messes up the signature of func_returning_array -- somehow not only on the output side, but also for the input types.

The parser, which I'm not responsible for writing, likely failed to find what it considered a valid signature, so the whole thing was thrown out. Totally agree that the parser should be improved.

@rwgk
Copy link
Author

rwgk commented Nov 16, 2023

I'd say it is almost easier to add more test cases to these pybind11 test cases,

We have two projects,

  • for pybind11 it's easier to have the test cases meant for stubgen in the pybind11 unit test system,
  • for mypy it's maybe easier to go with the existing pybind11_mypy_demo.

Note (I'm not sure if you saw already): I copied basically all of pybind11_mypy_demo/src/main.cpp into pybind11/tests/test_cases_for_stubgen.cpp, with this comment:

// The `basics` code was copied from here (to have all test cases for stubgen in one place):
// https://github.com/python/mypy/blob/c6cb3c6282003dd3dadcf028735f9ba6190a0c84/test-data/pybind11_mypy_demo/src/main.cpp

I definitely want to keep test_cases_for_stubgen.cpp on the pybind11 side. That's the only easy way for us to know quickly when we're changing the docstrings, intentionally or unintentionally.

Then the question is: do you want to copy back the file and build yourself? I think that will not work very well, because over time there will be changes almost certainly, and some may only make sense, or even only build, with a matching pybind11 version.

I think all we (collectively) really care about is that the docstrings produced by pybind11 are processed gracefully by stubgen. How and why exactly pybind11 produces certain docstrings doesn't really matter to stubgen.

You're of course free to maintain your own infrastructure for including certain pybind11 versions in your testing, but I think this will be easier overall:

# If docstrings are changed or added here, open a github.com/python/mypy pull request to update test_cases_from_pybind11.txt accordingly.
  • On the mypy side, we could have a comment at the top of test_cases_from_pybind11.txt, e.g.
# Please open a github.com/pybind/pybind11 issue if any of the docstrings here are no longer supported by stubgen. 

This way the stubgen developers can see easily if/when they are breaking compatibility with old pybind11 versions. (I assume it's impractical for stubgen to test with multiple pybind11 versions N years back.)

I'd be happy to open a mypy PR to establish something like test_cases_from_pybind11.txt.

@rwgk
Copy link
Author

rwgk commented Nov 16, 2023

I worked on pybind/pybind11#4888 some more. I added a few torture tests for recursive processing, and made the test itself much nicer. What you see below is copy-pasted directly from the current test_cases_for_stubgen.py.

I changed my mind slightly: You could copy the below into test_cases_from_pybind11.py, with a versioned name, e.g. TEST_CASES_PR4888 for the below, then TEST_CASES_V2_12_0 etc. in the future. That way it's very obvious where the docstrings came from.

Equivalent of #16306 (comment), but with bare "test_cases_for_stubgen::UserType":

TEST_CASES = {
    "m.basics.answer.__doc__": 'answer() -> int\n\nanswer docstring, with end quote"\n',
    "m.basics.sum.__doc__": "sum(arg0: int, arg1: int) -> int\n\nmultiline docstring test, edge case quotes \"\"\"'''\n",
    "m.basics.midpoint.__doc__": "midpoint(left: float, right: float) -> float\n",
    "m.basics.weighted_midpoint.__doc__": "weighted_midpoint(left: float, right: float, alpha: float = 0.5) -> float\n",
    "m.basics.Point.__init__.__doc__": "__init__(*args, **kwargs)\nOverloaded function.\n\n1. __init__(self: pybind11_tests.cases_for_stubgen.basics.Point) -> None\n\n2. __init__(self: pybind11_tests.cases_for_stubgen.basics.Point, x: float, y: float) -> None\n",
    "m.basics.Point.distance_to.__doc__": "distance_to(*args, **kwargs)\nOverloaded function.\n\n1. distance_to(self: pybind11_tests.cases_for_stubgen.basics.Point, x: float, y: float) -> float\n\n2. distance_to(self: pybind11_tests.cases_for_stubgen.basics.Point, other: pybind11_tests.cases_for_stubgen.basics.Point) -> float\n",
    "m.basics.Point.length_unit.__doc__": "Members:\n\n  mm\n\n  pixel\n\n  inch",
    "m.basics.Point.angle_unit.__doc__": "Members:\n\n  radian\n\n  degree",
    "m.pass_user_type.__doc__": 'pass_user_type(arg0: Annotated[Any, "test_cases_for_stubgen::UserType"]) -> None\n',
    "m.return_user_type.__doc__": 'return_user_type() -> Annotated[Any, "test_cases_for_stubgen::UserType"]\n',
    "m.MapIntUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.KeysView[int]\n",
    "m.MapIntUserType.values.__doc__": 'values(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.ValuesView[Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
    "m.MapIntUserType.items.__doc__": 'items(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.ItemsView[int, Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
    "m.MapUserTypeInt.keys.__doc__": 'keys(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.KeysView[Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
    "m.MapUserTypeInt.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.ValuesView[int]\n",
    "m.MapUserTypeInt.items.__doc__": 'items(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.ItemsView[Annotated[Any, "test_cases_for_stubgen::UserType"], int]\n',
    "m.MapFloatUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[float]\n",
    "m.MapFloatUserType.values.__doc__": 'values(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
    "m.MapFloatUserType.__iter__.__doc__": '__iter__(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[tuple[float, Annotated[Any, "test_cases_for_stubgen::UserType"]]]\n',
    "m.MapUserTypeFloat.keys.__doc__": 'keys(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
    "m.MapUserTypeFloat.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[float]\n",
    "m.MapUserTypeFloat.__iter__.__doc__": '__iter__(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[tuple[Annotated[Any, "test_cases_for_stubgen::UserType"], float]]\n',
    "m.pass_std_array_int_2.__doc__": "pass_std_array_int_2(arg0: Annotated[list[int], FixedSize(2)]) -> None\n",
    "m.return_std_array_int_3.__doc__": "return_std_array_int_3() -> Annotated[list[int], FixedSize(3)]\n",
    "m.nested_case_01a.__doc__": "nested_case_01a(arg0: list[Annotated[list[int], FixedSize(2)]]) -> None\n",
    "m.nested_case_02a.__doc__": 'nested_case_02a(arg0: list[Annotated[Any, "test_cases_for_stubgen::UserType"]]) -> None\n',
    "m.nested_case_03a.__doc__": 'nested_case_03a(arg0: dict[Annotated[list[int], FixedSize(2)], Annotated[Any, "test_cases_for_stubgen::UserType"]]) -> None\n',
    "m.nested_case_04a.__doc__": 'nested_case_04a(arg0: dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, "test_cases_for_stubgen::UserType"]]]) -> None\n',
    "m.nested_case_05a.__doc__": 'nested_case_05a(arg0: list[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, "test_cases_for_stubgen::UserType"]]]]) -> None\n',
    "m.nested_case_06a.__doc__": 'nested_case_06a(arg0: dict[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, "test_cases_for_stubgen::UserType"]]], list[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, "test_cases_for_stubgen::UserType"]]]]]) -> None\n',
}

@bluenote10
Copy link
Contributor

I assume it's impractical for stubgen to test with multiple pybind11 versions N years back.

In principle it would be possible to run the CI check against multiple versions of pybind11 as well. At least a "minimal version" and a "latest version". The test executes in seconds, so runtime shouldn't be the issue. It will mainly make the test setup a bit more complicated.

However what you are proposing also has its beauty. The test cases would be much more lightweight. And the approach could be extended to other C-extension generators more easily, i.e., we could have a set of TEST_CASES_PYBIND11_V<x.y.z> but also TEST_CASE_SOME_OTHER_C_EXTENSION_V<x.y.z>.

The main question is if the architecture of the stub generator allows for testing on that level, i.e., if the "inspection" is decoupled sufficient from the "generation" part. If so, that should work out nicely. What would be important imho is that the test cases also check for side effects in the generation, e.g., if encountering a certain symbol leads to the addition of the proper import etc. Then the tests should be a reasonable approximation of testing everything end-to-end.

@rwgk
Copy link
Author

rwgk commented Nov 16, 2023

What would be important imho is that the test cases also check for side effects in the generation, e.g., if encountering a certain symbol leads to the addition of the proper import etc.

That sounds like it is a different "unit" of stubgen to test. Is it valuable to have pybind11 in the loop for that?

If yes, I'd be interested in working together on adding a pybind11 unit test specifically for your purposes (in another pybind11 PR). Roughly, a unit test that works in the pybind11 GHA as usual, but also plays nicely with your GHA. So you would ideally/maybe not have any pybind11-specific C++ code on your side, but rely on what comes with pybind11. — I'm guessing you'll only ever really care about the latest version of pybind11 for this kind of test.

@rwgk
Copy link
Author

rwgk commented Nov 16, 2023

@chadrik and @sizmailov there seems to be an open question about "cpp_namespace::UserType" vs "pybind11.CppType("cpp_namespace::UserType").

Speaking as a pybind11 developer, but totally not a typing expert, I like the explicit pybind11.CppType(...) better. Why? As a core library developer, I'm often looking for needles in a heap of code I've never seen before and will never see again. Having the extra clue that "cpp_namespace::UserType" was generated by pybind11 could make the difference between debugging one hour or ten hours.

How much extra trouble does it cause on the stubgen side to process the pybind11.CppType(...)? Could that be a well-known/builtin thing (sorry, I don't know the correct term) in stubgen?

Tiny brain dump about a detail: just CppType() could also work, but would be a small loss, because what pybind11 produces is not necessarily what other tools might produce. There is some rather ad-hoc "clean" step here:

https://github.com/pybind/pybind11/blob/dc9b39596d986aeb061bd3debe52d30e2467dc48/include/pybind11/detail/typeid.h#L35-L49

The pybind11. prefix makes it clear that that was in the loop.

@sizmailov
Copy link
Contributor

I'm 100% with the pybind11.CppType("cpp_namespace::UserType") option.

  • It's unambiguous and crystal clear
  • No problems for type checkers (I assume we provide the pybind11.CppType in the python package)
  • If the parser supports expressions in annotations, there would be no special rule for pybind11.CppType("cpp_namespace::UserType"). We need that feature for FixedSize-like things anyway.

At the same time, I don't see any downsides here apart from the trouble of implementing a proper annotation parser in stubgen.

@rwgk
Copy link
Author

rwgk commented Nov 16, 2023

A little bit of extra work on pybind/pybind11#4888:

It's now easy (one line change) to change between producing a bare string or pybind11.CppType("..."). Below is the TEST_CASES dict with the latter.

Equivalent of #16306 (comment), but with pybind11.CppType("test_cases_for_stubgen::UserType"):

TEST_CASES = {
    "m.basics.answer.__doc__": 'answer() -> int\n\nanswer docstring, with end quote"\n',
    "m.basics.sum.__doc__": "sum(arg0: int, arg1: int) -> int\n\nmultiline docstring test, edge case quotes \"\"\"'''\n",
    "m.basics.midpoint.__doc__": "midpoint(left: float, right: float) -> float\n",
    "m.basics.weighted_midpoint.__doc__": "weighted_midpoint(left: float, right: float, alpha: float = 0.5) -> float\n",
    "m.basics.Point.__init__.__doc__": "__init__(*args, **kwargs)\nOverloaded function.\n\n1. __init__(self: pybind11_tests.cases_for_stubgen.basics.Point) -> None\n\n2. __init__(self: pybind11_tests.cases_for_stubgen.basics.Point, x: float, y: float) -> None\n",
    "m.basics.Point.distance_to.__doc__": "distance_to(*args, **kwargs)\nOverloaded function.\n\n1. distance_to(self: pybind11_tests.cases_for_stubgen.basics.Point, x: float, y: float) -> float\n\n2. distance_to(self: pybind11_tests.cases_for_stubgen.basics.Point, other: pybind11_tests.cases_for_stubgen.basics.Point) -> float\n",
    "m.basics.Point.length_unit.__doc__": "Members:\n\n  mm\n\n  pixel\n\n  inch",
    "m.basics.Point.angle_unit.__doc__": "Members:\n\n  radian\n\n  degree",
    "m.pass_user_type.__doc__": 'pass_user_type(arg0: Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]) -> None\n',
    "m.return_user_type.__doc__": 'return_user_type() -> Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]\n',
    "m.MapIntUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.KeysView[int]\n",
    "m.MapIntUserType.values.__doc__": 'values(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.ValuesView[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]\n',
    "m.MapIntUserType.items.__doc__": 'items(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.ItemsView[int, Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]\n',
    "m.MapUserTypeInt.keys.__doc__": 'keys(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.KeysView[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]\n',
    "m.MapUserTypeInt.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.ValuesView[int]\n",
    "m.MapUserTypeInt.items.__doc__": 'items(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.ItemsView[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")], int]\n',
    "m.MapFloatUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[float]\n",
    "m.MapFloatUserType.values.__doc__": 'values(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]\n',
    "m.MapFloatUserType.__iter__.__doc__": '__iter__(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[tuple[float, Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]]\n',
    "m.MapUserTypeFloat.keys.__doc__": 'keys(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]\n',
    "m.MapUserTypeFloat.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[float]\n",
    "m.MapUserTypeFloat.__iter__.__doc__": '__iter__(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[tuple[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")], float]]\n',
    "m.pass_std_array_int_2.__doc__": "pass_std_array_int_2(arg0: Annotated[list[int], FixedSize(2)]) -> None\n",
    "m.return_std_array_int_3.__doc__": "return_std_array_int_3() -> Annotated[list[int], FixedSize(3)]\n",
    "m.nested_case_01a.__doc__": "nested_case_01a(arg0: list[Annotated[list[int], FixedSize(2)]]) -> None\n",
    "m.nested_case_02a.__doc__": 'nested_case_02a(arg0: list[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]) -> None\n',
    "m.nested_case_03a.__doc__": 'nested_case_03a(arg0: dict[Annotated[list[int], FixedSize(2)], Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]) -> None\n',
    "m.nested_case_04a.__doc__": 'nested_case_04a(arg0: dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]]) -> None\n',
    "m.nested_case_05a.__doc__": 'nested_case_05a(arg0: list[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]]]) -> None\n',
    "m.nested_case_06a.__doc__": 'nested_case_06a(arg0: dict[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]], list[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, pybind11.CppType("test_cases_for_stubgen::UserType")]]]]]) -> None\n',
}

@rwgk
Copy link
Author

rwgk commented Dec 17, 2023

Below is the equivalent of #16306 (comment), but with

`test_cases_for_stubgen::UserType`

instead of the very verbose Annotated[Any, pybind11.CppType("...")] syntax.

With this, the pybind11 docstrings are more human-readable, but there is also no ambiguity: What is a Python name, and what is a C++ name?

I.e. I believe it should be easy for stubgen to expand the names in backticks to proper Python type annotation syntax, whatever that may be in the future.

(This idea goes back to pybind/pybind11#4972. There we're coming to the conclusion that we currently don't have the free energy to cleanly produce Python type names, but a mix of Python type names and C++ type names.)

TEST_CASES = {
    "m.basics.answer.__doc__": 'answer() -> int\n\nanswer docstring, with end quote"\n',
    "m.basics.sum.__doc__": "sum(arg0: int, arg1: int) -> int\n\nmultiline docstring test, edge case quotes \"\"\"'''\n",
    "m.basics.midpoint.__doc__": "midpoint(left: float, right: float) -> float\n",
    "m.basics.weighted_midpoint.__doc__": "weighted_midpoint(left: float, right: float, alpha: float = 0.5) -> float\n",
    "m.basics.Point.__init__.__doc__": "__init__(*args, **kwargs)\nOverloaded function.\n\n1. __init__(self: pybind11_tests.cases_for_stubgen.basics.Point) -> None\n\n2. __init__(self: pybind11_tests.cases_for_stubgen.basics.Point, x: float, y: float) -> None\n",
    "m.basics.Point.distance_to.__doc__": "distance_to(*args, **kwargs)\nOverloaded function.\n\n1. distance_to(self: pybind11_tests.cases_for_stubgen.basics.Point, x: float, y: float) -> float\n\n2. distance_to(self: pybind11_tests.cases_for_stubgen.basics.Point, other: pybind11_tests.cases_for_stubgen.basics.Point) -> float\n",
    "m.basics.Point.length_unit.__doc__": "Members:\n\n  mm\n\n  pixel\n\n  inch",
    "m.basics.Point.angle_unit.__doc__": "Members:\n\n  radian\n\n  degree",
    "m.pass_user_type.__doc__": "pass_user_type(arg0: `test_cases_for_stubgen::UserType`) -> None\n",
    "m.return_user_type.__doc__": "return_user_type() -> `test_cases_for_stubgen::UserType`\n",
    "m.MapIntUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.KeysView[int]\n",
    "m.MapIntUserType.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.ValuesView[`test_cases_for_stubgen::UserType`]\n",
    "m.MapIntUserType.items.__doc__": "items(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.ItemsView[int, `test_cases_for_stubgen::UserType`]\n",
    "m.MapUserTypeInt.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.KeysView[`test_cases_for_stubgen::UserType`]\n",
    "m.MapUserTypeInt.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.ValuesView[int]\n",
    "m.MapUserTypeInt.items.__doc__": "items(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.ItemsView[`test_cases_for_stubgen::UserType`, int]\n",
    "m.MapFloatUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[float]\n",
    "m.MapFloatUserType.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[`test_cases_for_stubgen::UserType`]\n",
    "m.MapFloatUserType.__iter__.__doc__": "__iter__(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[tuple[float, `test_cases_for_stubgen::UserType`]]\n",
    "m.MapUserTypeFloat.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[`test_cases_for_stubgen::UserType`]\n",
    "m.MapUserTypeFloat.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[float]\n",
    "m.MapUserTypeFloat.__iter__.__doc__": "__iter__(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[tuple[`test_cases_for_stubgen::UserType`, float]]\n",
    "m.pass_std_array_int_2.__doc__": "pass_std_array_int_2(arg0: Annotated[list[int], FixedSize(2)]) -> None\n",
    "m.return_std_array_int_3.__doc__": "return_std_array_int_3() -> Annotated[list[int], FixedSize(3)]\n",
    "m.nested_case_01a.__doc__": "nested_case_01a(arg0: list[Annotated[list[int], FixedSize(2)]]) -> None\n",
    "m.nested_case_02a.__doc__": "nested_case_02a(arg0: list[`test_cases_for_stubgen::UserType`]) -> None\n",
    "m.nested_case_03a.__doc__": "nested_case_03a(arg0: dict[Annotated[list[int], FixedSize(2)], `test_cases_for_stubgen::UserType`]) -> None\n",
    "m.nested_case_04a.__doc__": "nested_case_04a(arg0: dict[list[Annotated[list[int], FixedSize(2)]], list[`test_cases_for_stubgen::UserType`]]) -> None\n",
    "m.nested_case_05a.__doc__": "nested_case_05a(arg0: list[dict[list[Annotated[list[int], FixedSize(2)]], list[`test_cases_for_stubgen::UserType`]]]) -> None\n",
    "m.nested_case_06a.__doc__": "nested_case_06a(arg0: dict[dict[list[Annotated[list[int], FixedSize(2)]], list[`test_cases_for_stubgen::UserType`]], list[dict[list[Annotated[list[int], FixedSize(2)]], list[`test_cases_for_stubgen::UserType`]]]]) -> None\n",
}

@henryiii
Copy link
Contributor

Any thoughts on pybind/pybind11#5167 ? (Specifically, I think the question is if it's better to allow stubgen to handle TypeVar, or if Python 3.12 syntax should be used).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug mypy got something wrong topic-stubgen
Projects
None yet
Development

No branches or pull requests

6 participants