-
-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stubgen & pybind11: can we make them play nice(r)? #16306
Comments
Annotated[Any, CppTypePybind11("cpp_namespace::UserType")]
Could you please take a look at the issue? |
Happy to help. I saw that my new changes to @rwgk If possible, it would be great if you could test mypy 1.7 as there is a chance this problem is fixed. If it's difficult to get updated tooling installed inside Google, that's ok, my suggestion below should be fine on its own: The fastest way to resolve this will be for you to modify the mypy pybind11 demo source code to demonstrate your problem and make a PR with the changes. I'll then be able to much more easily improve the stubgen code to resolve the issue. |
Trying. (It's not easy; I have to run global testing and fix anything that breaks. Might take a while.)
I could add to that file, but asking first: Are you set up to test in combination with an open pybind11 PR (pybind/pybind11#4888)? Or alternatively, maybe better: Could you point me to the commands I'd need to run to test myself, in my fork of mypy? |
No. But maybe there's an even simpler way to cooperate on this. Can you provide here the exact docstring that would be inspected from a function with this new capability? I can write a simpler unit test just focused on the string parsing. |
I should add that if the annotation in the docstring parser could be tripped up by use of parentheses in the types: |
I tried to get a docstring for bindings in the wild, but no luck in the time I can give it right now. I'll try again asap. I don't know a lot about typing. My main idea is that we change pybind11 so that mypy is not tripped up by a raw C++ name (e.g. Would this be better?
|
Why is it nested to begin with? |
|
With regard to nesting, So, if the Iterator[Annotated[Any, "cpp_namespace::UserType"]] Or if the iterator of strings represents the UserType, this would be correct: Annotated[Iterator[str], "cpp_namespace::UserType"]] Or if the intent was that it's an iterator of a tuple of Iterator[tuple[str, Annotated[Any, "cpp_namespace::UserType"]]] |
Thanks! That's exactly what I needed to know. I'll work on constructing pybind11 unit tests (in my pybind11 PR), then share the docstrings here. |
It would be a bug in the parser, then. It's not uncommon to have arbitrary expressions in Note that pybind11 already uses
Bare-string annotations are less clear and always would be troublesome for a third-party tool that works with annotations. I agree, that having a valid importable class is even better. The following annotation for a raw C++ type should be perfect. Annotated[Any, pybind11.CppType("cpp_namespace::UserType")] |
Unfortunately it looks like it isn't. I noticed that by updating from pybind11 2.9.2 to 2.11.1 which has introduced the wrapping into Given the following pybind11 example: #include <array>
#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
namespace py = pybind11;
std::array<float, 4> funcReturningArray()
{
return std::array<float, 4>{1.0, 2.0, 3.0, 4.0};
}
std::vector<float> funcReturningVector()
{
return std::vector<float>{1.0, 2.0, 3.0, 4.0};
}
PYBIND11_MODULE(my_native_module, m)
{
m.def("func_returning_array", &funcReturningArray);
m.def("func_returning_vector", &funcReturningVector);
} I've analyzed the stub generator output for the following matrix of pybind11 and mypy versions: Output from from typing import List
def func_returning_array() -> List[float[4]]: ...
def func_returning_vector() -> List[float]: ... The output Output from def func_returning_array() -> List[float[4]]: ...
def func_returning_vector() -> List[float]: ... Similar function signatures, but note that the import for Output from from typing import Any, List
def func_returning_array(*args, **kwargs) -> Any: ...
def func_returning_vector() -> List[float]: ... Unfortunately the stub generator gets confused by the underlying Output from def func_returning_array(*args, **kwargs): ...
def func_returning_vector() -> List[float]: ... Unfortunately the output with the latest mypy is similarly broken (the missing For the record, if I inspect |
Material to inform python/mypy#16306
@chadrik I worked quite a bit more on pybind/pybind11#4888, with this file as the intended interface for collaborating: The idea is that you can pretty easily extract the pybind11 docstrings from there for testing stubgen. Does that work for you? I don't have opinions about the exact rendering details. I mostly care about stability and loss-free stubgen processing. I'll happily change my PR to go with preferences from the stubgen team. Some simple ideas on my mind:
I copied the pybind11 demo source code into my PR, with link and copyright notice. Is that OK? |
I'd say it is almost easier to add more test cases to these pybind11 test cases, because mypy's CI already verifies that the "pybind11 => mypy stubgen" step results in the expected output (the two subfolders containing You can even open a dummy PR on the mypy repo, and temporarily switch pybind11 from an official version to some git-based hash within the pybind11 here (at least I'd expect replacing
By running the CI you'd see how the pybind11 change would affect the stub generator test outputs. In general it would make sense to bump the version there to 2.11.1. |
The MR that was merged prior to 1.7 unified the code and many of the behaviors of the C-extension and pure-python
This is another example of the C-extension behavior being adjusted to match the pure-python behavior (by sharing code paths). The C-extension behavior now correctly distinguishes types that are known to be It would be reasonable to add a flag to use def func_returning_array(*args: Any, **kwargs: Any) -> Any: ...
The parser, which I'm not responsible for writing, likely failed to find what it considered a valid signature, so the whole thing was thrown out. Totally agree that the parser should be improved. |
We have two projects,
Note (I'm not sure if you saw already): I copied basically all of pybind11_mypy_demo/src/main.cpp into pybind11/tests/test_cases_for_stubgen.cpp, with this comment:
I definitely want to keep test_cases_for_stubgen.cpp on the pybind11 side. That's the only easy way for us to know quickly when we're changing the docstrings, intentionally or unintentionally. Then the question is: do you want to copy back the file and build yourself? I think that will not work very well, because over time there will be changes almost certainly, and some may only make sense, or even only build, with a matching pybind11 version. I think all we (collectively) really care about is that the docstrings produced by pybind11 are processed gracefully by stubgen. How and why exactly pybind11 produces certain docstrings doesn't really matter to stubgen. You're of course free to maintain your own infrastructure for including certain pybind11 versions in your testing, but I think this will be easier overall:
This way the stubgen developers can see easily if/when they are breaking compatibility with old pybind11 versions. (I assume it's impractical for stubgen to test with multiple pybind11 versions N years back.) I'd be happy to open a mypy PR to establish something like test_cases_from_pybind11.txt. |
I worked on pybind/pybind11#4888 some more. I added a few torture tests for recursive processing, and made the test itself much nicer. What you see below is copy-pasted directly from the current test_cases_for_stubgen.py. I changed my mind slightly: You could copy the below into test_cases_from_pybind11.py, with a versioned name, e.g. Equivalent of #16306 (comment), but with bare TEST_CASES = {
"m.basics.answer.__doc__": 'answer() -> int\n\nanswer docstring, with end quote"\n',
"m.basics.sum.__doc__": "sum(arg0: int, arg1: int) -> int\n\nmultiline docstring test, edge case quotes \"\"\"'''\n",
"m.basics.midpoint.__doc__": "midpoint(left: float, right: float) -> float\n",
"m.basics.weighted_midpoint.__doc__": "weighted_midpoint(left: float, right: float, alpha: float = 0.5) -> float\n",
"m.basics.Point.__init__.__doc__": "__init__(*args, **kwargs)\nOverloaded function.\n\n1. __init__(self: pybind11_tests.cases_for_stubgen.basics.Point) -> None\n\n2. __init__(self: pybind11_tests.cases_for_stubgen.basics.Point, x: float, y: float) -> None\n",
"m.basics.Point.distance_to.__doc__": "distance_to(*args, **kwargs)\nOverloaded function.\n\n1. distance_to(self: pybind11_tests.cases_for_stubgen.basics.Point, x: float, y: float) -> float\n\n2. distance_to(self: pybind11_tests.cases_for_stubgen.basics.Point, other: pybind11_tests.cases_for_stubgen.basics.Point) -> float\n",
"m.basics.Point.length_unit.__doc__": "Members:\n\n mm\n\n pixel\n\n inch",
"m.basics.Point.angle_unit.__doc__": "Members:\n\n radian\n\n degree",
"m.pass_user_type.__doc__": 'pass_user_type(arg0: Annotated[Any, "test_cases_for_stubgen::UserType"]) -> None\n',
"m.return_user_type.__doc__": 'return_user_type() -> Annotated[Any, "test_cases_for_stubgen::UserType"]\n',
"m.MapIntUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.KeysView[int]\n",
"m.MapIntUserType.values.__doc__": 'values(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.ValuesView[Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
"m.MapIntUserType.items.__doc__": 'items(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.ItemsView[int, Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
"m.MapUserTypeInt.keys.__doc__": 'keys(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.KeysView[Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
"m.MapUserTypeInt.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.ValuesView[int]\n",
"m.MapUserTypeInt.items.__doc__": 'items(self: pybind11_tests.cases_for_stubgen.MapUserTypeInt) -> pybind11_tests.cases_for_stubgen.ItemsView[Annotated[Any, "test_cases_for_stubgen::UserType"], int]\n',
"m.MapFloatUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[float]\n",
"m.MapFloatUserType.values.__doc__": 'values(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
"m.MapFloatUserType.__iter__.__doc__": '__iter__(self: pybind11_tests.cases_for_stubgen.MapFloatUserType) -> Iterator[tuple[float, Annotated[Any, "test_cases_for_stubgen::UserType"]]]\n',
"m.MapUserTypeFloat.keys.__doc__": 'keys(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[Annotated[Any, "test_cases_for_stubgen::UserType"]]\n',
"m.MapUserTypeFloat.values.__doc__": "values(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[float]\n",
"m.MapUserTypeFloat.__iter__.__doc__": '__iter__(self: pybind11_tests.cases_for_stubgen.MapUserTypeFloat) -> Iterator[tuple[Annotated[Any, "test_cases_for_stubgen::UserType"], float]]\n',
"m.pass_std_array_int_2.__doc__": "pass_std_array_int_2(arg0: Annotated[list[int], FixedSize(2)]) -> None\n",
"m.return_std_array_int_3.__doc__": "return_std_array_int_3() -> Annotated[list[int], FixedSize(3)]\n",
"m.nested_case_01a.__doc__": "nested_case_01a(arg0: list[Annotated[list[int], FixedSize(2)]]) -> None\n",
"m.nested_case_02a.__doc__": 'nested_case_02a(arg0: list[Annotated[Any, "test_cases_for_stubgen::UserType"]]) -> None\n',
"m.nested_case_03a.__doc__": 'nested_case_03a(arg0: dict[Annotated[list[int], FixedSize(2)], Annotated[Any, "test_cases_for_stubgen::UserType"]]) -> None\n',
"m.nested_case_04a.__doc__": 'nested_case_04a(arg0: dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, "test_cases_for_stubgen::UserType"]]]) -> None\n',
"m.nested_case_05a.__doc__": 'nested_case_05a(arg0: list[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, "test_cases_for_stubgen::UserType"]]]]) -> None\n',
"m.nested_case_06a.__doc__": 'nested_case_06a(arg0: dict[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, "test_cases_for_stubgen::UserType"]]], list[dict[list[Annotated[list[int], FixedSize(2)]], list[Annotated[Any, "test_cases_for_stubgen::UserType"]]]]]) -> None\n',
} |
In principle it would be possible to run the CI check against multiple versions of pybind11 as well. At least a "minimal version" and a "latest version". The test executes in seconds, so runtime shouldn't be the issue. It will mainly make the test setup a bit more complicated. However what you are proposing also has its beauty. The test cases would be much more lightweight. And the approach could be extended to other C-extension generators more easily, i.e., we could have a set of The main question is if the architecture of the stub generator allows for testing on that level, i.e., if the "inspection" is decoupled sufficient from the "generation" part. If so, that should work out nicely. What would be important imho is that the test cases also check for side effects in the generation, e.g., if encountering a certain symbol leads to the addition of the proper import etc. Then the tests should be a reasonable approximation of testing everything end-to-end. |
That sounds like it is a different "unit" of stubgen to test. Is it valuable to have pybind11 in the loop for that? If yes, I'd be interested in working together on adding a pybind11 unit test specifically for your purposes (in another pybind11 PR). Roughly, a unit test that works in the pybind11 GHA as usual, but also plays nicely with your GHA. So you would ideally/maybe not have any pybind11-specific C++ code on your side, but rely on what comes with pybind11. — I'm guessing you'll only ever really care about the latest version of pybind11 for this kind of test. |
@chadrik and @sizmailov there seems to be an open question about Speaking as a pybind11 developer, but totally not a typing expert, I like the explicit How much extra trouble does it cause on the stubgen side to process the Tiny brain dump about a detail: just The |
I'm 100% with the
At the same time, I don't see any downsides here apart from the trouble of implementing a proper annotation parser in stubgen. |
A little bit of extra work on pybind/pybind11#4888: It's now easy (one line change) to change between producing a bare string or Equivalent of #16306 (comment), but with
|
Below is the equivalent of #16306 (comment), but with
instead of the very verbose With this, the pybind11 docstrings are more human-readable, but there is also no ambiguity: What is a Python name, and what is a C++ name? I.e. I believe it should be easy for stubgen to expand the names in backticks to proper Python type annotation syntax, whatever that may be in the future. (This idea goes back to pybind/pybind11#4972. There we're coming to the conclusion that we currently don't have the free energy to cleanly produce Python type names, but a mix of Python type names and C++ type names.)
|
Any thoughts on pybind/pybind11#5167 ? (Specifically, I think the question is if it's better to allow stubgen to handle TypeVar, or if Python 3.12 syntax should be used). |
Bug Report
We're experimenting under pybind/pybind11#4888 with a pybind11 behavior change, to generate
instead of a bare
cpp_namespace::UserType
.It seems like mypy does handle outer-level
Annotated
, but not nested ones, e.g.The results from stubgen (1.5.1, Google-internal toolchain) are like this:
Desired is:
Does this ring any bells?
If you need more details, please let me know. (Note though that I don't know a lot about typing and mypy. This fell into my lap as Google-internal pybind11 owner & pybind11 maintainer on github.)
A more general question: What are your thought about changing the pybind11 behavior in this way? Is there a better way?
(A clear and concise description of what the bug is.)
To Reproduce
Expected Behavior
Actual Behavior
Your Environment
mypy.ini
(and other config files):The text was updated successfully, but these errors were encountered: