Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: ensure different KeysView/ValuesView type names for int16_t/int32_t or float/double. Fix #4529 #4972

Closed
wants to merge 5 commits into from

Conversation

hczhai
Copy link
Contributor

@hczhai hczhai commented Dec 14, 2023

Description

This PR fixes issue #4529. The problem was introduced in PR #4353.

The problem is the following:

// leads to ImportError: generic_type: cannot initialize type "KeysView[int]": an object with that name is already defined
py::bind_map<map<int, int>>(m, "map_int_int");
py::bind_map<map<long long int, int>>(m, "map_long_int");

Here we want type(map_int_int.keys()) to be KeysView[int] and type(map_long_int.keys()) to be KeysView[long]. But in after PR #4353 they are all assigned the same type name KeysView[int] which is caused by the usage of detail::make_caster<KeyType>::name for determining the type name. It maps all C++ integer types (int16_t, int32_t, ...) to the same python type name int and C++ float/double types to the same name float, etc.

The solution is to always use the C++ type name in KeyView[...] and ValuesView[...].

Note that before PR #4353, the type names were KeysView[map_int_int] and KeysView[map_long_int] for the above example (which is okay).

Suggested changelog entry:

Fix type name clash in ``KeysView`` and ``ValuesView`` when using multiple ``bind_map`` with different integer key types

@rwgk
Copy link
Collaborator

rwgk commented Dec 14, 2023

@aimir could you help reviewing this PR?

Here we want type(map_int_int.keys()) to be KeysView[int] and type(map_long_int.keys()) to be KeysView[long]. But in after PR #4353 they are all assigned the same type name KeysView[int] which is caused by the usage of detail::make_caster<KeyType>::name for determining the type name. It maps all C++ integer types (int16_t, int32_t, ...) to the same python type name int and C++ float/double types to the same name float, etc.

I'm super sleep-deprived at the moment ... but anyway: it sounds like the right thing to do? Instead of less of it (this PR), could we alternatively do more of it?

Concretely, could we work on this condition instead?

    // Wrap KeysView[KeyType] if it wasn't already wrapped
    if (!detail::get_type_info(typeid(KeysView))) {

Please don't hold back letting me know if you think this line of thinking is completely off the tracks. I'll look again after catching some sleep.

@hczhai
Copy link
Contributor Author

hczhai commented Dec 14, 2023

Hi @rwgk thanks for your quick feedback.

Concretely, could we work on this condition instead?

This line seems to be not related to the string name for the python binding.

This particular line is also mentioned in issue #4529, and after mentioning this line the issue author also wrote "but in this case two different typeids have the same string name in pybind11". To make it clearer, the problem is only related to the "same string name" of the two different types, which is, a naming problem. The behavior that there are "two different typeids" is correct. So issue #4529 cannot be fixed by changing this if condition.

Instead of less of it (this PR), could we alternatively do more of it?

Let me provide some more rationale for why I have to deleted the lines with detail::make_caster for inferring the type name (which were introduced in PR #4353). The deleted lines and added lines are all about the proper type names for the KeysView and ValuesView containers. Since these are containers, the names are inferred from the type names of their elements, which are stored in string variables key_type_name and mapped_type_name.

Now let's say we keep the lines from PR #4353, which basically says key_type_name = detail::make_caster<KeyType>::name.text. Note that detail::make_caster is about "converting a C++ type into a Python type", so the "name" obtained this way is the type name of the Python object after the conversion. Now consider the special case when this is not working: Since both C++ int and C++ long int types will be cast into same Python type int, we get the key_type_name = "int" for both KeyType = int and KeyType = long int. Then later in https://github.com/pybind/pybind11/pull/4972/files#diff-f022f150c6954fd97430527c1f9baeebd154648534b15b7bd747a408f5f1b9cdR726-R727, two different "KeysView" will be given the same name KeysView[int], and we get the error

ImportError: generic_type: cannot initialize type "KeysView[int]": an object with that name is already defined

To solve this problem, we should name the container for int as (something like) "KeysView[int]" and the container for long int as "KeysView[long]", this is essentially the same idea as we should do something like

py::bind_vector<vector<int32_t>>(m, "vector_int");
py::bind_vector<vector<int64_t>>(m, "vector_long");

rather than

py::bind_vector<vector<int32_t>>(m, ("vector_" + detail::make_caster<int32_t>::name.text).c_str()); // the type name will be "vector_int"
py::bind_vector<vector<int64_t>>(m, ("vector_" + detail::make_caster<int64_t>::name.text).c_str()); // the type name will be "vector_int" (but should actually be "vector_long")

which will cause the ImportError: generic_type: cannot initialize type "vector_int": an object with that name is already defined error.

In summary, for this naming purpose, the C++ to Python caster detail::make_caster is irrelevant so I think it is good to not keep these lines to get the correct behavior. Please let me know whether this explanation is clear.

@aimir Please correct me if I misunderstood something.

@rwgk
Copy link
Collaborator

rwgk commented Dec 15, 2023

What seems unfortunate is that we get a mix of Python type names and C++ type names.

See:

Specifically (from python/mypy#16306 (comment)):

    "m.MapIntUserType.keys.__doc__": "keys(self: pybind11_tests.cases_for_stubgen.MapIntUserType) -> pybind11_tests.cases_for_stubgen.KeysView[int]\n",

With this PR, we'd get e.g. KeysView[long], but the Python name long does not exist.

How could we make this work nice also for the purposes of stubgen?

KeysView[Annotated[Any, pybind11.CppType("long")]]?

Does a Python user even want to know that detail? I'm thinking KeysView[int] for any C++ type that's converted to Python int is better.

I think a better direction would be to class_-wrap only these once each:

  • template <typename KeyType>
    struct keys_view {
    virtual size_t len() = 0;
    virtual iterator iter() = 0;
    virtual bool contains(const KeyType &k) = 0;
    virtual bool contains(const object &k) = 0;
    virtual ~keys_view() = default;
    };
    template <typename MappedType>
    struct values_view {
    virtual size_t len() = 0;
    virtual iterator iter() = 0;
    virtual ~values_view() = default;
    };
    template <typename KeyType, typename MappedType>
    struct items_view {
    virtual size_t len() = 0;
    virtual iterator iter() = 0;
    virtual ~items_view() = default;
    };

Then add virtual std::string python_name() = 0; (key & value) or virtual std::pair<std::string, std::string> python_names() = 0; (items).

Using make_caster to get the Python names.

That's a bigger project, unfortunately. And probably I'm overlooking a thing or two.

... what do you think?

Copy link
Contributor

@aimir aimir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tried to create wrapper names that would be pythonic in nature - specifically motivated by auto-generating typing hints for mypy using stubgen.

That's why I opted to always use the python type names - as far as python is concerned, those really are the same types. Unfortunately I didn't think about this leading to issues with multiple wrappers of different integer types.

@rwgk I agree that the right thing is to treat them as the same python wrapper, but make sure they are wraped only once - this would have resolved issue #4529 while keeping the typing info pythonic.

This might be a larger task that I'd love to work on, but I don't see myself having the time for it soon. The solution suggested by @hczhai seems like the right thing to do for now. This is a significant issue, and fixing this sooner rather than later is more important than having nice typing hints.

Comment on lines 322 to 323
assert map_string_double.values().__class__.__name__ == "ValuesView[double]"
assert map_string_float.values().__class__.__name__ == "ValuesView[float]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the tests for keys() and items() to make sure that those are also named as expected

@rwgk
Copy link
Collaborator

rwgk commented Dec 17, 2023

Thanks a lot @aimir for the feedback!

Looks good to me then, as far as I understand, what we'll have is still better than the state before #4353.

To deal with the mix of Python type names and C++ type names, this super simple idea crossed my mind:

python/mypy#16306 (comment)

(I.e. we can take care of that separately.)

@hczhai I'll approve this PR after you add the keys() and items() tests, as suggested by @aimir.

@hczhai
Copy link
Contributor Author

hczhai commented Dec 17, 2023

@rwgk @aimir Thanks a lot for the feedback!

@aimir The reason I deleted the tests for keys() and items():

In the example used for testing, we have MapStringDouble so the key type is the std::string. But the C++ type name (obtained from detail::type_info_description(typeid(KeyType))) for std::string is complicated. We will have to do (depending on the compiler)

assert (map_string_double.keys().__class__.__name__ in [
    "KeysView[std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >]",
    "KeysView[std::basic_string<char,std::char_traits<char>,std::allocator<char> >]",
    "KeysView[std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >]"
]

instead of assert map_string_double.keys().__class__.__name__ == "KeysView[str]". Is this okay? Maybe use another map without str keys for testing?

@rwgk So I am thinking of the alternative way that you suggested - namely, how to make it work with the python names. So if we class_-wrap keys_view<KeyType> once with the name KeysView[int] for all different int16_t, int32_t, ..., how to determine the template parameter KeyType here? I am thinking KeyType = detail::make_caster<CppKeyType>::py_type but py_type is only available for int and float (not available for tuple[int, int], etc.)

@rwgk
Copy link
Collaborator

rwgk commented Dec 17, 2023

@rwgk So I am thinking of the alternative way that you suggested - namely, how to make it work with the python names. So if we class_-wrap keys_view<KeyType> once with the name KeysView[int] for all different int16_t, int32_t, ..., how to determine the template parameter KeyType here? I am thinking KeyType = detail::make_caster<CppKeyType>::py_type but py_type is only available for int and float (not available for tuple[int, int], etc.)

I'm really not sure, but offering what comes to mind:

In my understanding, detail::make_caster<CppKeyType>::name is meant to be a Python name.

Now, that doesn't always hold true, but with the backtick trick I'm experimenting with under #4888, at least we're clearly exposing when we're falling back to a C++ type name. That's "good enough" IMO, hoping that stubgen will be changed in the future to process the backtick notation in a meaningful way.

Does that help?

@sizmailov for awareness.

@hczhai
Copy link
Contributor Author

hczhai commented Dec 17, 2023

After some experiments, I did not find any simple way for making KeysView, etc. pythonic... It seems to be some much larger amount of work. So maybe we have to use the ugly C++ type names for now.

@aimir @rwgk I have fixed the problems in the tests. Please see if they look good.

@rwgk
Copy link
Collaborator

rwgk commented Dec 18, 2023

But the C++ type name (obtained from detail::type_info_description(typeid(KeyType))) for std::string is complicated.

The goal of #4353 was to fix #3986. What we're getting with this PR doesn't solve the original problem. — Is that a fair assessment?

In any case, I initiated Google global testing with this PR, to see how many stubgen tests are failing, if any. I'll report back when I have the results.

@hczhai
Copy link
Contributor Author

hczhai commented Dec 19, 2023

@rwgk Thanks for the comments.

The goal of PR #4353 was to fix issue #3986. What we're getting with this PR doesn't solve the original problem. — Is that a fair assessment?

In my opinion, #3986 is still about "the right type signature" rather than a real bug. #4529 is a severe bug, generating runtime error messages and breaking many existing applications.

Nevertheless, based on our previous discussions, I found a working way to generate correct Python type names for KeysView, etc. See PR #4983, It is a slightly more complicated way, but should solve both #4529 and #3986 and is compatible with your #4888. I am happy to work on either this PR or PR #4983. But I do hope that we can fix the bug in #4529 soon.

@rwgk
Copy link
Collaborator

rwgk commented Dec 20, 2023

To report the results of the Google global testing with this PR:

There was only one failing test, comparing existing stubgen results with new results. The reduced diff is below. (For my own easy reference, the test ID was OCL:591943630:BASE:592407555:1703042759080:eb7a4a92.)

I'm not sure how many similar stubgen tests we have, my best guess is on the order of 100. — To put this in perspective, the total number of tests is many millions.

This regression is not great of course, but just from a practical viewpoint, it would be defendable.

-class ItemsView[int, object]:
+class ItemsView[long, object]:

-class ItemsView[str, object]:
+class ItemsView[std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>, object]:

-class KeysView[int]:
+class KeysView[long]:

-class KeysView[str]:
+class KeysView[std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>]:

-    def items(self) -> ItemsView[int,object]: ...
-    def keys(self) -> KeysView[int]: ...
+    def items(self) -> ItemsView[long,object]: ...
+    def keys(self) -> KeysView[long]: ...

-    def items(self) -> ItemsView[str,object]: ...
-    def keys(self) -> KeysView[str]: ...
+    def items(self, *args, **kwargs) -> Any: ...
+    def keys(self, *args, **kwargs) -> Any: ...

@rwgk
Copy link
Collaborator

rwgk commented Jan 13, 2024

Closing this after #4985 was merged.

@rwgk rwgk closed this Jan 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants