-
Notifications
You must be signed in to change notification settings - Fork 2.2k
feat: make stl.h list|set|map_caster
more user friendly.
#4686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 18 commits
d2a36d9
7d30378
039d1b7
bdba857
ec50ca0
eeb59af
3631886
b4f767b
42e22d7
9647a92
1ed90cf
3cc034b
18c4153
e7330f2
da29bf5
1fa338e
6b5c780
9150a88
62fda81
4b25f74
3cfc95b
59509f2
903faac
4ab40d7
5f7c9d4
ccfeb02
6c1ec6a
cc4d69c
dacb827
3fc2d2a
ff0724f
be02291
0a79f55
ae95424
8aa81d7
4435ed0
9aca5e8
9dc7d7c
d9eeea8
9ac319b
b834767
d19a0ff
69a0da8
c86fa97
940e190
f7725e4
fa72918
a6b9a00
ac3ac4d
87ff92a
3ae34f3
d1ffe4f
2267e5c
a38d0bd
fbf41fd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ | |
#include "detail/common.h" | ||
|
||
#include <deque> | ||
#include <initializer_list> | ||
#include <list> | ||
#include <map> | ||
#include <ostream> | ||
|
@@ -35,6 +36,74 @@ | |
PYBIND11_NAMESPACE_BEGIN(PYBIND11_NAMESPACE) | ||
PYBIND11_NAMESPACE_BEGIN(detail) | ||
|
||
// | ||
// Begin: Equivalent of | ||
// https://github.com/google/clif/blob/337b2a106c0552ad85d40cfc3797465756840ea0/clif/python/runtime.cc#L388-L424 | ||
/* | ||
The three `PyObjectTypeIsConvertibleTo*()` functions below are | ||
the result of converging the behaviors of pybind11 and PyCLIF | ||
(http://github.com/google/clif). | ||
|
||
Originally PyCLIF was extremely far on the permissive side of the spectrum, | ||
while pybind11 was very far on the strict side. Originally PyCLIF accepted any | ||
Python iterable as input for a C++ `vector`/`set`/`map` argument, as long as | ||
the elements were convertible. The obvious (in hindsight) problem was that | ||
any empty Python iterable could be passed to any of these C++ types, e.g. `{}` | ||
was accpeted for C++ `vector`/`set` arguments, or `[]` for C++ `map` arguments. | ||
|
||
The functions below strike a practical permissive-vs-strict compromise, | ||
informed by tens of thousands of use cases in the wild. A main objective is | ||
to prevent accidents and improve readability: | ||
|
||
- Python literals must match the C++ types. | ||
|
||
- For C++ `set`: The potentially reducing conversion from a Python sequence | ||
(e.g. Python `list` or `tuple`) to a C++ `set` must be explicit, by going | ||
through a Python `set`. | ||
|
||
- However, a Python `set` can still be passed to a C++ `vector`. The rationale | ||
is that this conversion is not reducing. Implicit conversions of this kind | ||
are also fairly commonly used, therefore enforcing explicit conversions | ||
would have an unfavorable cost : benefit ratio; more sloppily speaking, | ||
such an enforcement would be more annyoing than helpful. | ||
*/ | ||
|
||
inline bool PyObjectIsInstanceWithOneOfTpNames(PyObject *obj, | ||
std::initializer_list<const char *> tp_names) { | ||
if (PyType_Check(obj)) { | ||
return false; | ||
} | ||
const char *obj_tp_name = Py_TYPE(obj)->tp_name; | ||
for (const auto *tp_name : tp_names) { | ||
if (strcmp(obj_tp_name, tp_name) == 0) { | ||
return true; | ||
} | ||
} | ||
return false; | ||
} | ||
|
||
inline bool PyObjectTypeIsConvertibleToStdVector(PyObject *obj) { | ||
if (PySequence_Check(obj) != 0) { | ||
return !PyUnicode_Check(obj) && !PyBytes_Check(obj); | ||
} | ||
return (PyGen_Check(obj) != 0) || (PyAnySet_Check(obj) != 0) | ||
|| PyObjectIsInstanceWithOneOfTpNames( | ||
obj, {"dict_keys", "dict_values", "dict_items", "map", "zip"}); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This requires explicit subclasses, rather than any object that conforms to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Specifically, from https://docs.python.org/3/library/collections.abc.html:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In addition to what I wrote yesterday, there was actually another concern in the back of my mind: the current implementation is super fast, direct In contrast, So there is a code complexity and runtime overhead cost. What is the benefit? That brings me back to my "large-scale field experiment" experience hinted in the PR description. In a nutshell:
I had this huge pile of data from the wild contrasting the two. Based on that I could see very clearly that there is very little to gain from being more permissive than what this PR implements. IOW I don't think paying the code complexity and runtime overhead cost will help in any significant way. Could you please let me know if you have a different opinion? — It will probably cost me a whole day worth of effort to implement and fully test (global testing) attribute inspection. We will also have to agree on what subsets of the attributes we actually want to test against. I expect that there will be some devils in the details. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think having a short path (check inheritance first) could alleviate much of the speed problem - the slower "correct" check could be done if this is not a subclass of X for common X. Also, this list is somewhat arbitrary - you probably included things (like From a usage standpoint, implementing the Protocol specified by these ABC's is supposed to make your class work wherever the standard library classes work, so it seems ideal if this could follow that. Not required (and we could start with this if you really want to), but it seems like the correct thing to do.
This is concretely specified in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree. Which basically means: this PR plus follow-on work to make the check more permissive. Do you think it's reasonable then to merge this PR, and do the fallback attribute inspection in a separate PR? — To be completely honest, I wouldn't want to prioritize this, for the reasons explained before. It a significant amount of work, extra code complexity, and I don't think very many people will ever notice the difference in practice. But if someone else wants to invest the effort, I'd be on board.
I see, sounds good to me, but from a purely practical viewpoint, assuming that most duck-typing only implements a subset of the attributes, insisting on the full set of attributes here probably means nobody will ever see a difference in practice, unless they are specifically looking to pass the checks here. |
||
} | ||
|
||
inline bool PyObjectTypeIsConvertibleToStdSet(PyObject *obj) { | ||
return (PyAnySet_Check(obj) != 0) || PyObjectIsInstanceWithOneOfTpNames(obj, {"dict_keys"}); | ||
} | ||
|
||
inline bool PyObjectTypeIsConvertibleToStdMap(PyObject *obj) { | ||
return (PyDict_Check(obj) != 0) | ||
|| ((PyMapping_Check(obj) != 0) && (PyObject_HasAttrString(obj, "items") != 0)); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to check the signature of items or that attrs is at least callable, right? Otherwise, it will try (and fail to convert a class with an items attr attached to it, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, I didn't consider this before. PyCLIF doesn't have function overloading, it does not matter there. Adding the callable check is easy (done), but I don't know if there is a way to check if the callable can be called without arguments (without actually calling it). The conditions under which the missing check for number of arguments could go wrong are very special:
Only then will the desired overload will not be reached. Even if we can check for the number of arguments, The only truly certain alternative is to not have From memory: not having the On the other hand, having the requirement that an object with But in any event, adding the callable check is easy and I'm happy to keep that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just discovered Using that makes the "mapping" requirement more formal. Our gating conditions are stronger than those for |
||
} | ||
|
||
// | ||
// End: Equivalent of clif/python/runtime.cc | ||
// | ||
|
||
/// Extracts an const lvalue reference or rvalue reference for U based on the type of T (e.g. for | ||
/// forwarding a container element). Typically used indirect via forwarded_type(), below. | ||
template <typename T, typename U> | ||
|
@@ -66,24 +135,40 @@ struct set_caster { | |
} | ||
void reserve_maybe(const anyset &, void *) {} | ||
|
||
public: | ||
bool load(handle src, bool convert) { | ||
if (!isinstance<anyset>(src)) { | ||
return false; | ||
} | ||
auto s = reinterpret_borrow<anyset>(src); | ||
value.clear(); | ||
reserve_maybe(s, &value); | ||
for (auto entry : s) { | ||
bool convert_iterable(const iterable &itbl, bool convert) { | ||
for (auto it : itbl) { | ||
key_conv conv; | ||
if (!conv.load(entry, convert)) { | ||
if (!conv.load(it, convert)) { | ||
return false; | ||
} | ||
value.insert(cast_op<Key &&>(std::move(conv))); | ||
} | ||
return true; | ||
} | ||
|
||
bool convert_anyset(anyset s, bool convert) { | ||
value.clear(); | ||
reserve_maybe(s, &value); | ||
return convert_iterable(s, convert); | ||
} | ||
|
||
public: | ||
bool load(handle src, bool convert) { | ||
if (!PyObjectTypeIsConvertibleToStdSet(src.ptr())) { | ||
return false; | ||
} | ||
if (isinstance<anyset>(src)) { | ||
value.clear(); | ||
return convert_anyset(reinterpret_borrow<anyset>(src), convert); | ||
} | ||
if (!convert) { | ||
return false; | ||
} | ||
assert(isinstance<iterable>(src)); | ||
value.clear(); | ||
return convert_iterable(reinterpret_borrow<iterable>(src), convert); | ||
} | ||
|
||
template <typename T> | ||
static handle cast(T &&src, return_value_policy policy, handle parent) { | ||
if (!std::is_lvalue_reference<T>::value) { | ||
|
@@ -115,12 +200,7 @@ struct map_caster { | |
} | ||
void reserve_maybe(const dict &, void *) {} | ||
|
||
public: | ||
bool load(handle src, bool convert) { | ||
if (!isinstance<dict>(src)) { | ||
return false; | ||
} | ||
auto d = reinterpret_borrow<dict>(src); | ||
bool convert_elements(const dict &d, bool convert) { | ||
value.clear(); | ||
reserve_maybe(d, &value); | ||
for (auto it : d) { | ||
|
@@ -134,6 +214,26 @@ struct map_caster { | |
return true; | ||
} | ||
|
||
public: | ||
bool load(handle src, bool convert) { | ||
if (!PyObjectTypeIsConvertibleToStdMap(src.ptr())) { | ||
return false; | ||
} | ||
if (isinstance<dict>(src)) { | ||
return convert_elements(reinterpret_borrow<dict>(src), convert); | ||
} | ||
if (!convert) { | ||
return false; | ||
} | ||
// Designed to be behavior-equivalent to passing dict(src.items()) from Python: | ||
// The conversion to a dict will first exhaust the iterator object, to ensure that | ||
// the iterator is not left in an unpredictable (to the caller) partially-consumed | ||
// state. | ||
auto items = src.attr("items")(); | ||
assert(isinstance<iterable>(items)); | ||
return convert_elements(dict(reinterpret_borrow<iterable>(items)), convert); | ||
} | ||
|
||
template <typename T> | ||
static handle cast(T &&src, return_value_policy policy, handle parent) { | ||
dict d; | ||
|
@@ -166,13 +266,35 @@ struct list_caster { | |
using value_conv = make_caster<Value>; | ||
|
||
bool load(handle src, bool convert) { | ||
if (!isinstance<sequence>(src) || isinstance<bytes>(src) || isinstance<str>(src)) { | ||
if (!PyObjectTypeIsConvertibleToStdVector(src.ptr())) { | ||
return false; | ||
} | ||
auto s = reinterpret_borrow<sequence>(src); | ||
if (isinstance<sequence>(src)) { | ||
return convert_elements(src, convert); | ||
} | ||
if (!convert) { | ||
return false; | ||
} | ||
// Designed to be behavior-equivalent to passing tuple(src) from Python: | ||
// The conversion to a tuple will first exhaust the generator object, to ensure that | ||
// the generator is not left in an unpredictable (to the caller) partially-consumed | ||
// state. | ||
assert(isinstance<iterable>(src)); | ||
return convert_elements(tuple(reinterpret_borrow<iterable>(src)), convert); | ||
} | ||
|
||
private: | ||
template <typename T = Type, enable_if_t<has_reserve_method<T>::value, int> = 0> | ||
void reserve_maybe(const sequence &s, Type *) { | ||
value.reserve(s.size()); | ||
} | ||
void reserve_maybe(const sequence &, void *) {} | ||
|
||
bool convert_elements(handle seq, bool convert) { | ||
auto s = reinterpret_borrow<sequence>(seq); | ||
value.clear(); | ||
reserve_maybe(s, &value); | ||
for (auto it : s) { | ||
for (auto it : seq) { | ||
value_conv conv; | ||
if (!conv.load(it, convert)) { | ||
return false; | ||
|
@@ -182,13 +304,6 @@ struct list_caster { | |
return true; | ||
} | ||
|
||
private: | ||
template <typename T = Type, enable_if_t<has_reserve_method<T>::value, int> = 0> | ||
void reserve_maybe(const sequence &s, Type *) { | ||
value.reserve(s.size()); | ||
} | ||
void reserve_maybe(const sequence &, void *) {} | ||
|
||
public: | ||
template <typename T> | ||
static handle cast(T &&src, return_value_policy policy, handle parent) { | ||
|
@@ -237,12 +352,8 @@ struct array_caster { | |
return size == Size; | ||
} | ||
|
||
public: | ||
bool load(handle src, bool convert) { | ||
if (!isinstance<sequence>(src)) { | ||
return false; | ||
} | ||
auto l = reinterpret_borrow<sequence>(src); | ||
bool convert_elements(handle seq, bool convert) { | ||
auto l = reinterpret_borrow<sequence>(seq); | ||
if (!require_size(l.size())) { | ||
return false; | ||
} | ||
|
@@ -257,6 +368,25 @@ struct array_caster { | |
return true; | ||
} | ||
|
||
public: | ||
bool load(handle src, bool convert) { | ||
if (!PyObjectTypeIsConvertibleToStdVector(src.ptr())) { | ||
return false; | ||
} | ||
if (isinstance<sequence>(src)) { | ||
return convert_elements(src, convert); | ||
} | ||
if (!convert) { | ||
return false; | ||
} | ||
// Designed to be behavior-equivalent to passing tuple(src) from Python: | ||
// The conversion to a tuple will first exhaust the generator object, to ensure that | ||
// the generator is not left in an unpredictable (to the caller) partially-consumed | ||
// state. | ||
assert(isinstance<iterable>(src)); | ||
return convert_elements(tuple(reinterpret_borrow<iterable>(src)), convert); | ||
} | ||
|
||
template <typename T> | ||
static handle cast(T &&src, return_value_policy policy, handle parent) { | ||
list l(src.size()); | ||
|
Uh oh!
There was an error while loading. Please reload this page.