These adapters make Protocol Buffer message types work with Pybind11 bindings.
To use the proto messages with pybind11:
- Include the header file pybind11_protobuf/native_proto_caster.h in the .cc file with your bindings.
- Call pybind11_protobuf::ImportNativeProtoCasters(); in your PYBIND11_MODULE definition.
- Ensure "@com_google_protobuf//:protobuf_python" is a python dependency. When using Bazel, a common strategy is to add a python library that "wraps" the extension along with any required python dependencies.
Any arguments or return values which are a protocol buffer (including the base
class, proto2::Message) will be automatically converted to python native
protocol buffers.
#include <pybind11/pybind11.h>
#include "path/to/my/my_message.proto.h"
#include "pybind11_protobuf/native_proto_caster.h"
// In real use, these two functions would probably be defined in a python-agnostic library.
MyMessage ReturnMyMessage() { ... }
void TakeMyMessage(const MyMessage& in) { ... }
PYBIND11_MODULE(my_module, m) {
  pybind11_protobuf::ImportNativeProtoCasters();
  m.def("return_my_message", &ReturnMyMessage);
  m.def("take_my_message", &TakeMyMessage, pybind11::arg("in"));
}
When passing protos between C++ and Python, the native_proto_caster.h
bindings will convert protobuf objects to the native type on either side.
While C++ has only one native type, Python has two native types (https://rules-proto-grpc.com/en/latest/lang/python.html):
- --define=use_fast_cpp_protos=false (aka use_pure_python_protos)
- --define=use_fast_cpp_protos=true
With use_pure_python_protos, protobuf objects passed between C++ and Python
are always serialized/deserialized between the native C++ type and the pure
Python native type. This is very safe but also slow.
With use_fast_cpp_protos, the native Python type is internally backed by
the native C++ type, which unlocks various performance benefits, even when
only used from Python. When passing protobuf objects between Python and C++,
in certain situations the serialization/deserialization overhead can be
avoided, but not universally. Fundamentally, sharing C++ native protobuf
objects between C++ and Python is unsafe because C++ assumes that it has
exclusive ownership and may manipulate references in a way that undermines
Python's much safer ownership semantics. Because of this, sharing mutable
references or pointers between C++ and Python is not allowed.
However, when passing a Python protobuf object to C++, and with
PYBIND11_PROTOBUF_ASSUME_FULL_ABI_COMPATIBILITY defined (see proto_cast_util.h),
the bindings will share the underlying C++ native protobuf object with C++ when
passed by const & or const *.
When use_fast_cpp_protos is in use and protobuf extensions are involved, a
well-known pitfall is that extensions are silently moved to the
proto2::UnknownFieldSet when a message is deserialized in C++ but the
cc_proto_library for the extensions is not linked in. The root cause is an
asymmetry in the handling of Python protos vs C++ protos: when a Python proto
is deserialized, both the Python descriptor pool and the C++ descriptor pool
are inspected, but when a C++ proto is deserialized, only the C++ descriptor
pool is inspected. Until this asymmetry is resolved, the cc_proto_library for
all extensions involved must be added to the deps of the relevant
pybind_library or pybind_extension, or if this is impractical,
pybind11_protobuf::check_unknown_fields::ExtensionsWithUnknownFieldsPolicy::WeakEnableFallbackToSerializeParse
or pybind11_protobuf::AllowUnknownFieldsFor can be used.
The pitfall is sufficiently unobvious to be a setup for regular accidents, potentially with critical consequences.
To guard against the most common type of accident, native_proto_caster.h includes a safety mechanism that raises "Proto Message has an Unknown Field" in certain situations:
- When use_fast_cpp_protos is in use,
- a protobuf message is returned from C++ to Python,
- the message involves protobuf extensions (recursively),
- and the proto2::UnknownFieldSet for the message or any of its submessages is not empty.
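The four conditions above must all hold at once for the error to be raised. A minimal sketch of that decision, assuming a hypothetical MessageNode stand-in for a message tree (the real mechanism works on protobuf descriptors and reflection, and none of the names below come from pybind11_protobuf):

```cpp
#include <vector>

// Hypothetical stand-in for a protobuf message tree. This is NOT a
// pybind11_protobuf or protobuf type; it only models the facts that the
// conditions listed above refer to.
struct MessageNode {
  bool has_extensions;      // this message type involves extensions
  int unknown_field_count;  // number of entries in its unknown field set
  std::vector<MessageNode> submessages;
};

// True if the message or any submessage involves extensions.
bool InvolvesExtensions(const MessageNode& msg) {
  if (msg.has_extensions) return true;
  for (const MessageNode& sub : msg.submessages) {
    if (InvolvesExtensions(sub)) return true;
  }
  return false;
}

// True if the message or any submessage has a non-empty unknown field set.
bool HasUnknownFields(const MessageNode& msg) {
  if (msg.unknown_field_count > 0) return true;
  for (const MessageNode& sub : msg.submessages) {
    if (HasUnknownFields(sub)) return true;
  }
  return false;
}

// Models the decision for a message that is being returned from C++ to
// Python: the error applies only when every condition holds.
bool ShouldRaiseUnknownFieldError(bool use_fast_cpp_protos,
                                  const MessageNode& returned_to_python) {
  return use_fast_cpp_protos && InvolvesExtensions(returned_to_python) &&
         HasUnknownFields(returned_to_python);
}
```

In this model, a message with unknown fields but no extensions anywhere in its tree is never flagged, which mirrors the restriction of the check to messages with extensions.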
pybind11_protobuf::check_unknown_fields::ExtensionsWithUnknownFieldsPolicy::WeakEnableFallbackToSerializeParse
is a global escape hatch trading off convenience and runtime overhead: the
convenience is that it is not necessary to determine what cc_proto_library
dependencies need to be added; the runtime overhead is that
SerializePartialToString/ParseFromString is used for messages with unknown
fields, instead of the much faster CopyFrom.
Another escape hatch is pybind11_protobuf::AllowUnknownFieldsFor, which
simply disables the safety mechanism for specific message types, without
a runtime overhead. This is useful for situations in which unknown fields
are acceptable.
An example of a full error message generated by the safety mechanism (with line breaks added here for readability):
Proto Message of type pybind11.test.NestRepeated has an Unknown Field with
parent of type pybind11.test.BaseMessage: base_msgs.1003
(pybind11_protobuf/tests/extension_nest_repeated.proto,
pybind11_protobuf/tests/extension.proto).
Please add the required `cc_proto_library` `deps`.
Only if there is no alternative to suppressing this error, use
`pybind11_protobuf::AllowUnknownFieldsFor("pybind11.test.NestRepeated", "base_msgs");`
(Warning: suppressions may mask critical bugs.)
Note that the current implementation of the safety mechanism is a compromise solution, trading off simplicity of implementation, runtime performance, and precision. Alerting developers of new code to unknown fields is assumed to be generally helpful, but the unknown fields detection is limited to messages with extensions, to avoid the runtime overhead for the presumably much more common case that no extensions are involved. Because of this, the runtime overhead for the safety mechanism is expected to be very small.
Enumerations are passed and returned as integers. You may use the enum values
from the native python proto module to set and check the enum values used
by a bound proto enum (see tests/proto_enum_test.py for an example).
In cases where a protocol buffer is used as an in/out parameter in C++, additional logic will be required in the wrapper. For example:
#include <pybind11/pybind11.h>
#include "path/to/my/my_message.proto.h"
#include "pybind11_protobuf/native_proto_caster.h"
void MutateMessage(MyMessage* in_out) { ... }
PYBIND11_MODULE(my_module, m) {
  pybind11_protobuf::ImportNativeProtoCasters();
  m.def("mutate_message", [](MyMessage in) {
    MutateMessage(&in);
    return in;
  },
  pybind11::arg("in"));
}
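The wrapper works by taking its own copy, mutating that copy, and returning it, so the caller's object is never modified in place and the caller must use the returned value. A minimal sketch of just that copy-in/return-out behavior, without pybind11, assuming a hypothetical plain struct in place of the generated MyMessage class and a hypothetical MutateMessageWrapper function in place of the lambda:

```cpp
#include <string>

// Hypothetical stand-in for a generated proto class; a real MyMessage would
// come from the cc_proto_library for my_message.proto.
struct MyMessage {
  std::string name;
  int counter;
};

// A C++ API that treats its argument as an in/out parameter.
void MutateMessage(MyMessage* in_out) {
  in_out->name += "_mutated";
  ++in_out->counter;
}

// Same shape as the bound lambda: the argument is received by value, the
// local copy is mutated, and the result is handed back by value.
MyMessage MutateMessageWrapper(MyMessage in) {
  MutateMessage(&in);
  return in;
}
```

With this shape, code that calls the wrapper and ignores the return value observes no change, which is the main behavioral difference from the original C++ in/out signature.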
TL;DR: Ignore wrapped_proto_caster.h if you can; this header was added as
a migration aid before the removal of proto_casters.h.
Historic background: Due to the nature of pybind11, extension modules
built using native_proto_caster.h could not interoperate with the
older proto_casters.h bindings, as that would have led to C++ ODR
violations. wrapped_proto_caster.h is a nearly-transparent wrapper for
native_proto_caster.h to work around the ODR issue. With the migration to
native_proto_caster.h now completed, wrapped_proto_caster.h is obsolete
and will be removed in the future.