diff --git a/docs/changelog.rst b/docs/changelog.rst index 8d41ce2f..67d4672f 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -18,8 +18,14 @@ below inherit that of the preceding release. Version 2.2.0 (TBA) ------------------- -- The NVIDIA CUDA compiler (``nvcc``) is now explicitly supported and included - in nanobind's CI test suite. +- nanobind can now target `free-threaded Python + `__, which replaces the `Global + Interpreter Lock (GIL) + `__ with a + fine-grained locking scheme (see `PEP 703 + `__) to better leverage multi-core + parallelism. A `separate documation page `__ explains this in + detail. - nanobind has always used `PEP 590 vector calls `__ to efficiently dispatch calls @@ -41,6 +47,9 @@ Version 2.2.0 (TBA) with :cpp:class:`nb::is_arithmetic() ` creates enumerations deriving from :py:class:`enum.IntFlag`. +- The NVIDIA CUDA compiler (``nvcc``) is now explicitly supported and included + in nanobind's CI test suite. + * Added support for return value policy customization to the type casters of ``Eigen::Ref<...>`` and ``Eigen::Map<...>`` (commit `67316e `__). diff --git a/docs/free_threaded.rst b/docs/free_threaded.rst new file mode 100644 index 00000000..35699040 --- /dev/null +++ b/docs/free_threaded.rst @@ -0,0 +1,289 @@ +.. _free-threaded: + +.. cpp:namespace:: nanobind + +Free-threaded Python +==================== + +**Free-threading** is an experimental new Python feature that replaces the +`Global Interpreter Lock (GIL) +`__ with a fine-grained +locking scheme to better leverage multi-core parallelism. The resulting +benefits do not come for free: extensions must explicitly opt-in and generally +require careful modifications to ensure correctness. + +Nanobind can target free-threaded Python since version 2.2.0. This page +explains how to do so and discusses a few caveats. Besides this page, make sure +to review `py-free-threading.github.io `__ +for a more comprehensive discussion of free-threaded Python. `PEP 703 +`__ explains the nitty gritty details. + +Opting in +--------- + +To opt into free-threaded Python, pass the ``FREE_THREADED`` parameter to the +:cmake:command:`nanobind_add_module()` CMake target command. For other build +systems, refer to their respective documentation pages. + +.. code-block:: cmake + + nanobind_add_module( + my_ext # Target name + FREE_THREADED # Opt into free-threading + my_ext.h # Source code files below + my_ext.cpp) + +nanobind ignores the ``FREE_THREADED`` parameter when the registered Python +version does not support free-threading. + +.. note:: + + **Stable ABI**: Note that there currently is no stable ABI for free-threaded + Python, hence the ``STABLE_ABI`` parameter will be ignored in free-threaded + extensions builds. It is valid to combine the ``STABLE_ABI`` and + ``FREE_THREADED`` arguments: the build system will choose between the two + depending on the detected Python version. + + Warning: Loading a non-free threaded extension into a free-threaded Python + build disables free-threading globally. + +If free-threading was requested and is available, the build system will set the +``NB_FREE_THREADED`` preprocessor flag. This can be helpful to specialize +binding code with ``#ifdef`` blocks, e.g.: + +.. code-block:: cpp + + #if !defined(NB_FREE_THREADED) + ... // simple GIL-protected code + #else + ... // more complex thread-aware code + #endif + +Caveats +------- + +Free-threading can violate implicit assumptions made by extension developers +when previously serial operations suddenly run concurrently, producing +undefined behavior (race conditions, crashes, etc.). + +Let's consider a concrete example: the binding code below defines a ``Counter`` +class with an increment operation. + +.. code-block:: cpp + + struct Counter { + int value = 0; + void inc() { value++; } + }; + + nb::class_(m, "Counter") + .def("inc", &Counter::inc) + .def_ro("value", &Counter::value); + + +If multiple threads call the ``inc()`` method of a single ``Counter``, the +final count will generally be incorrect, as the increment operation ``value++`` +does not execute atomically. + +To fix this, we could modify the C++ type so that it protects its ``value`` +member from concurrent modification, for example using an atomic number type +(e.g., ``std::atomic``) or a critical section (e.g., based on +``std::mutex``). + +The race condition in the above example is relatively benign. However, +in more complex projects, combinations of concurrency and unsafe memory +accesses could introduce non-deterministic data corruption and crashes. + +Another common source of problems are *global variables* undergoing concurrent +modification when no longer protected by the GIL. They will likewise require +supplemental locking. The :ref:`next section ` explains a +Python-specific locking primitive that can be used in binding code besides +the solutions mentioned above. + +.. _free-threaded-locks: + +Python locks +------------ + +Nanobind provides convenience functionality encapsulating the mutex +implementation that is part of Python ("``PyMutex``"). It is slightly more +efficient than OS/language-provided synchronization primitives and generally +preferable within Python extensions. + +The class :cpp:class:`ft_mutex` is analogous to ``std::mutex``, and +:cpp:class:`ft_lock_guard` is analogous to ``std::lock_guard``. Note that they +only exist to add *supplemental* critical sections needed in free-threaded +Python, while becoming inactive (no-ops) when targeting regular GIL-protected +Python. + +With these abstractions, the previous ``Counter`` implementation could be +rewritten as: + +.. code-block:: cpp + :emphasize-lines: 3,6 + + struct Counter { + int value = 0; + nb::ft_mutex mutex; + + void inc() { + nb::ft_lock_guard guard(mutex); + value++; + } + }; + +These locks are very compact (``sizeof(nb::ft_mutex) == 1``), though this is a +Python implementation detail that could change in the future. + +.. _argument-locks: + +Argument locking +---------------- + +Modifying class and function definitions as shown above may not always be +possible. As an alternative, nanobind also provides a way to *retrofit* +supplemental locking onto existing code. The idea is to lock individual +arguments of a function *before* being allowed to invoke it. A built-in mutex +present in every Python object enables this. + +To do so, call the :cpp:func:`.lock() ` member of +:cpp:class:`nb::arg() ` annotations to indicate that an +argument must be locked, e.g.: + +- ``nb::arg("my_parameter").lock()`` +- ``"my_parameter"_a.lock()`` (short-hand form) + +In methods bindings, pass :cpp:struct:`nb::lock_self() ` to lock +the implicit ``self`` argument. Note that at most 2 arguments can be +locked per function, which is a limitation of the `Python locking API +`__. + +The example below shows how this functionality can be used to protect ``inc()`` +and a new ``merge()`` function that acquires two simultaneous locks. + +.. code-block:: cpp + + struct Counter { + int value = 0; + + void inc() { value++; } + void merge(Counter &other) { + value += other.value; + other.value = 0; + } + }; + + nb::class_(m, "Counter") + .def("inc", &Counter::inc, nb::lock_self()) + .def("merge", &Counter::merge, nb::lock_self(), "other"_a.lock()) + .def_ro("value", &Counter::value); + +The above solution has an obvious drawback: it only protects *bindings* (i.e., +transitions from Python to C++). For example, if some other part of a C++ +codebase calls ``merge()`` directly, the binding layer won't be involved, and +no locking takes place. If such behavior can introduce race conditions, a +larger-scale redesign of your project may be in order. + +.. note:: + + Adding locking annotations indiscriminately is inadvisable because they add + a runtime cost to function call dispatcher. + The :cpp:func:`.lock() ` annotation is ignored in GIL-protected + builds. Note that listing arguments in function bindings generally comes at + a small cost in terms of :ref:`binding overheads `. + +.. note:: + + **Python API and locking**: When the lock-protected function performs Python + API calls (e.g., using :ref:`wrappers ` like :cpp:class:`nb::dict + `), Python may temporarily release locks to avoid deadlocks. Here, + even basic reference counting such as a :cpp:class:`nb::object + ` variable expiring at the end of a scope counts as an API call. + + These locks will be reacquired following the Python API call. This behavior + resembles ordinary (GIL-protected) Python code, where operations like + `Py_DECREF() + `__ can cause + cause arbitrary Python code to execute. The semantics of this kind of + relaxed critical section are described in the `Python documentation + `__. + +Miscellaneous notes +------------------- + +API +--- + +The following API specific to free-threading has been added: + +- :cpp:class:`nb::ft_mutex ` +- :cpp:class:`nb::ft_lock_guard ` +- :cpp:class:`nb::ft_object_guard ` +- :cpp:class:`nb::ft_object2_guard ` +- :cpp:func:`nb::arg::lock() ` + +API stability +_____________ + +The interface explained in this is excluded from the project's semantic +versioning policy. Free-threading is still experimental, and API breaks may be +necessary based on future experience and changes in Python itself. + +Wrappers +________ + +:ref:`Wrapper types ` like :cpp:class:`nb::list ` may used in +multi-threaded code. Operations like :cpp:func:`nb::list::append() +` internally acquire locks and behave just like their ordinary +Python counterparts. This means that race conditions can still occur without +larger-scale synchronization, but such races won't jeopardize the memory safety +of the program. + +GIL scope guards +________________ + +Prior to free-threaded Python, the nanobind scope guards +:cpp:struct:`gil_scoped_acquire` and :cpp:struct:`gil_scoped_release` would +normally be used to acquire/release the GIL and enable parallel regions. + +These remain useful and should not be removed from existing code: while no +longer blocking operations, they set and unset the current Python thread +context and inform the garbage collector. + +The :cpp:struct:`gil_scoped_release` RAII scope guard class plays a special +role in free-threaded builds, since it releases all :ref:`argument locks +` held by the current thread. + +Immortalization +_______________ + +Python relies on a technique called *reference counting* to determine when an +object is no longer needed. This approach can become a bottleneck in +multi-threaded programs, since increasing and decreasing reference counts +requires coordination among multiple processor cores. Python type and function +objects are especially sensitive, since their reference counts change at a very +high rate. + +Similar to free-threaded Python itself, nanobind avoids this bottleneck by +*immortalizing* functions (``nanobind.nb_func``, ``nanobind.nb_method``) and +type bindings. Immortal objects don't require reference counting. In turn, the +downside is that they leak when the interpreter shuts down. Free-threaded +nanobind extensions disable the internal :ref:`leak checker `, +since it would produce many warning messages caused by immortal objects. + +Internal data structures +________________________ + +Nanobind maintains various internal data structures that store information +about instances and function/type bindings. These data structures also play an +important role to exchange type/instance data in larger projects that are split +across several independent extension modules. + +The layout of these data structures differs between ordinary and free-threaded +extensions, therefore nanobind isolates them from each other by assigning a +different ABI version tag. This means that multi-module projects will need +to consistently compile either free-threaded or non-free-threaded modules. + +Free-threaded nanobind uses thread-local and sharded data structures to avoid +lock and atomic contention on the internal data structures, which would +otherwise become a bottleneck in multi-threaded Python programs. diff --git a/docs/index.rst b/docs/index.rst index fdc7048d..fdc013f1 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -132,6 +132,7 @@ The nanobind logo was designed by `AndoTwin Studio :caption: Advanced :maxdepth: 1 + free_threaded ownership_adv lowlevel typeslots