-
Notifications
You must be signed in to change notification settings - Fork 198
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
This commit documents free-threading in general and in the context of nanobind extensions.
- Loading branch information
Showing
3 changed files
with
301 additions
and
2 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,289 @@ | ||
.. _free-threaded: | ||
|
||
.. cpp:namespace:: nanobind | ||
|
||
Free-threaded Python | ||
==================== | ||
|
||
**Free-threading** is an experimental new Python feature that replaces the | ||
`Global Interpreter Lock (GIL) | ||
<https://en.wikipedia.org/wiki/Global_interpreter_lock>`__ with a fine-grained | ||
locking scheme to better leverage multi-core parallelism. The resulting | ||
benefits do not come for free: extensions must explicitly opt-in and generally | ||
require careful modifications to ensure correctness. | ||
|
||
Nanobind can target free-threaded Python since version 2.2.0. This page | ||
explains how to do so and discusses a few caveats. Besides this page, make sure | ||
to review `py-free-threading.github.io <https://py-free-threading.github.io>`__ | ||
for a more comprehensive discussion of free-threaded Python. `PEP 703 | ||
<https://peps.python.org/pep-0703/>`__ explains the nitty gritty details. | ||
|
||
Opting in | ||
--------- | ||
|
||
To opt into free-threaded Python, pass the ``FREE_THREADED`` parameter to the | ||
:cmake:command:`nanobind_add_module()` CMake target command. For other build | ||
systems, refer to their respective documentation pages. | ||
|
||
.. code-block:: cmake | ||
nanobind_add_module( | ||
my_ext # Target name | ||
FREE_THREADED # Opt into free-threading | ||
my_ext.h # Source code files below | ||
my_ext.cpp) | ||
nanobind ignores the ``FREE_THREADED`` parameter when the registered Python | ||
version does not support free-threading. | ||
|
||
.. note:: | ||
|
||
**Stable ABI**: Note that there currently is no stable ABI for free-threaded | ||
Python, hence the ``STABLE_ABI`` parameter will be ignored in free-threaded | ||
extensions builds. It is valid to combine the ``STABLE_ABI`` and | ||
``FREE_THREADED`` arguments: the build system will choose between the two | ||
depending on the detected Python version. | ||
|
||
Warning: Loading a non-free threaded extension into a free-threaded Python | ||
build disables free-threading globally. | ||
|
||
If free-threading was requested and is available, the build system will set the | ||
``NB_FREE_THREADED`` preprocessor flag. This can be helpful to specialize | ||
binding code with ``#ifdef`` blocks, e.g.: | ||
|
||
.. code-block:: cpp | ||
#if !defined(NB_FREE_THREADED) | ||
... // simple GIL-protected code | ||
#else | ||
... // more complex thread-aware code | ||
#endif | ||
Caveats | ||
------- | ||
|
||
Free-threading can violate implicit assumptions made by extension developers | ||
when previously serial operations suddenly run concurrently, producing | ||
undefined behavior (race conditions, crashes, etc.). | ||
|
||
Let's consider a concrete example: the binding code below defines a ``Counter`` | ||
class with an increment operation. | ||
|
||
.. code-block:: cpp | ||
struct Counter { | ||
int value = 0; | ||
void inc() { value++; } | ||
}; | ||
nb::class_<Counter>(m, "Counter") | ||
.def("inc", &Counter::inc) | ||
.def_ro("value", &Counter::value); | ||
If multiple threads call the ``inc()`` method of a single ``Counter``, the | ||
final count will generally be incorrect, as the increment operation ``value++`` | ||
does not execute atomically. | ||
|
||
To fix this, we could modify the C++ type so that it protects its ``value`` | ||
member from concurrent modification, for example using an atomic number type | ||
(e.g., ``std::atomic<int>``) or a critical section (e.g., based on | ||
``std::mutex``). | ||
|
||
The race condition in the above example is relatively benign. However, | ||
in more complex projects, combinations of concurrency and unsafe memory | ||
accesses could introduce non-deterministic data corruption and crashes. | ||
|
||
Another common source of problems are *global variables* undergoing concurrent | ||
modification when no longer protected by the GIL. They will likewise require | ||
supplemental locking. The :ref:`next section <free-threaded-locks>` explains a | ||
Python-specific locking primitive that can be used in binding code besides | ||
the solutions mentioned above. | ||
|
||
.. _free-threaded-locks: | ||
|
||
Python locks | ||
------------ | ||
|
||
Nanobind provides convenience functionality encapsulating the mutex | ||
implementation that is part of Python ("``PyMutex``"). It is slightly more | ||
efficient than OS/language-provided synchronization primitives and generally | ||
preferable within Python extensions. | ||
|
||
The class :cpp:class:`ft_mutex` is analogous to ``std::mutex``, and | ||
:cpp:class:`ft_lock_guard` is analogous to ``std::lock_guard``. Note that they | ||
only exist to add *supplemental* critical sections needed in free-threaded | ||
Python, while becoming inactive (no-ops) when targeting regular GIL-protected | ||
Python. | ||
|
||
With these abstractions, the previous ``Counter`` implementation could be | ||
rewritten as: | ||
|
||
.. code-block:: cpp | ||
:emphasize-lines: 3,6 | ||
struct Counter { | ||
int value = 0; | ||
nb::ft_mutex mutex; | ||
void inc() { | ||
nb::ft_lock_guard guard(mutex); | ||
value++; | ||
} | ||
}; | ||
These locks are very compact (``sizeof(nb::ft_mutex) == 1``), though this is a | ||
Python implementation detail that could change in the future. | ||
|
||
.. _argument-locks: | ||
|
||
Argument locking | ||
---------------- | ||
|
||
Modifying class and function definitions as shown above may not always be | ||
possible. As an alternative, nanobind also provides a way to *retrofit* | ||
supplemental locking onto existing code. The idea is to lock individual | ||
arguments of a function *before* being allowed to invoke it. A built-in mutex | ||
present in every Python object enables this. | ||
|
||
To do so, call the :cpp:func:`.lock() <arg::lock>` member of | ||
:cpp:class:`nb::arg() <arg>` annotations to indicate that an | ||
argument must be locked, e.g.: | ||
|
||
- ``nb::arg("my_parameter").lock()`` | ||
- ``"my_parameter"_a.lock()`` (short-hand form) | ||
|
||
In methods bindings, pass :cpp:struct:`nb::lock_self() <lock_self>` to lock | ||
the implicit ``self`` argument. Note that at most 2 arguments can be | ||
locked per function, which is a limitation of the `Python locking API | ||
<https://docs.python.org/3.13/c-api/init.html#c.Py_BEGIN_CRITICAL_SECTION2>`__. | ||
|
||
The example below shows how this functionality can be used to protect ``inc()`` | ||
and a new ``merge()`` function that acquires two simultaneous locks. | ||
|
||
.. code-block:: cpp | ||
struct Counter { | ||
int value = 0; | ||
void inc() { value++; } | ||
void merge(Counter &other) { | ||
value += other.value; | ||
other.value = 0; | ||
} | ||
}; | ||
nb::class_<Counter>(m, "Counter") | ||
.def("inc", &Counter::inc, nb::lock_self()) | ||
.def("merge", &Counter::merge, nb::lock_self(), "other"_a.lock()) | ||
.def_ro("value", &Counter::value); | ||
The above solution has an obvious drawback: it only protects *bindings* (i.e., | ||
transitions from Python to C++). For example, if some other part of a C++ | ||
codebase calls ``merge()`` directly, the binding layer won't be involved, and | ||
no locking takes place. If such behavior can introduce race conditions, a | ||
larger-scale redesign of your project may be in order. | ||
|
||
.. note:: | ||
|
||
Adding locking annotations indiscriminately is inadvisable because they add | ||
a runtime cost to function call dispatcher. | ||
The :cpp:func:`.lock() <arg::lock>` annotation is ignored in GIL-protected | ||
builds. Note that listing arguments in function bindings generally comes at | ||
a small cost in terms of :ref:`binding overheads <binding-overheads>`. | ||
|
||
.. note:: | ||
|
||
**Python API and locking**: When the lock-protected function performs Python | ||
API calls (e.g., using :ref:`wrappers <wrappers>` like :cpp:class:`nb::dict | ||
<dict>`), Python may temporarily release locks to avoid deadlocks. Here, | ||
even basic reference counting such as a :cpp:class:`nb::object | ||
<object>` variable expiring at the end of a scope counts as an API call. | ||
|
||
These locks will be reacquired following the Python API call. This behavior | ||
resembles ordinary (GIL-protected) Python code, where operations like | ||
`Py_DECREF() | ||
<https://docs.python.org/3/c-api/refcounting.html#c.Py_DECREF>`__ can cause | ||
cause arbitrary Python code to execute. The semantics of this kind of | ||
relaxed critical section are described in the `Python documentation | ||
<https://docs.python.org/3.13/c-api/init.html#python-critical-section-api>`__. | ||
|
||
Miscellaneous notes | ||
------------------- | ||
|
||
API | ||
--- | ||
|
||
The following API specific to free-threading has been added: | ||
|
||
- :cpp:class:`nb::ft_mutex <ft_mutex>` | ||
- :cpp:class:`nb::ft_lock_guard <ft_lock_guard>` | ||
- :cpp:class:`nb::ft_object_guard <ft_object_guard>` | ||
- :cpp:class:`nb::ft_object2_guard <ft_object2_guard>` | ||
- :cpp:func:`nb::arg::lock() <arg::lock>` | ||
|
||
API stability | ||
_____________ | ||
|
||
The interface explained in this is excluded from the project's semantic | ||
versioning policy. Free-threading is still experimental, and API breaks may be | ||
necessary based on future experience and changes in Python itself. | ||
|
||
Wrappers | ||
________ | ||
|
||
:ref:`Wrapper types <wrappers>` like :cpp:class:`nb::list <list>` may used in | ||
multi-threaded code. Operations like :cpp:func:`nb::list::append() | ||
<list::append>` internally acquire locks and behave just like their ordinary | ||
Python counterparts. This means that race conditions can still occur without | ||
larger-scale synchronization, but such races won't jeopardize the memory safety | ||
of the program. | ||
|
||
GIL scope guards | ||
________________ | ||
|
||
Prior to free-threaded Python, the nanobind scope guards | ||
:cpp:struct:`gil_scoped_acquire` and :cpp:struct:`gil_scoped_release` would | ||
normally be used to acquire/release the GIL and enable parallel regions. | ||
|
||
These remain useful and should not be removed from existing code: while no | ||
longer blocking operations, they set and unset the current Python thread | ||
context and inform the garbage collector. | ||
|
||
The :cpp:struct:`gil_scoped_release` RAII scope guard class plays a special | ||
role in free-threaded builds, since it releases all :ref:`argument locks | ||
<argument-locks>` held by the current thread. | ||
|
||
Immortalization | ||
_______________ | ||
|
||
Python relies on a technique called *reference counting* to determine when an | ||
object is no longer needed. This approach can become a bottleneck in | ||
multi-threaded programs, since increasing and decreasing reference counts | ||
requires coordination among multiple processor cores. Python type and function | ||
objects are especially sensitive, since their reference counts change at a very | ||
high rate. | ||
|
||
Similar to free-threaded Python itself, nanobind avoids this bottleneck by | ||
*immortalizing* functions (``nanobind.nb_func``, ``nanobind.nb_method``) and | ||
type bindings. Immortal objects don't require reference counting. In turn, the | ||
downside is that they leak when the interpreter shuts down. Free-threaded | ||
nanobind extensions disable the internal :ref:`leak checker <leak-checker>`, | ||
since it would produce many warning messages caused by immortal objects. | ||
|
||
Internal data structures | ||
________________________ | ||
|
||
Nanobind maintains various internal data structures that store information | ||
about instances and function/type bindings. These data structures also play an | ||
important role to exchange type/instance data in larger projects that are split | ||
across several independent extension modules. | ||
|
||
The layout of these data structures differs between ordinary and free-threaded | ||
extensions, therefore nanobind isolates them from each other by assigning a | ||
different ABI version tag. This means that multi-module projects will need | ||
to consistently compile either free-threaded or non-free-threaded modules. | ||
|
||
Free-threaded nanobind uses thread-local and sharded data structures to avoid | ||
lock and atomic contention on the internal data structures, which would | ||
otherwise become a bottleneck in multi-threaded Python programs. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters