``cuda_bindings/docs/source/environment_variables.rst`` (+6, -4 lines)
Environment Variables
=====================

Runtime Environment Variables
-----------------------------

- ``CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM`` : When set to 1, the default stream is the per-thread default stream. When set to 0, the default stream is the legacy default stream. The default is 0 (the legacy default stream). See `Stream Synchronization Behavior <https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html>`_ for an explanation of the legacy and per-thread default streams.
- ``CUDA_PYTHON_PARALLEL_LEVEL`` (previously ``PARALLEL_LEVEL``) : int; sets the number of threads used to compile extension modules. Leaving it unset or setting it to 0 disables parallel builds.
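The parallel-build setting follows a simple rule: unset or ``0`` means a serial build. A minimal sketch of how a build script might resolve it; ``parallel_level`` is a hypothetical helper for illustration, not part of ``cuda.bindings``:

```python
import os

def parallel_level(env=None):
    """Resolve build parallelism: unset or "0" disables parallel builds.

    Checks CUDA_PYTHON_PARALLEL_LEVEL first, then the legacy PARALLEL_LEVEL.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_PYTHON_PARALLEL_LEVEL") or env.get("PARALLEL_LEVEL") or "0"
    n = int(raw)
    return n if n > 0 else 1  # 1 means a serial build
```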
``cuda_bindings/docs/source/install.rst`` (+5, -5 lines)
Installing from PyPI
--------------------

.. code-block:: console

   $ pip install -U cuda-python

Install all optional dependencies with:

.. code-block:: console

   $ pip install -U cuda-python[all]

Where the optional dependencies include:

Installing from Conda
---------------------

When using conda, the ``cuda-version`` metapackage can be used to control the versions of CUDA Toolkit components that are installed to the conda environment.

For example:

.. code-block:: console

Requirements
------------

[^2]: The CUDA Runtime static library (``libcudart_static.a`` on Linux, ``cudart_static.lib`` on Windows) is part of the CUDA Toolkit. If using conda packages, it is contained in the ``cuda-cudart-static`` package.

Source builds require that the provided CUDA headers are of the same major.minor version as the ``cuda.bindings`` you're trying to build. Despite this requirement, note that minor version compatibility is still maintained. Use the ``CUDA_HOME`` (or ``CUDA_PATH``) environment variable to specify the location of your headers. For example, if your headers are located in ``/usr/local/cuda/include``, then you should set ``CUDA_HOME`` with:

.. code-block:: console

See `Environment Variables <environment_variables.rst>`_ for a description of other environment variables.
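Since the major.minor match is checked against whatever headers ``CUDA_HOME`` points at, it can help to see which version those headers declare. A sketch that reads the ``CUDA_VERSION`` macro from ``cuda.h`` (which encodes, e.g., 12.4 as ``12040``); ``cuda_header_version`` is a hypothetical helper, not part of the package:

```python
import os
import re

def cuda_header_version(cuda_home=None):
    """Return (major, minor) parsed from the CUDA_VERSION macro in cuda.h."""
    cuda_home = cuda_home or os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if not cuda_home:
        raise RuntimeError("Set CUDA_HOME (or CUDA_PATH) to your CUDA Toolkit root")
    header = os.path.join(cuda_home, "include", "cuda.h")
    with open(header) as f:
        text = f.read()
    # CUDA_VERSION is defined as major*1000 + minor*10, e.g. 12040 for 12.4.
    ver = int(re.search(r"#define\s+CUDA_VERSION\s+(\d+)", text).group(1))
    return ver // 1000, (ver % 1000) // 10
```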
The following command was used to profile the applications:

.. code-block:: shell
Using NumPy
-----------

NumPy `Array objects <https://numpy.org/doc/stable/reference/arrays.html>`_ can be used to fulfill each of these conditions directly.

Let's use the following kernel definition as an example:

.. code-block:: python
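The kernel body itself is elided from this excerpt. As a stand-in, here is a hypothetical SAXPY-style kernel kept as a CUDA C source string, the form typically handed to NVRTC for runtime compilation; the name and signature are illustrative, not necessarily the document's exact example:

```python
# Hypothetical kernel with one scalar and two pointer parameters, held as a
# Python string so it can be compiled at runtime (e.g. with NVRTC).
saxpy = r"""
extern "C" __global__
void saxpy(float a, const float *x, float *y, size_t n)
{
    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        y[tid] = a * x[tid] + y[tid];
}
"""
```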
This example uses the following types:

Note how all three pointers are ``np.intp``, since a pointer value is always the representation of an address.

Putting it all together:

.. code-block:: python
The final step is to construct a ``kernelParams`` argument that fulfills all of the launch API conditions. This is made easy because each array object comes with a `ctypes <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.ctypes.html#numpy.ndarray.ctypes>`_ data attribute that returns the underlying ``void*`` pointer value.

By having the final array object contain all pointers, we fulfill the contiguous array requirement:

.. code-block:: python

   kernelParams = np.array([arg.ctypes.data for arg in kernelValues], dtype=np.intp)
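The pointer-packing step can be exercised without a GPU: pack the addresses of a few NumPy scalars, then read one back through ``ctypes`` to confirm each entry really is the address of the corresponding argument. The argument names and values here are illustrative:

```python
import ctypes
import numpy as np

# Hypothetical kernel arguments: a float scalar, an element count, and a
# device-pointer value (a plain integer stands in for a real device address).
a = np.array(2.5, dtype=np.float32)
n = np.array(1024, dtype=np.uint32)
dptr = np.array(0x7F00DEAD0000, dtype=np.intp)

kernelValues = (a, n, dptr)
# One contiguous np.intp array holding the address of each argument:
kernelParams = np.array([arg.ctypes.data for arg in kernelValues], dtype=np.intp)

# Dereference the first entry: it points at the float32 value of ``a``.
readback = ctypes.cast(int(kernelParams[0]), ctypes.POINTER(ctypes.c_float)).contents.value
assert readback == 2.5
```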
The launch API supports `Buffer Protocol <https://docs.python.org/3/c-api/buffer.html>`_ objects, so we can pass the array object directly:

.. code-block:: python
The ctypes approach treats the ``kernelParams`` argument as a pair of two tuples.

The ctypes `fundamental data types <https://docs.python.org/3/library/ctypes.html#fundamental-data-types>`_ documentation describes the compatibility between different Python types and C types. Furthermore, `custom data types <https://docs.python.org/3/library/ctypes.html#calling-functions-with-your-own-custom-data-types>`_ can be used to support kernels with custom types.

For this example the result becomes:

.. code-block:: python
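A minimal sketch of the pair-of-tuples form, assuming a hypothetical kernel signature like ``(float a, unsigned int n, void *out)``; the names and values are illustrative:

```python
import ctypes

# Hypothetical argument values for a kernel taking (float, unsigned int, void *).
a, n, out_dev_ptr = 2.5, 1024, 0x7F00DEAD0000

# First tuple: the Python values; second tuple: the matching ctypes types.
kernelValues = (a, n, out_dev_ptr)
kernelTypes = (ctypes.c_float, ctypes.c_uint, ctypes.c_void_p)
kernelParams = (kernelValues, kernelTypes)

# Each value converts cleanly to its declared C type:
assert ctypes.c_float(a).value == 2.5
assert ctypes.c_uint(n).value == 1024
```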
Values that are set to ``None`` have a special meaning:

In all three cases, the API call will fetch the underlying pointer value and construct a contiguous array with the other kernel parameters.

With the setup complete, the kernel can be launched:

.. code-block:: python
CUDA objects
------------

Certain CUDA kernels use native CUDA types as their parameters, such as ``cudaTextureObject_t``. These types require special handling since they're neither a primitive ctype nor a custom user type. Since ``cuda.bindings`` exposes each of them as a Python class, they each implement ``getPtr()`` and ``__int__()``. These two callables are used to support the NumPy and ctypes approaches. The difference between the calls is further described under `Tips and Tricks <https://nvidia.github.io/cuda-python/cuda-bindings/latest/tips_and_tricks.html#>`_.

For this example, let's use the ``transformKernel`` from `examples/0_Introduction/simpleCubemapTexture_test.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_bindings/examples/0_Introduction/simpleCubemapTexture_test.py>`_:

.. code-block:: python
For NumPy, we can convert these CUDA types by leveraging the ``__int__()`` call to fetch the address of the underlying ``cudaTextureObject_t`` C object and wrapping it in a NumPy object array of type ``np.intp``:

.. code-block:: python

   kernelArgs = np.array([arg.ctypes.data for arg in kernelValues], dtype=np.intp)
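The ``__int__()`` conversion can be mimicked with a stand-in class to show what the wrapping buys us; ``FakeTexObject`` is purely illustrative and not part of ``cuda.bindings``:

```python
import numpy as np

class FakeTexObject:
    """Stand-in for a cuda.bindings texture-object wrapper (illustration only)."""
    def __init__(self, handle):
        self._handle = handle

    def __int__(self):
        return self._handle  # handle value of the underlying C object

tex = FakeTexObject(42)
# Wrap the integer handle in an np.intp scalar, ready for pointer packing:
texArg = np.array(int(tex), dtype=np.intp)
assert int(texArg) == 42
```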
For ctypes, we leverage the special handling of the ``None`` type, since each Python class already implements ``getPtr()``:
.. warning::

   Using ``int(cuda_obj)`` to retrieve the underlying address of a CUDA object is deprecated and
   subject to future removal. Please switch to use :func:`~cuda.bindings.utils.get_cuda_native_handle`
   instead.

All CUDA C types are exposed to Python as Python classes. For example, the :class:`~cuda.bindings.driver.CUstream` type is exposed as a class with methods :meth:`~cuda.bindings.driver.CUstream.getPtr()` and :meth:`~cuda.bindings.driver.CUstream.__int__()` implemented.

There is an important distinction between the ``getPtr()`` method and the behaviour of ``__int__()``. Since a ``CUstream`` is itself just a pointer, calling ``instance_of_CUstream.getPtr()`` returns the pointer *to* the pointer, rather than the value of the ``CUstream`` C object (which is the pointer to the underlying stream handle). ``int(instance_of_CUstream)`` returns the value of the ``CUstream`` converted to a Python int, which is the actual address of the underlying handle.
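The pointer-to-a-pointer distinction can be illustrated with ``ctypes`` alone; ``FakeCUstream`` below is a hypothetical stand-in for the real ``CUstream`` wrapper, not ``cuda.bindings`` code:

```python
import ctypes

class FakeCUstream:
    """Illustrative stand-in: the C object is itself just a pointer value."""
    def __init__(self, handle):
        self._cobj = ctypes.c_void_p(handle)  # the CUstream C object

    def getPtr(self):
        # Address *of* the C object: a pointer to the pointer.
        return ctypes.addressof(self._cobj)

    def __int__(self):
        # Value of the C object: the actual stream handle address.
        return self._cobj.value or 0

s = FakeCUstream(0x1234)
assert int(s) == 0x1234                # the handle itself
assert s.getPtr() != int(s)            # getPtr() is one indirection away...
assert ctypes.c_void_p.from_address(s.getPtr()).value == 0x1234  # ...dereferencing recovers it
```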