Deploy Sphinx documentation
actions-user committed Feb 9, 2024
0 parents commit 9c5c9a3
Showing 69 changed files with 29,361 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: fd099f8b4928d9493ca64793d2d5e7fd
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file added .nojekyll
Binary file added _images/compiler_pipeline.png
Binary file added _images/header.png
Binary file added _images/joint_transforms.png
214 changes: 214 additions & 0 deletions _sources/basics.rst.txt
@@ -0,0 +1,214 @@
Basics
======

Initialization
--------------

Before use, Warp should be explicitly initialized with the ``wp.init()`` method as follows::

    import warp as wp

    wp.init()

Warp will print some startup information about the compute devices available, driver versions, and the location
for any generated kernel code, e.g.:

.. code:: bat

    Warp 1.0.0 initialized:
       CUDA Toolkit: 11.8, Driver: 12.1
       Devices:
         "cpu"    | AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
         "cuda:0" | NVIDIA GeForce RTX 4080 (sm_89)
       Kernel cache: C:\Users\mmacklin\AppData\Local\NVIDIA\warp\Cache\1.0.0

Kernels
-------

In Warp, compute kernels are defined as Python functions and annotated with the ``@wp.kernel`` decorator, as follows::

    @wp.kernel
    def simple_kernel(a: wp.array(dtype=wp.vec3),
                      b: wp.array(dtype=wp.vec3),
                      c: wp.array(dtype=float)):

        # get thread index
        tid = wp.tid()

        # load two vec3s
        x = a[tid]
        y = b[tid]

        # compute the dot product between vectors
        r = wp.dot(x, y)

        # write result back to memory
        c[tid] = r

Because Warp kernels are compiled to native C++/CUDA code, all the function input arguments must be statically typed. This allows
Warp to generate fast code that executes at essentially native speeds. Because kernels may be run on either the CPU
or GPU, they cannot access arbitrary global state from the Python environment. Instead they must read and write data
through their input parameters such as arrays.

Warp kernel functions have a 1:1 correspondence with CUDA kernels. To launch a kernel with 1024 threads, we use
:func:`wp.launch() <warp.launch>` as follows::

    wp.launch(kernel=simple_kernel, # kernel to launch
              dim=1024,             # number of threads
              inputs=[a, b, c],     # parameters
              device="cuda")        # execution device

Inside the kernel, we can retrieve the *thread index* of each thread using the ``wp.tid()`` built-in function::

    # get thread index
    i = wp.tid()

Kernels can be launched with 1D, 2D, 3D, or 4D grids of threads. For example, to launch a 2D grid of threads to process a 1024x1024 image, we could write::

    wp.launch(kernel=compute_image,
              dim=(1024, 1024),
              inputs=[img],
              device="cuda")

Then, inside the kernel we can retrieve a 2D thread index as follows::

    # get thread index
    i, j = wp.tid()

    # write out a color value for each pixel
    color[i, j] = wp.vec3(r, g, b)
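
Putting the two snippets together, a minimal sketch of a complete ``compute_image`` kernel might look like this (the
parameter name ``color``, the ``wp.array2d`` annotation, and the particular color values are illustrative assumptions)::

    @wp.kernel
    def compute_image(color: wp.array2d(dtype=wp.vec3)):
        # get the 2D thread index
        i, j = wp.tid()

        # illustrative color values derived from the pixel coordinates
        r = float(i) / 1024.0
        g = float(j) / 1024.0
        b = 0.0

        # write out a color value for each pixel
        color[i, j] = wp.vec3(r, g, b)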

.. _example-cache-management:

Example: Changing the kernel cache directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following example illustrates how the location for generated and compiled
kernel code can be changed before and after calling ``wp.init()``.

.. code:: python

    import os

    import warp as wp

    example_dir = os.path.dirname(os.path.realpath(__file__))

    # set default cache directory before wp.init()
    wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")

    wp.init()

    print("+++ Current cache directory: ", wp.config.kernel_cache_dir)

    # change cache directory after wp.init()
    wp.build.init_kernel_cache(os.path.join(example_dir, "tmp", "warpcache2"))

    print("+++ Current cache directory: ", wp.config.kernel_cache_dir)

    # clear kernel cache (forces fresh kernel builds every time)
    wp.build.clear_kernel_cache()

    @wp.kernel
    def basic(x: wp.array(dtype=float)):
        tid = wp.tid()
        x[tid] = float(tid)

    device = "cpu"
    n = 10
    x = wp.zeros(n, dtype=float, device=device)

    wp.launch(kernel=basic, dim=n, inputs=[x], device=device)
    print(x.numpy())

Arrays
------

Memory allocations are exposed via the ``wp.array`` type. Arrays wrap an underlying memory allocation that may live in
either host (CPU) or device (GPU) memory. Arrays are strongly typed and store a linear sequence of built-in values
(``float``, ``int``, ``vec3``, ``matrix33``, etc.).

Arrays can be allocated in a similar way to PyTorch::

    # allocate an uninitialized array of vec3s
    v = wp.empty(shape=n, dtype=wp.vec3, device="cuda")

    # allocate a zero-initialized array of quaternions
    q = wp.zeros(shape=n, dtype=wp.quat, device="cuda")

    # allocate and initialize an array from a NumPy array
    # will be automatically transferred to the specified device
    a = np.ones((10, 3), dtype=np.float32)
    v = wp.from_numpy(a, dtype=wp.vec3, device="cuda")

By default, Warp arrays that are initialized from external data (e.g., NumPy arrays, lists, tuples) will create a copy of the data in new
memory on the specified device. However, it is possible for arrays to alias external memory using the ``copy=False`` parameter to the
array constructor, provided the input is contiguous and on the same device. See the :doc:`/modules/interoperability`
section for more details on sharing memory with external frameworks.
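
As a minimal sketch of aliasing (assuming a contiguous NumPy array living in host memory)::

    import numpy as np
    import warp as wp

    wp.init()

    # a contiguous NumPy buffer in host memory
    data = np.ones((10, 3), dtype=np.float32)

    # alias the NumPy memory instead of copying it; the input must be contiguous
    # and on the same device as the Warp array (here the CPU), and writes through
    # the Warp array are then visible in the original NumPy buffer
    v = wp.array(data=data, dtype=wp.vec3, copy=False, device="cpu")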

To read GPU array data back to CPU memory we can use the ``array.numpy()`` method::

    # bring data from device back to host
    view = device_array.numpy()

This will automatically synchronize with the GPU to ensure that any outstanding work has finished, and will
copy the array back to CPU memory where it is passed to NumPy. Calling ``array.numpy()`` on a CPU array will return
a zero-copy NumPy view onto the Warp data.
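
For example, a short sketch of the two behaviours (the array names are illustrative)::

    # GPU array: .numpy() synchronizes with the device and copies the data to the host
    device_array = wp.zeros(8, dtype=float, device="cuda")
    host_copy = device_array.numpy()   # independent copy in CPU memory

    # CPU array: .numpy() returns a zero-copy view onto the Warp data
    cpu_array = wp.zeros(8, dtype=float, device="cpu")
    host_view = cpu_array.numpy()
    host_view[0] = 42.0                # also modifies cpu_array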

User Functions
--------------

Users can write their own functions using the ``@wp.func`` decorator, for example::

    @wp.func
    def square(x: float):
        return x*x

User functions can be called freely from within kernels inside the same module and accept arrays as inputs.
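
For example, a minimal sketch of calling ``square()`` from a kernel in the same module (the kernel and array names are
illustrative)::

    @wp.kernel
    def square_all(values: wp.array(dtype=float)):
        tid = wp.tid()

        # call the user function defined above
        values[tid] = square(values[tid])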

Compilation Model
-----------------

Warp uses a Python->C++/CUDA compilation model that generates kernel code from Python function definitions. All kernels belonging to a Python module are runtime compiled into dynamic libraries and PTX; the result is then cached between application restarts for fast startup times.

Note that compilation is triggered on the first kernel launch for that module. Any kernels registered in the module with ``@wp.kernel`` will be included in the shared library.
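
As a minimal illustration (the kernel and array names are assumptions), the first launch below triggers code generation
and compilation for the module, while subsequent launches, and later application runs via the kernel cache, reuse the
compiled binaries::

    @wp.kernel
    def add_one(x: wp.array(dtype=float)):
        tid = wp.tid()
        x[tid] = x[tid] + 1.0

    x = wp.zeros(16, dtype=float, device="cuda")

    # first launch for this module: compiles the kernels (or loads them from the cache)
    wp.launch(kernel=add_one, dim=16, inputs=[x], device="cuda")

    # subsequent launches reuse the already-loaded module
    wp.launch(kernel=add_one, dim=16, inputs=[x], device="cuda")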

.. image:: ./img/compiler_pipeline.png


Language Details
----------------

To support GPU computation and differentiability, there are some differences from the CPython runtime.

Built-in Types
^^^^^^^^^^^^^^

Warp supports a number of built-in math types similar to high-level shading languages, for example ``vec2, vec3, vec4, mat22, mat33, mat44, quat, array``. All built-in types have value semantics, so that expressions such as ``a = b`` generate a copy of the variable ``b`` rather than a reference.
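
A minimal sketch of what this means inside a kernel (the kernel name and the component assignment are illustrative)::

    @wp.kernel
    def value_semantics(out: wp.array(dtype=wp.vec3)):
        tid = wp.tid()

        b = wp.vec3(1.0, 2.0, 3.0)
        a = b          # copies the value of b, not a reference to it

        b[0] = 10.0    # modifying b afterwards does not affect a

        out[tid] = a   # still (1.0, 2.0, 3.0)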

Strong Typing
^^^^^^^^^^^^^

Unlike Python, in Warp all variables must be typed. Types are inferred from source expressions and function signatures using the
Python typing extensions. All kernel parameters must be annotated with the appropriate type, for example::

    @wp.kernel
    def simple_kernel(a: wp.array(dtype=wp.vec3),
                      b: wp.array(dtype=wp.vec3),
                      c: float):
        ...

Tuple initialization is not supported; instead, variables should be explicitly typed::

    # invalid
    a = (1.0, 2.0, 3.0)

    # valid
    a = wp.vec3(1.0, 2.0, 3.0)


Limitations and Unsupported Features
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See :doc:`limitations` for a list of Warp limitations and unsupported features.
138 changes: 138 additions & 0 deletions _sources/configuration.rst.txt
@@ -0,0 +1,138 @@
Runtime Settings
================

Warp has settings at the global, module, and kernel level that can be used to fine-tune the compilation and verbosity
of Warp programs. In cases in which a setting can be changed at multiple levels (e.g., ``enable_backward``),
the setting at the more-specific scope takes precedence.

Global Settings
---------------

To change a setting, prepend ``wp.config.`` to the name of the variable and assign a value to it.
Some settings may be changed on the fly, while others need to be set prior to calling ``wp.init()`` to take effect.

For example, the location of the user kernel cache can be changed with:

.. code-block:: python

    import os

    import warp as wp

    example_dir = os.path.dirname(os.path.realpath(__file__))

    # set default cache directory before wp.init()
    wp.config.kernel_cache_dir = os.path.join(example_dir, "tmp", "warpcache1")

    wp.init()

Basic Global Settings
^^^^^^^^^^^^^^^^^^^^^

+--------------------+---------+-------------+--------------------------------------------------------------------------+
| Field | Type |Default Value| Description |
+====================+=========+=============+==========================================================================+
|``verify_fp`` | Boolean | ``False`` | If ``True``, Warp will check that inputs and outputs are finite before |
| | | | and/or after various operations. **Has performance implications.** |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``verify_cuda`` | Boolean | ``False`` | If ``True``, Warp will check for CUDA errors after every launch and |
| | | | memory operation. CUDA error verification cannot be used during graph |
| | | | capture. **Has performance implications.** |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``print_launches`` | Boolean | ``False`` | If ``True``, Warp will print details of every kernel launch to standard |
| | | | out (e.g. launch dimensions, inputs, outputs, device, etc.). |
| | | | **Has performance implications.** |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``mode`` | String |``"release"``| Controls whether to compile Warp kernels in debug or release mode. |
| | | | Valid choices are ``"release"`` or ``"debug"``. |
| | | | **Has performance implications.** |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``verbose`` | Boolean | ``False`` | If ``True``, additional information will be printed to standard out |
| | | | during code generation, compilation, etc. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``quiet`` | Boolean | ``False`` | If ``True``, Warp module initialization messages will be disabled. |
| | | | This setting does not affect error messages and warnings. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``kernel_cache_dir``| String | ``None`` | The path to the directory used for the user kernel cache. Subdirectories |
| | | | named ``gen`` and ``bin`` will be created in this directory. If ``None``,|
| | | | a directory will be automatically determined using |
| | | | `appdirs.user_cache_directory <https://github.com/ActiveState/appdirs>`_ |
| | | | |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``enable_backward`` | Boolean | ``True`` | If ``True``, backward passes of kernels will be compiled by default. |
| | | | Disabling this setting can reduce kernel compilation times. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
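
For example, a short sketch of adjusting a few of these settings before initialization (the particular values are
illustrative):

.. code-block:: python

    import warp as wp

    wp.config.quiet = True              # suppress module initialization messages
    wp.config.mode = "debug"            # compile kernels in debug mode
    wp.config.enable_backward = False   # skip backward-pass compilation by default

    wp.init()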

Advanced Global Settings
^^^^^^^^^^^^^^^^^^^^^^^^

+--------------------+---------+-------------+--------------------------------------------------------------------------+
| Field | Type |Default Value| Description |
+====================+=========+=============+==========================================================================+
|``cache_kernels`` | Boolean | ``True`` | If ``True``, kernels that have already been compiled from previous |
| | | | application launches will not be recompiled. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``cuda_output`` | String | ``None`` | The preferred CUDA output format for kernels. Valid choices are ``None``,|
| | | | ``"ptx"``, and ``"cubin"``. If ``None``, a format will be determined |
| | | | automatically. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``ptx_target_arch`` | Integer | 70 | The target architecture for PTX generation. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``llvm_cuda`` | Boolean | ``False`` | If ``True``, Clang/LLVM will be used to compile CUDA code instead of |
| | | | NVRTC. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+

Module Settings
---------------

Module-level settings to control runtime compilation and code generation may be changed by passing a dictionary of
option pairs to ``wp.set_module_options()``.

For example, compilation of backward passes for the kernels in an entire module can be disabled with:

.. code:: python

    wp.set_module_options({"enable_backward": False})

The options for a module can also be queried using ``wp.get_module_options()``.

+--------------------+---------+-------------+--------------------------------------------------------------------------+
| Field | Type |Default Value| Description |
+====================+=========+=============+==========================================================================+
|``mode`` | String | Global | Controls whether to compile the module's kernels in debug or release |
| | | setting | mode by default. Valid choices are ``"release"`` or ``"debug"``. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``max_unroll`` | Integer | 16 | The maximum fixed-size loop to unroll. Note that ``max_unroll`` does not |
| | | | consider the total number of iterations in nested loops. This can result |
| | | | in a large amount of automatically generated code if each nested loop is |
| | | | below the ``max_unroll`` threshold. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``enable_backward`` | Boolean | Global | If ``True``, backward passes of kernels will be compiled by default. |
| | | setting | The module-level setting takes precedence over the global setting. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``fast_math`` | Boolean | ``False`` | If ``True``, CUDA kernels will be compiled with the ``--use_fast_math`` |
| | | | compiler option, which enables some fast math operations that are faster |
| | | | but less accurate. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
|``cuda_output`` | String | ``None`` | The preferred CUDA output format for kernels. Valid choices are ``None``,|
| | | | ``"ptx"``, and ``"cubin"``. If ``None``, a format will be determined |
| | | | automatically. The module-level setting takes precedence over the global |
| | | | setting. |
+--------------------+---------+-------------+--------------------------------------------------------------------------+
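
For example, a minimal sketch combining a few of these options (the values are illustrative, and it is assumed that
``wp.get_module_options()`` returns the options as a dictionary):

.. code-block:: python

    import warp as wp

    # configure code generation for all kernels in the current module
    wp.set_module_options({"mode": "debug", "max_unroll": 4, "fast_math": False})

    # query the options back
    options = wp.get_module_options()
    print(options["mode"])  # "debug"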

Kernel Settings
---------------

``enable_backward`` is currently the only setting that can also be configured on a per-kernel level.
Backward-pass compilation can be disabled by passing an argument into the ``@wp.kernel`` decorator
as in the following example:

.. code-block:: python

    @wp.kernel(enable_backward=False)
    def scale_2(
        x: wp.array(dtype=float),
        y: wp.array(dtype=float),
    ):
        y[0] = x[0] ** 2.0
