Updating docs and other files for release v0.6.0
cliffburdick committed Oct 2, 2023
1 parent 539c1b7 commit 7b69822
Showing 259 changed files with 13,584 additions and 4,801 deletions.
4 changes: 2 additions & 2 deletions CITATION.cff
Expand Up @@ -11,6 +11,6 @@ authors:
given-names: "Adam"
orcid: "https://orcid.org/0000-0001-9690-6357"
title: "MatX Primitives Library for GPU-Accelerated Numerical Computing in C++"
version: 0.1.0
date-released: 2021-10-26
version: 0.6.0
date-released: 2023-10-02
url: "https://github.com/NVIDIA/matx"
2 changes: 1 addition & 1 deletion CMakeLists.txt
Expand Up @@ -55,7 +55,7 @@ endif()
project(MATX
LANGUAGES CUDA CXX
DESCRIPTION "A modern and efficient header-only C++ library for numerical computing on GPU"
VERSION 0.5.0
VERSION 0.6.0
HOMEPAGE_URL "https://github.com/NVIDIA/MatX")

if (NOT CMAKE_CUDA_ARCHITECTURES)
20 changes: 11 additions & 9 deletions README.md
Expand Up @@ -193,6 +193,17 @@ We provide a variety of training materials and examples to quickly learn the Mat
- Finally, for new MatX developers, browsing the [example applications](examples) can provide familiarity with the API and best practices.

## Release Major Features
*v0.6.0*:
- Breaking changes
* This marks the first release of using "transforms as operators". This allows transforms to be used in any operator expression, whereas the previous release required them to be on separate lines. For an example, please see: https://nvidia.github.io/MatX/basics/fusion.html. This also causes a breaking change with transform usage. Converting to the new format is as simple as moving the function parameters. For example: `matmul(C, A, B, stream);` becomes `(C = matmul(A,B)).run(stream);`.
- Features
* Polyphase channelizer
* Many new operators, including upsample, downsample, pwelch, overlap, at, etc
* Added more lvalue semantics for operators based on view manipulation
- Bug fixes
* Fixed cache issues
* Fixed stride = 0 in matmul

*v0.5.0*:
* Polyphase resampler
* Documentation overhaul with examples for each function
Expand All @@ -205,15 +216,6 @@ We provide a variety of training materials and examples to quickly learn the Mat
* 16-bit float reductions
* Output iterator support in CUB

*v0.3.0*:
* Many new operators, including `flatten`, `remap`, `lcollapse`, `rcollapse`, `fmod`, `clone`, `slice`
* Extended N-D tensor support to more functions
* Allow operators on reduction inputs
* g++11 support
* NVTX support
* Many, many bug fixes


## Discussions
We have an open discussions board [here](https://github.com/NVIDIA/MatX/discussions). We encourage any questions about the library to be posted here for other users to learn from and read through.

3 changes: 0 additions & 3 deletions docs/_sources/api/creation/tensors/make.rst
Expand Up @@ -16,13 +16,11 @@ Return by Value
.. doxygenfunction:: make_tensor( TensorType &tensor, const index_t (&shape)[TensorType::Rank()], matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor( ShapeType &&shape, matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor( TensorType &tensor, ShapeType &&shape, matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor( matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor( TensorType &tensor, matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor( T *data, const index_t (&shape)[RANK], bool owning = false)
.. doxygenfunction:: make_tensor( TensorType &tensor, typename TensorType::scalar_type *data, const index_t (&shape)[TensorType::Rank()], bool owning = false)
.. doxygenfunction:: make_tensor( T *data, ShapeType &&shape, bool owning = false)
.. doxygenfunction:: make_tensor( TensorType &tensor, typename TensorType::scalar_type *data, typename TensorType::shape_container &&shape, bool owning = false)
.. doxygenfunction:: make_tensor( T *ptr, bool owning = false)
.. doxygenfunction:: make_tensor( TensorType &tensor, typename TensorType::scalar_type *ptr, bool owning = false)
.. doxygenfunction:: make_tensor( Storage &&s, ShapeType &&shape)
.. doxygenfunction:: make_tensor( TensorType &tensor, typename TensorType::storage_type &&s, typename TensorType::shape_container &&shape)
Expand All @@ -38,5 +36,4 @@ Return by Pointer
.. doxygenfunction:: make_tensor_p( const index_t (&shape)[RANK], matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor_p( ShapeType &&shape, matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor_p( TensorType &tensor, typename TensorType::shape_container &&shape, matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor_p( matxMemorySpace_t space = MATX_MANAGED_MEMORY, cudaStream_t stream = 0)
.. doxygenfunction:: make_tensor_p( T *const data, ShapeType &&shape, bool owning = false)
8 changes: 4 additions & 4 deletions docs/_sources/api/dft/fft/fft.rst
Expand Up @@ -9,8 +9,8 @@ Perform a 1D FFT
These functions are currently not supported with host-based executors (CPU)


.. doxygenfunction:: fft(OpA &&a, uint64_t fft_size = 0)
.. doxygenfunction:: fft(OpA &&a, const int32_t (&axis)[1], uint64_t fft_size = 0)
.. doxygenfunction:: fft(OpA &&a, uint64_t fft_size = 0, FFTNorm norm = FFTNorm::BACKWARD)
.. doxygenfunction:: fft(OpA &&a, const int32_t (&axis)[1], uint64_t fft_size = 0, FFTNorm norm = FFTNorm::BACKWARD)
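The new ``norm`` parameter defaults to ``FFTNorm::BACKWARD``. Assuming this follows the usual NumPy/SciPy meaning of "backward" normalization (forward transform unscaled, inverse transform scaled by 1/N), the convention can be sketched with a naive DFT in plain C++. The helper names below (``naive_dft``, ``fft_backward``, ``ifft_backward``) are hypothetical and are not part of MatX:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) DFT, for illustration only (not how MatX or cuFFT compute it).
// `inverse` flips the sign of the exponent; `scale` multiplies every output bin.
std::vector<cd> naive_dft(const std::vector<cd>& x, bool inverse, double scale) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double ang =
                sign * 2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            acc += x[t] * cd(std::cos(ang), std::sin(ang));
        }
        out[k] = acc * scale;
    }
    return out;
}

// Assumed "backward" convention: the forward transform is left unscaled...
std::vector<cd> fft_backward(const std::vector<cd>& x) {
    return naive_dft(x, false, 1.0);
}

// ...and the inverse transform carries the full 1/N factor.
std::vector<cd> ifft_backward(const std::vector<cd>& X) {
    return naive_dft(X, true, 1.0 / static_cast<double>(X.size()));
}
```

Under this convention, applying ``ifft_backward`` to the output of ``fft_backward`` returns the original samples up to floating-point rounding.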

Examples
~~~~~~~~
Expand All @@ -25,7 +25,7 @@ Examples
:language: cpp
:start-after: example-begin fft-2
:end-before: example-end fft-2
:dedent:
:dedent:

.. literalinclude:: ../../../../test/00_transform/FFT.cu
:language: cpp
Expand All @@ -43,4 +43,4 @@ Examples
:language: cpp
:start-after: example-begin fft-5
:end-before: example-end fft-5
:dedent:
:dedent:
4 changes: 2 additions & 2 deletions docs/_sources/api/dft/fft/ifft.rst
Expand Up @@ -9,8 +9,8 @@ Perform a 1D inverse FFT
These functions are currently not supported with host-based executors (CPU)


.. doxygenfunction:: ifft(OpA &&a, uint64_t fft_size = 0)
.. doxygenfunction:: ifft(OpA &&a, const int32_t (&axis)[1], uint64_t fft_size = 0)
.. doxygenfunction:: ifft(OpA &&a, uint64_t fft_size = 0, FFTNorm norm = FFTNorm::BACKWARD)
.. doxygenfunction:: ifft(OpA &&a, const int32_t (&axis)[1], uint64_t fft_size = 0, FFTNorm norm = FFTNorm::BACKWARD)

Examples
~~~~~~~~
21 changes: 21 additions & 0 deletions docs/_sources/api/logic/comparison/isclose.rst
@@ -0,0 +1,21 @@
.. _isclose_func:

isclose
=======

Determine the closeness of values across two operators using absolute and relative tolerances. The output
from isclose is an ``int`` value since it's commonly used for reductions and ``bool`` reductions using
atomics are not available in hardware.
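As a rough illustration of an element-wise closeness test with an ``int`` result, here is a plain C++ sketch. It assumes the NumPy-style rule ``|a - b| <= atol + rtol * |b|``; the exact rule MatX applies should be checked against the generated API documentation. The name ``isclose_sketch`` is hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise closeness flags, returned as int (1 = close, 0 = not close).
// Assumption: the NumPy-style rule |a - b| <= atol + rtol * |b|.
std::vector<int> isclose_sketch(const std::vector<double>& a,
                                const std::vector<double>& b,
                                double rtol, double atol) {
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = std::fabs(a[i] - b[i]);
        out[i] = (diff <= atol + rtol * std::fabs(b[i])) ? 1 : 0;
    }
    return out;
}
```

Because the flags are integers, they can be summed to count matching elements or combined in other reductions.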


.. doxygenfunction:: isclose

Examples
~~~~~~~~

.. literalinclude:: ../../../../test/00_operators/OperatorTests.cu
:language: cpp
:start-after: example-begin isclose-test-1
:end-before: example-end isclose-test-1
:dedent:

20 changes: 20 additions & 0 deletions docs/_sources/api/logic/truth/allclose.rst
@@ -0,0 +1,20 @@
.. _allclose_func:

allclose
========

Reduce the closeness of two operators to a single scalar (0D) output. The output
from allclose is an ``int`` value since boolean reductions are not available in hardware.
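The reduction to a single value can be pictured as AND-ing per-element ``int`` flags. The plain C++ sketch below is only an analogy: ``allclose_sketch`` is a hypothetical name, and the tolerance rule ``|a - b| <= atol + rtol * |b|`` is an assumption borrowed from NumPy:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns 1 if every pair of elements is close, otherwise 0.
// Assumption: NumPy-style tolerance rule |a - b| <= atol + rtol * |b|.
int allclose_sketch(const std::vector<double>& in1,
                    const std::vector<double>& in2,
                    double rtol, double atol) {
    if (in1.size() != in2.size()) {
        return 0;  // mismatched sizes are treated as "not close" in this sketch
    }
    int all = 1;
    for (std::size_t i = 0; i < in1.size(); ++i) {
        const int close =
            (std::fabs(in1[i] - in2[i]) <= atol + rtol * std::fabs(in2[i])) ? 1 : 0;
        all &= close;  // integer AND stands in for a boolean reduction
    }
    return all;
}
```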


.. doxygenfunction:: allclose(OutType dest, const InType1 &in1, const InType2 &in2, double rtol, double atol, SingleThreadHostExecutor exec)
.. doxygenfunction:: allclose(OutType dest, const InType1 &in1, const InType2 &in2, double rtol, double atol, cudaExecutor exec = 0)

Examples
~~~~~~~~

.. literalinclude:: ../../../../test/00_operators/ReductionTests.cu
:language: cpp
:start-after: example-begin allclose-test-1
:end-before: example-end allclose-test-1
:dedent:
34 changes: 34 additions & 0 deletions docs/_sources/api/manipulation/rearranging/overlap.rst
@@ -0,0 +1,34 @@
.. _overlap_func:

overlap
#######

Create an overlapping view of an input operator, giving a higher-rank view of the input.

For example, the following 1D tensor [1 2 3 4 5] could be cloned into a 2D tensor with a
window size of 2 and overlap of 1, resulting in::

[1 2
2 3
3 4
4 5]

Currently this only works on 1D tensors going to 2D, but may be expanded
for higher dimensions in the future. Note that if the window size does not
divide evenly into the existing column dimension, the view may chop off the
end of the data to make the tensor rectangular.
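The windowing described above can be sketched in plain C++. This is an illustration only: ``overlap_sketch`` is a hypothetical helper that copies data into rows, whereas the MatX operator produces a view without copying. It assumes the stride between windows equals the window size minus the overlap, so a window of 2 with an overlap of 1 gives a stride of 1:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the rows of an overlapping 2D "view" of a 1D sequence by copying.
// Trailing elements that cannot fill a whole window are dropped.
std::vector<std::vector<int>> overlap_sketch(const std::vector<int>& in,
                                             std::size_t window,
                                             std::size_t stride) {
    std::vector<std::vector<int>> rows;
    if (window == 0 || stride == 0 || in.size() < window) {
        return rows;
    }
    const std::size_t num_rows = (in.size() - window) / stride + 1;
    for (std::size_t r = 0; r < num_rows; ++r) {
        const auto first = in.begin() + static_cast<std::ptrdiff_t>(r * stride);
        rows.emplace_back(first, first + static_cast<std::ptrdiff_t>(window));
    }
    return rows;
}
```

With input ``{1, 2, 3, 4, 5}``, a window of 2 and a stride of 1 give the four rows shown in the example; a stride of 2 gives only ``[1 2]`` and ``[3 4]``, and the final element is dropped to keep the result rectangular.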

.. note::
Only 1D input operators are accepted at this time

.. doxygenfunction:: overlap( const OpType &op, const index_t (&windows)[N], const index_t (&strides)[N])
.. doxygenfunction:: overlap( const OpType &op, const std::array<index_t, N> &windows, const std::array<index_t, N> &strides)

Examples
~~~~~~~~

.. literalinclude:: ../../../../test/00_operators/OperatorTests.cu
:language: cpp
:start-after: example-begin overlap-test-1
:end-before: example-end overlap-test-1
:dedent:
31 changes: 31 additions & 0 deletions docs/_sources/api/manipulation/selecting/at.rst
@@ -0,0 +1,31 @@
.. _at_func:

at
==

Selects a single value from an operator. Since `at` is a lazily-evaluated operator, it should be used
in situations where `operator()` cannot be used. For instance:

.. code-block:: cpp

    (a = b(5)).run();

The code above creates a race condition where `b(5)` is evaluated on the host before launch, but the value may
not be computed from a previous operation. Instead, the `at()` operator can be used to defer the load until
the operation is launched:

.. code-block:: cpp

    (a = at(b, 5)).run();

.. doxygenfunction:: at(const Op op, Is... indices)
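The difference between reading a value eagerly and deferring the read can be illustrated with a single-threaded C++ analogy. No GPU or MatX code is involved, and the names ``ReadResult`` and ``eager_vs_deferred`` are hypothetical:

```cpp
#include <cassert>
#include <vector>

struct ReadResult {
    int eager;
    int deferred;
};

ReadResult eager_vs_deferred() {
    std::vector<int> b(8, 0);

    // Eager read: the value is copied immediately, analogous to b(5) being
    // evaluated on the host before the operation is launched.
    const int eager = b[5];

    // Deferred read: only *how* to read is recorded, analogous to at(b, 5).
    auto deferred = [&b]() { return b[5]; };

    // Stand-in for earlier queued work finishing and writing the real value.
    b[5] = 42;

    // The deferred read runs now and sees the updated value.
    return {eager, deferred()};
}
```

The eager copy still holds the stale value, while the deferred read observes the value written by the earlier work.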

Examples
~~~~~~~~

.. literalinclude:: ../../../../test/00_operators/OperatorTests.cu
:language: cpp
:start-after: example-begin at-test-1
:end-before: example-end at-test-1
:dedent:

10 changes: 8 additions & 2 deletions docs/_sources/api/signalimage/convolution/conv1d.rst
Expand Up @@ -5,7 +5,13 @@ conv1d

1D convolution

.. doxygenfunction:: conv1d(const In1Type &i1, const In2Type &i2, matxConvCorrMode_t mode)
Performs a convolution operation of two inputs. Three convolution modes are available: full, same, and valid. The
mode controls how much (if any) of the output is truncated to remove filter ramps. The method parameter allows
either direct or FFT-based convolution. Direct performs the typical sliding-window dot product approach, whereas
FFT uses the convolution theorem. The FFT method may be faster for large inputs, but both methods should be tested
for the target input sizes.
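To make the three modes concrete, here is a plain C++ sketch of the direct (sliding-window) method. It is not the MatX implementation: the ``Mode`` enum and ``conv1d_direct`` are hypothetical names, and the trimming offsets follow the NumPy convention, which is an assumption and may not match MatX's alignment for every filter length:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Mode { Full, Same, Valid };

// Direct 1D convolution followed by trimming according to the mode.
std::vector<double> conv1d_direct(const std::vector<double>& sig,
                                  const std::vector<double>& filt,
                                  Mode mode) {
    const std::size_t n = sig.size();
    const std::size_t m = filt.size();
    if (n == 0 || m == 0) {
        return {};
    }

    // Full mode keeps every partial overlap: n + m - 1 outputs.
    std::vector<double> full(n + m - 1, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            full[i + j] += sig[i] * filt[j];
        }
    }
    if (mode == Mode::Full) {
        return full;
    }

    const std::size_t longer = std::max(n, m);
    const std::size_t shorter = std::min(n, m);
    std::size_t start = 0;
    std::size_t len = 0;
    if (mode == Mode::Same) {
        start = (shorter - 1) / 2;   // centered trim (assumed convention)
        len = longer;                // same length as the longer input
    } else {
        start = shorter - 1;         // drop every partially overlapped output
        len = longer - shorter + 1;
    }
    const auto first = full.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(len));
}
```

For a signal ``{1, 2, 3, 4}`` and a filter ``{1, 1}``, full mode yields five outputs, same mode yields four, and valid mode yields three, with the ramp-up and ramp-down samples removed.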

.. doxygenfunction:: conv1d(const In1Type &i1, const In2Type &i2, matxConvCorrMode_t mode, matxConvCorrMethod_t method)

Examples
~~~~~~~~
Expand All @@ -22,7 +28,7 @@ Examples
:end-before: example-end conv1d-test-2
:dedent:

.. doxygenfunction:: conv1d(const In1Type &i1, const In2Type &i2, const int32_t (&axis)[1], matxConvCorrMode_t mode)
.. doxygenfunction:: conv1d(const In1Type &i1, const In2Type &i2, const int32_t (&axis)[1], matxConvCorrMode_t mode = MATX_C_MODE_FULL, matxConvCorrMethod_t method = MATX_C_METHOD_DIRECT)

Examples
~~~~~~~~
18 changes: 18 additions & 0 deletions docs/_sources/api/signalimage/filtering/channelize_poly.rst
@@ -0,0 +1,18 @@
.. _channelize_poly_func:

channelize_poly
===============

Polyphase channelizer with a configurable number of channels

.. doxygenfunction:: matx::channelize_poly(const InType &in, const FilterType &f, index_t num_channels, index_t decimation_factor)

Examples
~~~~~~~~

.. literalinclude:: ../../../../test/00_transform/ChannelizePoly.cu
:language: cpp
:start-after: example-begin channelize_poly-test-1
:end-before: example-end channelize_poly-test-1
:dedent:

23 changes: 23 additions & 0 deletions docs/_sources/api/signalimage/general/pwelch.rst
@@ -0,0 +1,23 @@
.. _pwelch_func:

pwelch
======

Estimate the power spectral density of a signal using Welch's method [1]_

.. doxygenfunction:: pwelch(const xType& x, const wType& w, index_t nperseg, index_t noverlap, index_t nfft)
.. doxygenfunction:: pwelch(const xType& x, index_t nperseg, index_t noverlap, index_t nfft)

Examples
~~~~~~~~

.. literalinclude:: ../../../../test/00_operators/PWelch.cu
:language: cpp
:start-after: example-begin pwelch-test-1
:end-before: example-end pwelch-test-1
:dedent:

References
~~~~~~~~~~

.. [1] \ P. Welch, "The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms," in IEEE Transactions on Audio and Electroacoustics, vol. 15, no. 2, pp. 70-73, June 1967, doi: 10.1109/TAU.1967.1161901.
3 changes: 2 additions & 1 deletion docs/_sources/api/stats/index.rst
Expand Up @@ -5,7 +5,8 @@ Statistics

.. toctree::
:maxdepth: 2

avgvar/index.rst
corr/index.rst
hist/index.rst
misc/index.rst
11 changes: 11 additions & 0 deletions docs/_sources/api/stats/misc/index.rst
@@ -0,0 +1,11 @@
.. _misc_stats:

Misc
####


.. toctree::
:maxdepth: 1
:glob:

*
24 changes: 24 additions & 0 deletions docs/_sources/api/stats/misc/percentile.rst
@@ -0,0 +1,24 @@
.. _percentile_func:

percentile
##########

Find the q-th percentile of an input sequence. ``q`` is a value between 0 and 100 representing the percentile. A value
of 0 is equivalent to the minimum, 100 is the maximum, and 50 is the median when using the ``LINEAR`` method.

.. note::
Multiple q values are not supported yet

Supported methods for interpolation are: LINEAR, HAZEN, WEIBULL, LOWER, HIGHER, MIDPOINT, NEAREST, MEDIAN_UNBIASED, and NORMAL_UNBIASED
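As a rough sketch of what linear interpolation means here, the plain C++ below computes a percentile by sorting and interpolating between the two nearest ranks. It assumes the common definition in which the fractional position is ``q / 100 * (n - 1)``; ``percentile_linear`` is a hypothetical helper and not the MatX implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// q-th percentile with linear interpolation between the closest ranks.
// Assumption: fractional position = q / 100 * (n - 1) in the sorted data.
double percentile_linear(std::vector<double> data, unsigned char q) {
    assert(!data.empty() && q <= 100);
    std::sort(data.begin(), data.end());
    const double pos =
        (static_cast<double>(q) / 100.0) * static_cast<double>(data.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, data.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return data[lo] + frac * (data[hi] - data[lo]);
}
```

With this definition, ``q = 0`` returns the smallest element, ``q = 100`` the largest, and ``q = 50`` the median, averaging the two middle elements when the length is even.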

.. doxygenfunction:: percentile(const InType &in, unsigned char q, PercentileMethod method = PercentileMethod::LINEAR)
.. doxygenfunction:: percentile(const InType &in, unsigned char q, const int (&dims)[D], PercentileMethod method = PercentileMethod::LINEAR)

Examples
~~~~~~~~

.. literalinclude:: ../../../../test/00_operators/ReductionTests.cu
:language: cpp
:start-after: example-begin percentile-test-1
:end-before: example-end percentile-test-1
:dedent:
22 changes: 12 additions & 10 deletions docs/_sources/basics/concepts.rst
Expand Up @@ -66,23 +66,25 @@ require no memory.
Transform
---------

Some functions in MatX can only be executed on a single line without any other operators. For example, an fft is executed by:
Transforms are operators that take one or more inputs and call a backend library or kernel. Transforms usually change one or
more properties of the input, but that is not always the case. An fft may change the input type or shape, but a sort transform
does not. Depending on the context used, a transform may asynchronously allocate temporary memory if the expression requires it.

For example:

.. code-block:: cpp
fft(A, A);
(b = fft(A)).run();
It is currently not valid to do something like the following:
The expression above performs an out-of-place FFT by taking the input ``A`` and storing the result in the output ``b``. Transforms may also be used
in larger expressions:

.. code-block:: cpp
(C = B * fft(A, A)).run();
The reason this is invalid is because functions that are classified as transforms launch CUDA kernels to perform a single function,
and many times they call a CUDA library. Transforms are not operators and cannot be used in operator expressions as shown above.
Since ``fft`` is not an operator the compiler will give an error.
(C = B * fft(A)).run();
This behavior may change in the future or be relaxed for certain transforms.
In this case ``fft(A)`` may need somewhere to store the output of the FFT, and could asynchronously allocate memory to do so. However,
MatX may also perform fusion on the expression if possible.

Since some transforms rely on CUDA math library backends, not all of them are available with every executor. Please see the
documentation for the individual function to check compatibility.
Expand All @@ -107,4 +109,4 @@ Shape is used to describe the size of each dimension of an operator.
Stride
------

Stride is used to describe the spacing between elements in each dimension of an operator
Stride is used to describe the spacing between elements in each dimension of an operator