Update version to 1.7.0 #471

Merged
merged 3 commits into from
Dec 3, 2024
18 changes: 17 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,23 @@
Documentation for rocWMMA is available at
[https://rocm.docs.amd.com/projects/rocWMMA/en/latest](https://rocm.docs.amd.com/projects/rocWMMA/en/latest).

## (Unreleased) rocWMMA 1.6.0 for ROCm 6.3.0
## (Unreleased) rocWMMA 1.7.0 for ROCm 6.4.0

### Added

* Added interleaved layouts that enhance the performance of GEMM operations
* Added emulation test suites, which are lightweight and well suited to running on emulator platforms

### Changed

* Used `GPU_TARGETS` instead of `AMDGPU_TARGETS` in `CMakeLists.txt`
* Used the `--offload-compress` flag for supported compilers

### Resolved issues

* Set `CMAKE_NO_BUILTIN_CHRPATH` when `BUILD_OFFLOAD_COMPRESS` is unset, as a workaround for a CMake bug

## rocWMMA 1.6.0 for ROCm 6.3.0

### Added

2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -77,7 +77,7 @@ include(ROCMCheckTargetIds)
include(ROCMClients)

# Versioning via rocm-cmake
set ( VERSION_STRING "1.6.0" )
set ( VERSION_STRING "1.7.0" )
rocm_setup_version( VERSION ${VERSION_STRING} )

# configure a header file to pass the CMake version settings to the source
8 changes: 4 additions & 4 deletions README.md
@@ -21,12 +21,12 @@ The test suite includes validation and benchmarking projects that focus on unit

rocWMMA currently supports the following AMDGPU architectures:

* CDNA class GPU featuring matrix core support: gfx908, gfx90a, gfx940, gfx940, gfx942 as 'gfx9'
* CDNA class GPU featuring matrix core support: gfx908, gfx90a, gfx940, gfx941, gfx942 as 'gfx9'
* RDNA3 class GPU featuring AI acceleration support: gfx1100, gfx1101, gfx1102 as 'gfx11'

Dependencies:

* Minimum ROCm version support is 6.3.
* Minimum ROCm version support is 6.4.
* Minimum cmake version support is 3.14.
* Minimum ROCm-cmake version support is 0.8.0.
* Minimum rocBLAS version support is rocBLAS 4.0.0 for ROCm 6.0* (or ROCm packages rocblas and rocblas-dev).
@@ -47,7 +47,7 @@ For more detailed information, please refer to the [rocWMMA installation guide](

|Option|Description|Default value|
|---|---|---|
|AMDGPU_TARGETS|Build code for specific GPU target(s)|gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx1100;gfx1101;gfx1102|
|AMDGPU_TARGETS|Build code for specific GPU target(s)|gfx908;gfx90a;gfx942;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201|
|ROCWMMA_BUILD_TESTS|Build Tests|ON|
|ROCWMMA_BUILD_SAMPLES|Build Samples|ON|
|ROCWMMA_BUILD_DOCS|Build doxygen documentation from code|OFF|
@@ -67,7 +67,7 @@ results. Here are some configuration examples:
|Configuration|Command|
|---|---|
|Basic|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> .`|
|Targeting gfx908|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DAMDGPU_TARGETS=gfx908:xnack-` |
|Targeting gfx908|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DAMDGPU_TARGETS=gfx908` |
|Debug build|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DCMAKE_BUILD_TYPE=Debug` |
|Build without rocBLAS (default on)|`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B<build_dir> . -DROCWMMA_VALIDATE_WITH_ROCBLAS=OFF -DROCWMMA_BENCHMARK_WITH_ROCBLAS=OFF` |
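The default `AMDGPU_TARGETS` value is a semicolon-separated list, and an unquoted `;` ends a shell command. The sketch below shows how to pass several targets at once. It uses a hypothetical `join_targets` helper, which is not part of rocWMMA, and prints a correctly quoted configure command. The targets are examples taken from the default list.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of rocWMMA): join GPU targets with ';',
# the list separator CMake expects for AMDGPU_TARGETS.
join_targets() {
  local IFS=';'
  echo "$*"
}

# Example targets, chosen from the default list for illustration only.
target_list=$(join_targets gfx942 gfx1100)

# Quote the -D argument: an unquoted ';' would end the shell command there.
echo "CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B build . -DAMDGPU_TARGETS=\"${target_list}\""
```

Typed directly, the equivalent command is `cmake -B <build_dir> . -DAMDGPU_TARGETS="gfx942;gfx1100"`.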

12 changes: 11 additions & 1 deletion docs/api-reference/api-reference-guide.rst
@@ -268,7 +268,6 @@ layout_t
^^^^^^^^

.. doxygenenum:: rocwmma::layout_t
:members:


rocWMMA API functions
@@ -315,3 +314,14 @@ Sample programs

For sample code that calls the rocWMMA functions ``load_matrix_sync``, ``store_matrix_sync``, ``fill_fragment``, and ``mma_sync``, see `simple_hgemm.cpp <https://github.com/ROCm/rocWMMA/blob/develop/samples/simple_hgemm.cpp>`_.
For more sample programs, see the `Samples directory <https://github.com/ROCm/rocWMMA/tree/develop/samples>`_.

Emulation tests
---------------

The emulation test suite is a smaller test suite designed specifically for emulators. It contains a selection of test cases from the full rocWMMA test set, so it runs significantly faster on emulated platforms. Despite its reduced size, the emulation test suite supports the ``smoke``, ``regression``, and ``extended`` modes.

For example, to run a smoke test:

.. code-block:: bash

   rtest.py --install_dir <build_dir> --emulation smoke
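A small wrapper can help when you run several modes in a row. This is only a sketch. `run_emulation` and its `DRY_RUN` switch are hypothetical. It also assumes that `rtest.py` is on `PATH` and that `--emulation` accepts `regression` and `extended` in the same way it accepts `smoke`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the rtest.py invocation shown above.
# Assumption: --emulation accepts each of the three documented mode names.
run_emulation() {
  local build_dir="$1" mode="$2"
  case "$mode" in
    smoke|regression|extended) ;;
    *) echo "unsupported emulation mode: $mode" >&2; return 1 ;;
  esac
  # Echo the command so it appears in logs; set DRY_RUN=1 to skip execution.
  echo "rtest.py --install_dir $build_dir --emulation $mode"
  if [ -z "${DRY_RUN:-}" ]; then
    rtest.py --install_dir "$build_dir" --emulation "$mode"
  fi
}
```

For example, `run_emulation <build_dir> smoke` runs the smoke suite. `DRY_RUN=1 run_emulation <build_dir> extended` only prints the command it would run.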
12 changes: 8 additions & 4 deletions docs/install/installation.rst
@@ -56,6 +56,10 @@ To install rocWMMA on SLES, use:

Once installed, rocWMMA can be used just like any other library with a C++ API.

.. note::
   The prebuilt package supports the following targets: ``gfx908``, ``gfx90a``, ``gfx942``, ``gfx1100``, ``gfx1101``, ``gfx1102``, ``gfx1200``, and ``gfx1201``.
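
To check whether a GPU is covered by the prebuilt package, compare its target name with the list above. The sketch below is an illustration, not a rocWMMA tool. `is_prebuilt_target` is a hypothetical helper. The optional loop assumes that ROCm's `rocm_agent_enumerator` prints one target name per line and reports the CPU agent as `gfx000`.

```bash
#!/usr/bin/env bash
# Targets supported by the prebuilt package, copied from the note above.
PREBUILT_TARGETS="gfx908 gfx90a gfx942 gfx1100 gfx1101 gfx1102 gfx1200 gfx1201"

# Return 0 if the target is in the prebuilt list. A target-feature suffix
# such as ":xnack-" or ":sramecc+:xnack-" is stripped before comparing.
is_prebuilt_target() {
  local base="${1%%:*}" t
  for t in $PREBUILT_TARGETS; do
    if [ "$t" = "$base" ]; then
      return 0
    fi
  done
  return 1
}

# If rocm_agent_enumerator is available, check each local agent.
# Assumption: the CPU agent is reported as gfx000 and is skipped here.
if command -v rocm_agent_enumerator >/dev/null 2>&1; then
  for gpu in $(rocm_agent_enumerator); do
    [ "$gpu" = "gfx000" ] && continue
    if is_prebuilt_target "$gpu"; then
      echo "$gpu: covered by the prebuilt package"
    else
      echo "$gpu: not in the prebuilt list; consider building from source"
    fi
  done
fi
```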


Once rocWMMA is installed, you can see the ``rocwmma.hpp`` header file in the ``/opt/rocm/include/rocwmma`` directory.
To call into rocWMMA, include only ``rocwmma.hpp``, ``rocwmma_coop.hpp``, and ``rocwmma_transforms.hpp`` in your code.
Don't directly include other rocWMMA files, such as those found in ``/opt/rocm/include/internal``.
@@ -90,7 +94,7 @@ Dependencies
^^^^^^^^^^^^
rocWMMA is designed to have minimal external dependencies so that it is lightweight and portable.

* Minimum ROCm version support is 6.0.
* Minimum ROCm version support is 6.4.
* Minimum cmake version support is 3.14.
* Minimum ROCm-cmake version support is 0.8.0.
* Minimum rocBLAS version support is rocBLAS 4.0.0 for ROCm 6.0* (or ROCm packages rocblas and rocblas-dev).
@@ -185,7 +189,7 @@ Below are the project options available to build rocWMMA library with or without
- **Default Value**
* - AMDGPU_TARGETS
- Build code for specific GPU target(s)
- ``gfx908:xnack-``; ``gfx90a:xnack-``; ``gfx90a:xnack+``; ``gfx940``; ``gfx941``; ``gfx942``; ``gfx1100``; ``gfx1101``; ``gfx1102``
- ``gfx908``; ``gfx90a``; ``gfx942``; ``gfx1100``; ``gfx1101``; ``gfx1102``; ``gfx1200``; ``gfx1201``
* - ROCWMMA_BUILD_TESTS
- Build Tests
- ON
@@ -235,7 +239,7 @@ Here are some other example project configurations:
+===================================+================================================================================================================================================================+
| Basic | :code:`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir>` |
+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Targeting gfx908 | :code:`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DAMDGPU_TARGETS=gfx908:xnack-` |
| Targeting gfx908 | :code:`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DAMDGPU_TARGETS=gfx908` |
+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Debug build | :code:`CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B <build_dir> . -DCMAKE_BUILD_TYPE=Debug` |
+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -481,7 +485,7 @@ Build performance

Depending on the resources available to the build machine and the build configuration selected, rocWMMA build times can be on the order of an hour or more. Here are some things you can do to reduce build times:

* Target a specific GPU (e.g., ``-D AMDGPU_TARGETS=gfx908:xnack-``)
* Target a specific GPU (e.g., ``-D AMDGPU_TARGETS=gfx908``)
* Use more parallel build jobs (e.g., ``-j32``)
* Set ``ROCWMMA_BUILD_ASSEMBLY=OFF``
* Set ``ROCWMMA_BUILD_DOCS=OFF``
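The tips above can be combined into a single configure-and-build sequence. This is a sketch that uses example values: target `gfx908`, 32 jobs, and a `build` directory. It only prints the commands. They are stored in bash arrays so they can also be run exactly as written.

```bash
#!/usr/bin/env bash
# Sketch combining the build-time tips above. The target, job count, and
# build directory name are example values; adjust them for your machine.
build_dir="build"
target="gfx908"
jobs=32

configure_cmd=(cmake -B "$build_dir" .
  "-DAMDGPU_TARGETS=$target"
  -DROCWMMA_BUILD_ASSEMBLY=OFF
  -DROCWMMA_BUILD_DOCS=OFF)
# cmake --build forwards -j to the underlying build tool (CMake >= 3.12).
build_cmd=(cmake --build "$build_dir" -j "$jobs")

# Print the commands; run them with "${configure_cmd[@]}" and "${build_cmd[@]}".
echo "${configure_cmd[*]}"
echo "${build_cmd[*]}"
```

To run the commands with the compilers used in the configuration examples, use `CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ "${configure_cmd[@]}" && "${build_cmd[@]}"`.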