RPP Audio Support HIP - To Decibels (#398) · ROCm/rpp@f7a495a

Commit

RPP Audio Support HIP - To Decibels (#398)

* Bump rocm-docs-core[api_reference] from 0.35.0 to 0.35.1 in /docs/sphinx (#319)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.0 to 0.35.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.0...v0.35.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core[api_reference] from 0.35.1 to 0.36.0 in /docs/sphinx (#322)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.1 to 0.36.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.1...v0.36.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* added api support for ToDecibels HIP kernel

* added test suite support for audio in HIP

* added profiler support for hip test suite

* added initial max find support for 1D and 2D data

* added initial support for todecibels kernel

* Docs - Bump rocm-docs-core[api_reference] from 0.36.0 to 0.37.0 in /docs/sphinx (#328)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Link cleanup (#326)

* link updates

* update tables

* pare down index

* API cleanup

* consistency

* verbiage

* Update notes

* improved precision match to 6 decimals

* added kernal launch configuration for max kernel

* Docs - Bump rocm-docs-core[api_reference] from 0.37.0 to 0.37.1 in /docs/sphinx (#329)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.37.0 to 0.37.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.0...v0.37.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Voxel Flip on HIP and HOST (#285)

* added support for flip voxel

* added test suite support

* added golden outputs for flip voxel

made changes in test suite to run QA tests for flip

* updated golden outputs with correct values

* minor bug fix in the hip test suite

* made changes to variable names for better readability

fixed comments in test suite

minor cleanup

* combined the flip axis factor as ternary operator in HIP kernel

added new enum for error handling when source and destination layouts are not matching

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* converted flip voxel golden outputs to bin files

* changed copyright from 2023 to 2024

* Update flip_voxel.hpp license

* License - updates to 2024 and consistency changes (#298)

* Match all CMakeLists.txt license as per RPP's outermost LICENSE file

* Match all python files' license as per RPP's outermost LICENSE file

* Match all .hpp files' license as per RPP's outermost LICENSE file

* Match all .cpp files' license as per RPP's outermost LICENSE file

* Match all .h files' license as per RPP's outermost LICENSE file

* Remove all rights reserved as per LICENSE file

* Remove double space in "Copyright (c) 2019 - 2023 Advanced Micro Devices, Inc."

* Match all .cmake files' license as per RPP's outermost LICENSE file

* Match all .cpp.in files' license as per RPP's outermost LICENSE file

* Replace 283 occurrences in 282 files - 2023 to 2024

* Add "MIT License" title to 281 instances

* Add missing license

* Test - Update README.md for test_suite (#299)

* Bump rocm-docs-core[api_reference] from 0.33.0 to 0.33.1 in /docs/sphinx (#301)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.0 to 0.33.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.0...v0.33.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core[api_reference] from 0.33.1 to 0.33.2 in /docs/sphinx (#302)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.1 to 0.33.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.1...v0.33.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update doc codeowners (#303)

* Documentation - Bump rocm-docs-core[api_reference] from 0.33.2 to 0.34.0 in /docs/sphinx (#304)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.2 to 0.34.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.2...v0.34.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Test suite - upgrade 5 qa perf (#305)

* experimental changes for adding qa mode for performance tests

* made changes to add display more information w.r.t QA results summary for performance tests

* minor changes

* Add changes to dump qa results to excel file

* Add performance QA for three new tensor functions

* update prerequisites in readme

* added changes to handle unsupported cases

* removed treshold dictionary and added performance Noise treshold
add new dataset for performance QA

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* Changes to the performane summary dataframe

* minor changes

* Update CMakeLists.txt to add ${CMAKE_CURRENT_SOURCE_DIR} for CI

* Update CMakeLists.txt fix

* Update CMakeLists.txt fix

* remove tabulate dependency

* Update README.md to remove tabulate pip install

* Fix for CI machine failure

* Add note on performance

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: Abishek <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: r-abishek <[email protected]>

* RPP Color Temperature on HOST and HIP (#271)

* Initial commit - Color Temperature HOST Tensor

* Initial commit - Color Temperature HIP Tensor

* Add color temperature golden outputs

* address review comments

* Use reinterpret_cast instead of static_cast

* Combine templated functions to support all datatypes into one
(got minor perf difference of order 3%)

Also fixes indentation

* Fix i8 datatype

* Cleanup

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* Fix PLN3 variant outputs

Also modifies reference outputs

* Update color_temperature.hpp license

* Delete color_temperature_u8_Tensor_PKD3.csv

* Delete color_temperature_u8_Tensor_PLN3.csv

---------

Co-authored-by: snehaa8 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>

* RPP Voxel 3D Tensor Add/Subtract scalar on HOST and HIP (#272)

* added HOST support for voxel add kernel

* added HIP support for voxel add kernel

* added test suite support for add scalar

* added Doxygen support and modified hip kernel function names as per new standard

* added HOST support for voxel subtract kernel

* added HIP support for voxel subtract kernel

* added test suite support

* updated the golden outputs for subtract with correct values

* removed unnessary validation checks

* Remove double spaces

* Fix header

* Fix all retval docs

* Fix docs to add memory type

* Fix comment

* Add divider comment

* Use post-increment efficiently

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* converted add and subtract scalar golden outputs to bin files

* changed copyright from 2023 to 2024

* Update add_scalar.hpp license

* Update subtract_scalar.hpp license

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>

* RPP Magnitude on HOST and HIP (#278)

* Initial commit - Magnitude HOST Tensor

* Add QA reference outputs

* Update runTests.py

* Initial commit - Magnitude HIP Tensor

* Add dual input support in testsuite

* Optimize HOST kernel further

* Optimize i8 datatype further

* Modify comments

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* Bump rocm-docs-core[api_reference] from 0.31.0 to 0.33.0 in /docs/sphinx (#294)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.31.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update Copywright year

* Combine templated functions to support all datatypes

* Modify format of reference outputs

* Update rppi_arithmetic_operations.h license

* Update rppt_tensor_arithmetic_operations.h license

* Update host_tensor_arithmetic_operations.hpp

* Update magnitude.hpp license

* Update hip_tensor_arithmetic_operations.hpp license

* Delete magnitude_u8_Tensor_PKD3.csv

* Delete magnitude_u8_Tensor_PLN1.csv

* Delete magnitude_u8_Tensor_PLN3.csv

* Update rpp_test_suite_common.h license

* Update runTests.py license

* Update Tensor_hip.cpp license

* Update runTests.py license

* Update Tensor_host.cpp license

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Snehaa-Giridharan <[email protected]>

* Bump rocm-docs-core[api_reference] from 0.34.0 to 0.34.2 in /docs/sphinx (#309)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.0 to 0.34.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.0...v0.34.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Tensor Audio Support - Down Mixing (#296)

* Initial commit - Non slient region detection

Includes unittest setup

* Initial commit - To Decibels

Includes unittest setup

* Intial commit - pre_emphasis_filter

* Intial commit - down_mixing

* Replace vectors with arrays

* Cleanup

* Minor cleanup

* Optimize downmixing Kernel

Includes cleanup

* Replace Rpp64s with Rpp32s

* Cleanup

* Optimize and precompute cutOff

* Fix buffer used

* Fix buffer used

* Additional Cleanup

* Optimize post incrmeent operation

* Optimize post increment operation

* Update testsuite for Audio

* code cleanup

* Add Readme file for Audio test suite

* changes based on review comments

* minor change

* Remove unittest folders and updated README.md

* Remove unit tests

* minor change

* code cleanup

* added common header file for audio helper functions

* removed unncessary audio wav files

fixed bug in ROI updation for audio test suite

resolved issue in summary generation for performance tests in python

* removed log file

* added doxygen support for audio

* added doxygen changes for to_decibels

* updated test suite support for to_decibels

* minor change

* added doxygen changes for preemphasis filter

* updated changes for preemphasis filter in test suite

* removed the usage of getMax function and used std::max_element

* modularized code in test suite

* merge with latest changes

* minor change

* minor change

* minor change

* resolved codacy warnings

* Codacy fix - Remove unused cpuTime

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* resolved issue with file_system dependency in test suite

* Doxygen changes

changed malloc to new in NSR kernel

* RPP RICAP Tensor for HOST and HIP (#213)

* Initial commit - Ricap HOST Tensor

Includes testsuite changes

* Add QA tests for RICAP

Used three_images_224x224_src1 folder to create golden outputs

* Add three_images_224x224_src1 into TEST_IMAGES

* Support HIP Backend for RICAP

* Fix HIP pkd3->pkd3 variant

* regenerated golden outputs for RICAP

minor changes in HOST shell script for handling RICAP in QA mode

* minor bug fix in RICAP HIP kernels

* Improve readability and Cleanup

* Additional cleanup

* Cleanup testsuite

Includes new golden outputs

* Additional testuite fixes

* Minor cleanup

* Fix codacy warnings

* Address other codacy warnings

* Update ricap.hpp with reference paper

* Add RICAP dataset path in readme

* Make changes to error codes returned

* Modify roi crop region for unit and perf tests

* RPP Tensor Water Augmentation on HOST and HIP (#181)

* added water HOST and HIP codes

* added water case in test suite

* added golden outputs for water

* added omp thread changes for water augmentation

* experimental changes

* fixed output issue with AVX2 instructions

* added AVX2 support for PKD3 load function

minor changes in PLN variant load functions

* nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

* Add Avx2 implementation for F32 and U8 toggle variants

* Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

* change F32 load and store logic

* optimized the store function for F32 PLN3-PKD3

* reverted back irrelevant changes

* minor change

* optimized load and store functions for water U8 and F32 variants in host

removed commented code

* removed golden outputs for water

* minor changes

* renamed few functions and removed unused functions

updated i8 pln1 load as per the optimized u8 pln1 load

* fixed bug in i8 load function

* changed cast to c++ style

resolved spacing issues and added comments for AVX codes for better understanding

made changes to handle cases where QA Tests are not supported

* added golden outputs for water

* updated golden outputs with latest changes

* modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

* fixed minor bug in I8 variants

* made to changes to resolve codacy warnings

* changed cast to c++ style in hip kernel

* changed generic nn F32 loads using gather and setr instructions

* added comments for latest changes

* minor change

* added definition for storing 32 and 64 bits from a 128bit register

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>

* Fix build error

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* Boost deps fix for test suite

---------

Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: Kiriti Gowda <[email protected]>

* Documentation - Readme & changelog updates (#251)

* readme and changelog updates for 6.0

* minor update

* added ctests for audio test suite for CI

made changes to add more clarity on the QA Tests results

* Cmake mods for ctest

* HOST-only build error bugfix

* added qa mode paramter to python audio script

added golden output map for QA testing of Non silent region detection

* minor change

* Documentation - Bump rocm-docs-core[api_reference] from 0.26.0 to 0.27.0 in /docs/sphinx (#253)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Resize Mirror Normalize Bugfix (#252)

* added fix for hipMemset

* remove pixel check for U8-F32 and U8-F16 for HOST codes

---------

Co-authored-by: sampath1117 <[email protected]>

* added example for MMS calculation in comments for better understanding

* Sphinx - updates (#257)

* Sphinx - updates

* Doxygen - Updates

* Docs - Remove index.md

* updated info used to for running audio test suite

* removed bitdepth variable from audio test suite

* added more information on computing NSR outputs in the example added

* Fix doxygen for decibels

Also removes extra QA reference files

* move tensor_host_audio.cpp to host folder

* Fix build errors and qa tests in Audio Test suite

* Fix build errors and qa tests in Audio Test suite

* Add reference output and test samples for downmix

* Add down_mix in augmentation list and supported cases

* Remove auto-merge repeated funcs

* Improve clarity of header docs

* Remove blank line

* Improve clarity on header docs

* Add Doxygen comments

* minor change

* converted golden outputs to binary file for downmixing

* removed old golden output file for preemphasis and todecibels

* modified info for downmixing as per new changes

used handle memory for temporary buffers

* formatting changes

* moved the common code for SSE and AVX to outside

* Update down_mixing.hpp license

* Update rppt_tensor_audio_augmentations.h

* combined the srcLength and channels tensors into single tensor

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: Kiriti Gowda <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>
Co-authored-by: Lisa <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sundarrajan98 <[email protected]>

* RPP Voxel 3D Tensor Multiply scalar on HOST and HIP (#306)

* added HIP support for voxel scalar multiply kernel

* added HOST support for voxel multiply kernel

added golden outputs for voxel multiply kernel

* merge with master

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* converted multiply scalar voxel golden outputs to bin files

* changed copyright from 2023 to 2024

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>

* Test Suite Bugfix (#307)

* experimental changes for adding qa mode for performance tests

* made changes to add display more information w.r.t QA results summary for performance tests

* minor changes

* Add changes to dump qa results to excel file

* Add performance QA for three new tensor functions

* update prerequisites in readme

* added changes to handle unsupported cases

* removed treshold dictionary and added performance Noise treshold
add new dataset for performance QA

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* Changes to the performane summary dataframe

* minor changes

* Update CMakeLists.txt to add ${CMAKE_CURRENT_SOURCE_DIR} for CI

* Update CMakeLists.txt fix

* Update CMakeLists.txt fix

* remove tabulate dependency

* Update README.md to remove tabulate pip install

* Fix for CI machine failure

* Add note on performance

* Fix segmentation fault

* Revert QAmode to restrict HIP bitdepths

* Use Rpp64u for HOST while comparing outputs

* Fix ambiguous abs call

* Fix for SLES CI HIP fail - error: incompatible pointer types assigning to 'unsigned long *' from 'unsigned long long *' - refOutput = TensorSumReferenceOutputs[numChannels].data();

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: Pavel Tcherniaev <[email protected]>

* Bump rocm-docs-core[api_reference] from 0.34.2 to 0.35.0 in /docs/sphinx (#313)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.2 to 0.35.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.2...v0.35.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Reduction - Tensor min and Tensor max on HOST and HIP (#260)

* Minor Change

* Add Validation check for DST_FOLDER path

* added water HOST and HIP codes

* added water case in test suite

* added golden outputs for water

* Add Validation checks for all options in testAllScript.sh

* Add sanity check for dual Input cases
Set Max Dimension and Max Image Dump
Replaced Fast DCT tag with Accurate DCT

* Regenerate golden outputs using accurate dct Flag
Add golden outputs for some new augmentations

* Fix Flip golden outputs mismatch
Fix PLN3 variants mismatch in QA mode

* Add MAX_BATCH_SIZE check
removed Augmentations function calls for failing Qa modes
code cleanup

* Add crop and gamma correction augmentations
code cleanup

* Add comments to functions in rpp_test_suite_common.h

* minor change

* code cleanup

* minor code changes

* Change roi and Image sizes for crop augmentation

* Change numIterations option to numRuns
Addressed PR comments

* added omp thread changes for water augmentation

* experimental changes

* fixed output issue with AVX2 instructions

* added AVX2 support for PKD3 load function

minor changes in PLN variant load functions

* Add turboJpeg header to update maxHeight and maxWidth values

* nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

* Change the performance Timings logic

* Add Avx2 implementation for F32 and U8 toggle variants

* minor change to support u8_f16 and u8_f32 cases

* Regenerate LUT golden outputs with ACCURATE_DCT tag

* Minor code changes

* Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

* Made changes to the runTests.py in Host to remove testAllScipts.sh

* Made changes to the runTests.py in HIP to remove testAllScipts.sh

* Initial commit - Image min and max Reduction kernel

Includes
* u8 datatype for both min and max HOST Tensor of all variants.
* Testsuite changes.

* NWC -initial code for min max PLN3 - PLN3

* made changes to split min and max kernels seperately

* splitted kernels for min and max

* made changes to print final max/min in the R,G,B channels

* fixed inaccuracies in min/max computation

* made changes to typecast intermediate output to output requested by user

added comments for the code

code cleanup and minor changes in test suite

* fixed build issues

removed image folders used for min, max and sum

reverted unwanted file changes

* minor changes in test suite

* removed support for unwanted test case in Tensor_hip.cpp

* Adds new option roi

* remove testAllScripts.sh

* Adds roi Option in HIP backend

* Implement f32 variants

* Implement f16 and i8 datatype variants

* change F32 load and store logic

* Add build flags in CMakeLists.txt to set AVX/SSE flags based on the system configuration

* minor code changes

* Initial commit - Image sum Reduction kernel

Includes u8 PLN1 -> PLN1 conversion for HOST Tensor

* Implement PKD3 and PLN3 for Image sum Tensor HOST

* Support i8, f16 and f32 datatypes

* Initial commit - Image sum Reduction HIP kernel

Includes u8 PLN1 -> PLN1 conversion for Tensor

* Implement PKD3 and PLN3 for Image sum Tensor HIP

* Add support in testsuite

Revert normalization for i8 HOST Tensor variants

* Fix HIP testsuite

Remove additional blanks for 1 channel output

* Modify print statement in HIP testsuite

* Improve readability for testsuite outputs

* optimized the store function for F32 PLN3-PKD3

* reverted back irrelevant changes

* minor change

* Fix HIP to support larger inputs

* optimized load and store functions for water U8 and F32 variants in host

removed commented code

* Cleanup

* removed golden outputs for water

* minor changes

* Cleanup

Support Reduction QA test in testsuite

* renamed few functions and removed unused functions

updated i8 pln1 load as per the optimized u8 pln1 load

* fixed bug in i8 load function

* Remove unused variables and C style casting

* changed cast to c++ style

resolved spacing issues and added comments for AVX codes for better understanding

made changes to handle cases where QA Tests are not supported

* added golden outputs for water

* updated golden outputs with latest changes

* modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

* fixed minor bug in I8 variants

* Optimize u8 datatype further

* Fix static_cast

* made to changes to resolve codacy warnings

* changed cast to c++ style in hip kernel

* Initial commit - Ricap HOST Tensor

Includes testsuite changes

* Add QA tests for RICAP

Used three_images_224x224_src1 folder to create golden outputs

* Add three_images_224x224_src1 into TEST_IMAGES

* added rotate case with golden outputs

changed generic bilinear HOST codes to match with HIP codes

* Add golden output for remaining all tensor augmentations

* fix python script issues

* Optimize u8 and i8 datatype

Uses uint and int internal processing instead of float

* Fix testsuite build errors

* minor change

* Fix QA check

* Modify api naming from image_sum to tensor_sum

Includes changes for both HOST and HIP

* Support HIP Backend for RICAP

* change rcm and rmn golden outputs

* Fix HIP pkd3->pkd3 variant

* changes based on review comments

* change test_suite folder to tests

* Optimize u8 and i8 datatype of HIP

Includes modification in naming of shared memory

* minor fix

* changed generic nn F32 loads using gather and setr instructions

* Optimize and cleanup U8 HIP

* regenerated golden outputs for RICAP

minor changes in HOST shell script for handling RICAP in QA mode

* minor bug fix in RICAP HIP kernels

* Fix i8 datatype variants

Includes cleanup

* Fix the issues with color_to_greyscale

* remove the empty folder creation

* reverting back the folder name change

* minor change

* added comments for latest changes

* minor change

* Improve readability and Cleanup

* Fix QA for HIP

Includes cleanup

* resolved review comments

* minor change

* Modify api naming from image_ to tensor_ for HOST

* Add support for QA tests

* removed range check for RMN U8-F32 and U8-F16 variants

changed from hipMemset to hipMemsetAsync for RMN HIP Kernel

removed multiplication by 255 for stdDev in RMN HOST U8-F16 and U8-F32 variants

* Modify naming of shared memory with _smem in HIP

Includes cleanup

* Typecast and reuse markArr for HIP U8 and I8

* Cleanup and minor optimization

* minor fix

* fix codacy warnings

* Additional cleanup

* Cleanup and move #define

* Changed the complexity of if statements in runTests.py

* Cleanup testsuite

Includes new golden outputs

* Additional testuite fixes

* Minor cleanup

* Codacy fixes

* Fix codacy warnings

* Codacy fix

* Address other codacy warnings

* cleanup

* Change Image functions to generic

* Update ricap.hpp with reference paper

* resolved minor issues happened with merge

* minor changes

* fixed minor issue with getting profiler times

* minor formatting changes

* resolved build issues in test suite

renamed the min and max kernel file names

* RPP RICAP Tensor for HOST and HIP (#213)

* Initial commit - Ricap HOST Tensor

Includes testsuite changes

* Add QA tests for RICAP

Used three_images_224x224_src1 folder to create golden outputs

* Add three_images_224x224_src1 into TEST_IMAGES

* Support HIP Backend for RICAP

* Fix HIP pkd3->pkd3 variant

* regenerated golden outputs for RICAP

minor changes in HOST shell script for handling RICAP in QA mode

* minor bug fix in RICAP HIP kernels

* Improve readability and Cleanup

* Additional cleanup

* Cleanup testsuite

Includes new golden outputs

* Additional testuite fixes

* Minor cleanup

* Fix codacy warnings

* Address other codacy warnings

* Update ricap.hpp with reference paper

* Add RICAP dataset path in readme

* Make changes to error codes returned

* Modify roi crop region for unit and perf tests

* RPP Tensor Water Augmentation on HOST and HIP (#181)

* added water HOST and HIP codes

* added water case in test suite

* added golden outputs for water

* added omp thread changes for water augmentation

* experimental changes

* fixed output issue with AVX2 instructions

* added AVX2 support for PKD3 load function

minor changes in PLN variant load functions

* nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

* Add Avx2 implementation for F32 and U8 toggle variants

* Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

* change F32 load and store logic

* optimized the store function for F32 PLN3-PKD3

* reverted back irrelevant changes

* minor change

* optimized load and store functions for water U8 and F32 variants in host

removed commented code

* removed golden outputs for water

* minor changes

* renamed few functions and removed unused functions

updated i8 pln1 load as per the optimized u8 pln1 load

* fixed bug in i8 load function

* changed cast to c++ style

resolved spacing issues and added comments for AVX codes for better understanding

made changes to handle cases where QA Tests are not supported

* added golden outputs for water

* updated golden outputs with latest changes

* modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

* fixed minor bug in I8 variants

* made to changes to resolve codacy warnings

* changed cast to c++ style in hip kernel

* changed generic nn F32 loads using gather and setr instructions

* added comments for latest changes

* minor change

* added definition for storing 32 and 64 bits from a 128bit register

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>

* Fix build error

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* Boost deps fix for test suite

---------

Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: Kiriti Gowda <[email protected]>

* Documentation - Readme & changelog updates (#251)

* readme and changelog updates for 6.0

* minor update

* Documentation - Bump rocm-docs-core[api_reference] from 0.26.0 to 0.27.0 in /docs/sphinx (#253)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Resize Mirror Normalize Bugfix (#252)

* added fix for hipMemset

* remove pixel check for U8-F32 and U8-F16 for HOST codes

---------

Co-authored-by: sampath1117 <[email protected]>

* Cmake fix to prevent warning

* Fix paths in new python scripts

* Sphinx - updates (#257)

* Sphinx - updates

* Doxygen - Updates

* Docs - Remove index.md

* Test suite fixes after tensor_min / tensor_max HOST merge

* Fix max case

* QA tests fix for hip and host

* naming convention changes as per new std

* Substitute imagePartial with partial

* Substitute imageMin/imageMax with min/max

* Replace hipMemset with hipMemsetAsync, and replace hipDeviceSynchronize with hipStreamSynchronize

* Use variable instead of batchCount*4

* Use post increment effectivly

* Resolve codacy warnings

* Additional cleanup

* remove unused variable

* Documentation - Bump rocm-docs-core[api_reference] from 0.28.0 to 0.29.0 in /docs/sphinx (#265)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.28.0 to 0.29.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove auto merge boost

* Spaces formatting

* Bump rocm-docs-core[api_reference] from 0.29.0 to 0.30.1 in /docs/sphinx (#268)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.29.0 to 0.30.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.29.0...v0.30.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add support for mi300 (#269)

* Documentation - Bump rocm-docs-core[api_reference] from 0.30.1 to 0.30.2 in /docs/sphinx (#273)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.1 to 0.30.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.1...v0.30.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Cleanup by removing oneliner functions as inline

* RPP Tensor Audio Support - To Decibels (#258)

* Initial commit - Non slient region detection

Includes unittest setup

* Initial commit - To Decibels

Includes unittest setup

* Replace vectors with arrays

* Cleanup

* Replace Rpp64s with Rpp32s

* Optimize and precompute cutOff

* Fix buffer used

* Fix buffer used

* Additional Cleanup

* Update testsuite for Audio

* code cleanup

* Add Readme file for Audio test suite

* changes based on review comments

* minor change

* Remove unittest folders and updated README.md

* Remove unit tests

* minor change

* code cleanup

* added common header file for audio helper functions

* removed unncessary audio wav files

fixed bug in ROI updation for audio test suite

resolved issue in summary generation for performance tests in python

* removed log file

* added doxygen support for audio

* added doxygen changes for to_decibels

* updated test suite support for to_decibels

* minor change

* removed the usage of getMax function and used std::max_element

* modularized code in test suite

* merge with latest changes

* minor change

* minor change

* resolved codacy warnings

* Codacy fix - Remove unused cpuTime

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* resolved issue with file_system dependency in test suite

* Doxygen changes

changed malloc to new in NSR kernel

* RPP RICAP Tensor for HOST and HIP (#213)

* Initial commit - Ricap HOST Tensor

Includes testsuite changes

* Add QA tests for RICAP

Used three_images_224x224_src1 folder to create golden outputs

* Add three_images_224x224_src1 into TEST_IMAGES

* Support HIP Backend for RICAP

* Fix HIP pkd3->pkd3 variant

* regenerated golden outputs for RICAP

minor changes in HOST shell script for handling RICAP in QA mode

* minor bug fix in RICAP HIP kernels

* Improve readability and Cleanup

* Additional cleanup

* Cleanup testsuite

Includes new golden outputs

* Additional testuite fixes

* Minor cleanup

* Fix codacy warnings

* Address other codacy warnings

* Update ricap.hpp with reference paper

* Add RICAP dataset path in readme

* Make changes to error codes returned

* Modify roi crop region for unit and perf tests

* RPP Tensor Water Augmentation on HOST and HIP (#181)

* added water HOST and HIP codes

* added water case in test suite

* added golden outputs for water

* added omp thread changes for water augmentation

* experimental changes

* fixed output issue with AVX2 instructions

* added AVX2 support for PKD3 load function

minor changes in PLN variant load functions

* nwc commit - added avx2 changes for u8 layout toggle variants but need to add store functions for completion

* Add Avx2 implementation for F32 and U8 toggle variants

* Add AVX2 support for u8 pkd3-pln3 and i8 pkd3-pln3 for water augmentation

* change F32 load and store logic

* optimized the store function for F32 PLN3-PKD3

* reverted back irrelevant changes

* minor change

* optimized load and store functions for water U8 and F32 variants in host

removed commented code

* removed golden outputs for water

* minor changes

* renamed few functions and removed unused functions

updated i8 pln1 load as per the optimized u8 pln1 load

* fixed bug in i8 load function

* changed cast to c++ style

resolved spacing issues and added comments for AVX codes for better understanding

made changes to handle cases where QA Tests are not supported

* added golden outputs for water

* updated golden outputs with latest changes

* modified the u8, i8 pkd3-pln3 function and added comments for the vectorized code

* fixed minor bug in I8 variants

* made to changes to resolve codacy warnings

* changed cast to c++ style in hip kernel

* changed generic nn F32 loads using gather and setr instructions

* added comments for latest changes

* minor change

* added definition for storing 32 and 64 bits from a 128bit register

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>

* Fix build error

* CMakeLists - Version Update

1.5.0 - TOT Version

* CHANGELOG Updates

Version 1.5.0 placeholder

* Boost deps fix for test suite

---------

Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: Kiriti Gowda <[email protected]>

* Documentation - Readme & changelog updates (#251)

* readme and changelog updates for 6.0

* minor update

* added ctests for audio test suite for CI

made changes to add more clarity on the QA Tests results

* Cmake mods for ctest

* HOST-only build error bugfix

* added qa mode paramter to python audio script

added golden output map for QA testing of Non silent region detection

* minor change

* Documentation - Bump rocm-docs-core[api_reference] from 0.26.0 to 0.27.0 in /docs/sphinx (#253)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.26.0 to 0.27.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.26.0...v0.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* RPP Resize Mirror Normalize Bugfix (#252)

* added fix for hipMemset

* remove pixel check for U8-F32 and U8-F16 for HOST codes

---------

Co-authored-by: sampath1117 <[email protected]>

* added example for MMS calculation in comments for better understanding

* Sphinx - updates (#257)

* Sphinx - updates

* Doxygen - Updates

* Docs - Remove index.md

* updated info used to for running audio test suite

* removed bitdepth variable from audio test suite

* added more information on computing NSR outputs in the example added

* Fix doxygen for decibels

Also removes extra QA reference files

* Fix build errors and qa tests in Audio Test suite

* Remove auto-merge repeated funcs

* Improve clarity on header docs

* made changes based on review comments

* stored golden outputs of to_decibels in binary file

removed golden output text files for non silent region

* removed unused parameter in verify_output function

* updated list of cases supported in python script

* added error handling for opening golden output file

* Codacy fix and tests warning fix

* Codacy fix

* Codacy fix trial

* codacy fix for checking boundaries of fstream

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: Kiriti Gowda <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>
Co-authored-by: Lisa <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Documentation - Bump rocm-docs-core[api_reference] from 0.30.2 to 0.30.3 in /docs/sphinx (#274)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.2 to 0.30.3.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.30.2...v0.30.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adding issue template (#270)

* Add files via upload

* added ROCm v6, MI300, default component

* Fix cast used in testsuite

Includes minor fixes

* Fix displaying f16 outputs

* Optimize HOST min/max reduce function further

* Fix spacing in HIP kernels

* Fix PLN1 outputs for u8 and i8 datatypes of HOST backend

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* Bump rocm-docs-core[api_reference] from 0.31.0 to 0.33.0 in /docs/sphinx (#294)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.31.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Store reference outputs via map for min and max kernels

* Update tensor_max.hpp license

* Update tensor_min.hpp license

* Fix output comparison check

* Merge branch 'ar/opt_tensor_min_tensor_max' of https://github.com/r-abishek/rpp into sn/tensor_min_max

* Modify exit condition used in outer most kernel

* Modify srcIdx for HIP Tensor min

* Using maximum as 255 for HIP Tensor min

* Modify srcIdx for HIP Tensor max kernel

Also fixes build error in testsuite

* Fix corrupted outputs displayed for Tensor sum

* Fix corruption issue seen with tensor sum kernel

* Fix minimum for I8 Tensor max kernel

* Modified HIP buffer initialization with a common function

* Fix redefinition

* Remove additional variables xAlignedLength

* Remove unwanted xAlignedLength and xDiff

* Remove redefinition of TensorSumReferenceOutputs

* Fix for CI issue

* Add parenthesis

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>
Co-authored-by: fiona-gladwin <[email protected]>
Co-authored-by: Kiriti Gowda <[email protected]>
Co-authored-by: Lisa <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lakshmi Kumar <[email protected]>
Co-authored-by: abhimeda <[email protected]>

* CI - Update precheckin.groovy

* added separate kernels for doing flip when horizontal flip is not set

* fixed build issue

* Add supported case

* reverted incorrect changes happened with merge

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <[email protected]>
Co-authored-by: Kiriti Gowda <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>
Co-authored-by: Lisa <[email protected]>
Co-authored-by: Sundarrajan98 <[email protected]>
Co-authored-by: Pavel Tcherniaev <[email protected]>
Co-authored-by: fiona-gladwin <[email protected]>
Co-authored-by: Lakshmi Kumar <[email protected]>
Co-authored-by: abhimeda <[email protected]>

* RPP Vignette Tensor on HOST and HIP (#311)

* Add Vignette Tensor HOST and HIP Implementation

* License - updates to 2024 and consistency changes (#298)

* Match all CMakeLists.txt license as per RPP's outermost LICENSE file

* Match all python files' license as per RPP's outermost LICENSE file

* Match all .hpp files' license as per RPP's outermost LICENSE file

* Match all .cpp files' license as per RPP's outermost LICENSE file

* Match all .h files' license as per RPP's outermost LICENSE file

* Remove all rights reserved as per LICENSE file

* Remove double space in "Copyright (c) 2019 - 2023 Advanced Micro Devices, Inc."

* Match all .cmake files' license as per RPP's outermost LICENSE file

* Match all .cpp.in files' license as per RPP's outermost LICENSE file

* Replace 283 occurrences in 282 files - 2023 to 2024

* Add "MIT License" title to 281 instances

* Add missing license

* Test - Update README.md for test_suite (#299)

* Bump rocm-docs-core[api_reference] from 0.33.0 to 0.33.1 in /docs/sphinx (#301)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.0 to 0.33.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.0...v0.33.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump rocm-docs-core[api_reference] from 0.33.1 to 0.33.2 in /docs/sphinx (#302)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.1 to 0.33.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.1...v0.33.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update doc codeowners (#303)

* Documentation - Bump rocm-docs-core[api_reference] from 0.33.2 to 0.34.0 in /docs/sphinx (#304)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.2 to 0.34.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.2...v0.34.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Test suite - upgrade 5 qa perf (#305)

* experimental changes for adding qa mode for performance tests

* made changes to add display more information w.r.t QA results summary for performance tests

* minor changes

* Add changes to dump qa results to excel file

* Add performance QA for three new tensor functions

* update prerequisites in readme

* added changes to handle unsupported cases

* removed treshold dictionary and added performance Noise treshold
add new dataset for performance QA

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* Changes to the performane summary dataframe

* minor changes

* Update CMakeLists.txt to add ${CMAKE_CURRENT_SOURCE_DIR} for CI

* Update CMakeLists.txt fix

* Update CMakeLists.txt fix

* remove tabulate dependency

* Update README.md to remove tabulate pip install

* Fix for CI machine failure

* Add note on performance

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: Abishek <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: r-abishek <[email protected]>

* RPP Color Temperature on HOST and HIP (#271)

* Initial commit - Color Temperature HOST Tensor

* Initial commit - Color Temperature HIP Tensor

* Add color temperature golden outputs

* address review comments

* Use reinterpret_cast instead of static_cast

* Combine templated functions to support all datatypes into one
(got minor perf difference of order 3%)

Also fixes indentation

* Fix i8 datatype

* Cleanup

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* Fix PLN3 variant outputs

Also modifies reference outputs

* Update color_temperature.hpp license

* Delete color_temperature_u8_Tensor_PKD3.csv

* Delete color_temperature_u8_Tensor_PLN3.csv

---------

Co-authored-by: snehaa8 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: Snehaa-Giridharan <[email protected]>

* RPP Voxel 3D Tensor Add/Subtract scalar on HOST and HIP (#272)

* added HOST support for voxel add kernel

* added HIP support for voxel add kernel

* added test suite support for add scalar

* added Doxygen support and modified hip kernel function names as per new standard

* added HOST support for voxel subtract kernel

* added HIP support for voxel subtract kernel

* added test suite support

* updated the golden outputs for subtract with correct values

* removed unnessary validation checks

* Remove double spaces

* Fix header

* Fix all retval docs

* Fix docs to add memory type

* Fix comment

* Add divider comment

* Use post-increment efficiently

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* converted add and subtract scalar golden outputs to bin files

* changed copyright from 2023 to 2024

* Update add_scalar.hpp license

* Update subtract_scalar.hpp license

---------

Co-authored-by: sampath1117 <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>

* RPP Magnitude on HOST and HIP (#278)

* Initial commit - Magnitude HOST Tensor

* Add QA reference outputs

* Update runTests.py

* Initial commit - Magnitude HIP Tensor

* Add dual input support in testsuite

* Optimize HOST kernel further

* Optimize i8 datatype further

* Modify comments

* RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduction (#293)

* change golden outputs from .csv files to .bin files

* Changed comparision funtions to use .bin files

* Address review comments

* minor change

* Address review comments

* minor change

---------

Co-authored-by: HazarathKumarM <[email protected]>

* Bump rocm-docs-core[api_reference] from 0.31.0 to 0.33.0 in /docs/sphinx (#294)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.31.0 to 0.33.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.31.0...v0.33.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update Copywright year

* Combine templated functions to support all datatypes

* Modify format of reference outputs

* Update rppi_arithmetic_operations.h license

* Update rppt_tensor_arithmetic_operations.h license

* Update host_tensor_arithmetic_operations.hpp

* Update magnitude.hpp license

* Update hip_tensor_arithmetic_operations.hpp license

* Delete magnitude_u8_Tensor_PKD3.csv

* Delete magnitude_u8_Tensor_PLN1.csv

* Delete magnitude_u8_Tensor_PLN3.csv

* Update rpp_test_suite_common.h license

* Update runTests.py license

* Update Tensor_hip.cpp license

* Update runTests.py license

* Update Tensor_host.cpp license

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Snehaa Giridharan <[email protected]>
Co-authored-by: HazarathKumarM <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Snehaa-Giridharan <[email protected]>

* Bump rocm-docs-core[api_reference] from 0.34.0 to 0.34.2 in /docs/sphinx (#309)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.0 to 0.34.2.
- [Release notes](https://github…

Loading branch information

16 people committed Aug 1, 2024

1 parent 7c4b0e3 commit f7a495a

include/rppdefs.h

-Original file line number
+Diff line change
@@ Expand Up / @@ -144,6 +144,8 @@ typedef enum @@
         RPP_ERROR_OUT_OF_BOUND_SHARED_MEMORY_SIZE    = -21,
         /*! \brief Scratch memory size needed is beyond the bounds (Needs to adhere to function specification.) \ingroup group_rppdefs */
         RPP_ERROR_OUT_OF_BOUND_SCRATCH_MEMORY_SIZE    = -22,
+        /*! \brief Number of src dims is invalid. (Needs to adhere to function specification.) \ingroup group_rppdefs */
+        RPP_ERROR_INVALID_SRC_DIMS          = -23
     } RppStatus;
     /*! \brief RPP rppStatus_t type enums
@@ Expand Down @@

include/rppt_tensor_audio_augmentations.h

            
                      Original file line number
                      Diff line number
                      Diff line change
                  
    @@ -87,11 +87,11 @@ RppStatus rppt_non_silent_region_detection_gpu(RppPtr_t srcPtr, RpptDescPtr srcD
  
    #endif // GPU_SUPPORT

    /*! \brief To Decibels augmentation on HOST backend

     * \details To Decibels augmentation for 1D audio buffer converts magnitude values to decibel values

     * \details To Decibels augmentation for 1D/2D audio buffer converts magnitude values to decibel values

     * \param [in] srcPtr source tensor in HOST memory

     * \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 3, offsetInBytes >= 0, dataType = F32)

     * \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 2 or 3 (for single-channel or multi-channel/2D audio tensor with 1 channel), offsetInBytes >= 0, dataType = F32)

     * \param [out] dstPtr destination tensor in HOST memory

     * \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 3, offsetInBytes >= 0, dataType = F32)

     * \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 2 or 3 (for single-channel or multi-channel/2D audio tensor with 1 channel), offsetInBytes >= 0, dataType = F32)

     * \param [in] srcDims source tensor sizes for each element in batch (2D tensor in HOST memory, of size batchSize * 2)

     * \param [in] cutOffDB  minimum or cut-off ratio in dB

     * \param [in] multiplier factor by which the logarithm is multiplied

    @@ -103,6 +103,25 @@ RppStatus rppt_non_silent_region_detection_gpu(RppPtr_t srcPtr, RpptDescPtr srcD
  
     */

    RppStatus rppt_to_decibels_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptImagePatchPtr srcDims, Rpp32f cutOffDB, Rpp32f multiplier, Rpp32f referenceMagnitude, rppHandle_t rppHandle);

    #ifdef GPU_SUPPORT

    /*! \brief To Decibels augmentation on HIP backend

     * \details To Decibels augmentation for 1D/2D audio buffer converts magnitude values to decibel values

     * \param [in] srcPtr source tensor in HIP memory

     * \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 2 or 3 (for single-channel or multi-channel/2D audio tensor with 1 channel), offsetInBytes >= 0, dataType = F32)

     * \param [out] dstPtr destination tensor in HIP memory

     * \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 2 or 3 (for single-channel or multi-channel/2D audio tensor with 1 channel), offsetInBytes >= 0, dataType = F32)

     * \param [in] srcDims source tensor sizes for each element in batch (2D tensor in Pinned/HIP memory, of size batchSize * 2)

     * \param [in] cutOffDB  minimum or cut-off ratio in dB

     * \param [in] multiplier factor by which the logarithm is multiplied

     * \param [in] referenceMagnitude Reference magnitude if not provided maximum value of input used as reference

     * \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>

     * \return A <tt> \ref RppStatus</tt> enumeration.

     * \retval RPP_SUCCESS Successful completion.

     * \retval RPP_ERROR* Unsuccessful completion.

     */

    RppStatus rppt_to_decibels_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptImagePatchPtr srcDims, Rpp32f cutOffDB, Rpp32f multiplier, Rpp32f referenceMagnitude, rppHandle_t rppHandle);

    #endif // GPU_SUPPORT

    /*! \brief Pre Emphasis Filter augmentation on HOST backend

     * \details Pre Emphasis Filter augmentation for audio data

     * \param [in] srcPtr source tensor in HOST memory

src/modules/hip/hip_tensor_audio_augmentations.hpp

-Original file line number
+Diff line change
@@ Expand Up / @@ -26,5 +26,6 @@ SOFTWARE. @@
     #define HIP_TENSOR_AUDIO_AUGMENTATIONS_HPP
     #include "kernel/non_silent_region_detection.hpp"
+    #include "kernel/to_decibels.hpp"
     #endif // HIP_TENSOR_AUDIO_AUGMENTATIONS_HPP

src/modules/hip/kernel/to_decibels.hpp

-Original file line number
+Diff line change
@@ -0,0 +1,312 @@
+    #include <hip/hip_runtime.h>
+    #include "rpp_hip_common.hpp"
+    // -------------------- Set 0 - to_decibels device helpers --------------------
+    __device__ __forceinline__ void to_decibels_hip_compute(d_float8 *src_f8, d_float8 *dst_f8, double minRatio, float multiplier, float inverseMagnitude)
+    {
+        dst_f8->f1[0] = multiplier * log2(max(minRatio, (static_cast<double>(src_f8->f1[0]) * inverseMagnitude)));
+        dst_f8->f1[1] = multiplier * log2(max(minRatio, (static_cast<double>(src_f8->f1[1]) * inverseMagnitude)));
+        dst_f8->f1[2] = multiplier * log2(max(minRatio, (static_cast<double>(src_f8->f1[2]) * inverseMagnitude)));
+        dst_f8->f1[3] = multiplier * log2(max(minRatio, (static_cast<double>(src_f8->f1[3]) * inverseMagnitude)));
+        dst_f8->f1[4] = multiplier * log2(max(minRatio, (static_cast<double>(src_f8->f1[4]) * inverseMagnitude)));
+        dst_f8->f1[5] = multiplier * log2(max(minRatio, (static_cast<double>(src_f8->f1[5]) * inverseMagnitude)));
+        dst_f8->f1[6] = multiplier * log2(max(minRatio, (static_cast<double>(src_f8->f1[6]) * inverseMagnitude)));
+        dst_f8->f1[7] = multiplier * log2(max(minRatio, (static_cast<double>(src_f8->f1[7]) * inverseMagnitude)));
+    }
+    // -------------------- Set 1 -  kernels for finding inverse magnitude value --------------------
+    __global__ void inverse_magnitude_hip_tensor(float *srcPtr,
+                                                 int maxLength,
+                                                 bool computeMax,
+                                                 float *inverseMagnitudeTensor)
+    {
+        int id_x = hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;
+        int id_z = hipBlockIdx_z * hipBlockDim_z + hipThreadIdx_z;
+        // Do final reduction on block wise max
+        if (computeMax)
+        {
+            uint srcIdx = id_z * maxLength;
+            __shared__ float max_smem[256];                                     // 256 values of src in a 256 x 1 thread block
+            max_smem[hipThreadIdx_x] = srcPtr[srcIdx];                          // initialization of LDS to start value using all 256 threads
+            if (id_x >= maxLength)
+                return;
+            srcIdx += id_x;
+            float maxVal = srcPtr[srcIdx];
+            while (id_x < maxLength)
+            {
+                maxVal = fmaxf(maxVal, srcPtr[srcIdx]);
+                id_x += hipBlockDim_x;
+                srcIdx += hipBlockDim_x;
+            }
+            max_smem[hipThreadIdx_x] = maxVal;
+            __syncthreads();                                                    // syncthreads after max compute
+            // Reduction of 256 floats on 256 threads per block in x dimension
+            for (int threadMax = 128; threadMax >= 1; threadMax /= 2)
+            {
+                if (hipThreadIdx_x < threadMax)
+                    max_smem[hipThreadIdx_x] = max(max_smem[hipThreadIdx_x], max_smem[hipThreadIdx_x + threadMax]);
+                __syncthreads();
+            }
+            // Final store to dst
+            if (hipThreadIdx_x == 0)
+                inverseMagnitudeTensor[id_z] = 1.f / max_smem[0];
+        }
+        else
+        {
+            inverseMagnitudeTensor[id_z] = 1.0f;
+        }
+    }
+    __global__ void max_reduction_1d_hip_tensor(float *srcPtr,
+                                                uint2 srcStridesNH,
+                                                RpptImagePatchPtr srcDims,
+                                                float *maxArr)
+    {
+        int id_y = hipBlockIdx_y * hipBlockDim_y + hipThreadIdx_y;
+        int id_z = hipBlockIdx_z * hipBlockDim_z + hipThreadIdx_z;
+        int id_x = (hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x) * 8;
+        uint srcLength = srcDims[id_z].height;
+        uint srcIdx = id_z * srcStridesNH.x;
+        __shared__ float max_smem[256];                                     // 256 values of src in a 256 x 1 thread block
+        max_smem[hipThreadIdx_x] = srcPtr[srcIdx];                          // initialization of LDS to start value using all 256 threads
+        if (id_x >= srcLength)
+            return;
+        srcIdx += id_x;
+        d_float8 src_f8;
+        rpp_hip_load8_and_unpack_to_float8(srcPtr + srcIdx, &src_f8);       // load 8 pixels to local memory
+        rpp_hip_math_max8(&src_f8, &max_smem[hipThreadIdx_x]);
+        __syncthreads();                                                    // syncthreads after max compute
+        // Reduction of 256 floats on 256 threads per block in x dimension
+        for (int threadMax = 128; threadMax >= 1; threadMax /= 2)
+        {
+            if (hipThreadIdx_x < threadMax)
+                max_smem[hipThreadIdx_x] = fmaxf(max_smem[hipThreadIdx_x], max_smem[hipThreadIdx_x + threadMax]);
+            __syncthreads();
+        }
+        // Final store to dst
+        if (hipThreadIdx_x == 0)
+            maxArr[id_z * hipGridDim_x + hipBlockIdx_x] = max_smem[0];
+    }
+    __global__ void max_reduction_2d_hip_tensor(float *srcPtr,
+                                                uint2 srcStridesNH,
+                                                RpptImagePatchPtr srcDims,
+                                                float *maxArr)
+    {
+        int id_y = hipBlockIdx_y * hipBlockDim_y + hipThreadIdx_y;
+        int id_z = hipBlockIdx_z * hipBlockDim_z + hipThreadIdx_z;
+        int id_x = hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;
+        __shared__ float partialMax_smem[16][16];                                   // 16 rows of src, 16 reduced cols of src in a 16 x 16 thread block
+        uint srcIdx = (id_z * srcStridesNH.x);
+        float *partialMaxRowPtr_smem = &partialMax_smem[hipThreadIdx_y][0];         // float pointer to beginning of each row in LDS
+        partialMaxRowPtr_smem[hipThreadIdx_x] = srcPtr[srcIdx];                     // initialization of LDS to start value using all 16 x 16 threads
+        if ((id_y >= srcDims[id_z].height) || (id_x >= srcDims[id_z].width))
+            return;
+        srcIdx += ((id_y * srcStridesNH.y) + id_x);
+        partialMaxRowPtr_smem[hipThreadIdx_x] = srcPtr[srcIdx];
+        __syncthreads();                                                            // syncthreads
+        // Reduction of 16 floats on 16 threads per block in x dimension (for every y dimension)
+        for (int threadMax = 8; threadMax >= 1; threadMax /= 2)
+        {
+            if (hipThreadIdx_x < threadMax)
+                partialMaxRowPtr_smem[hipThreadIdx_x] = fmaxf(partialMaxRowPtr_smem[hipThreadIdx_x], partialMaxRowPtr_smem[hipThreadIdx_x + threadMax]);
+            __syncthreads();
+        }
+        if (hipThreadIdx_x == 0)
+        {
+            // Reduction of 16 floats on 16 threads per block in y dimension
+            for (int threadMax = 8, increment = 128; threadMax >= 1; threadMax /= 2, increment /= 2)
+            {
+                if (hipThreadIdx_y < threadMax)
+                    partialMaxRowPtr_smem[0] = fmaxf(partialMaxRowPtr_smem[0], partialMaxRowPtr_smem[increment]);
+                __syncthreads();
+            }
+            // Final store to dst
+            if (hipThreadIdx_y == 0)
+                maxArr[(hipBlockIdx_z * hipGridDim_y + hipBlockIdx_y) * hipGridDim_x + hipBlockIdx_x] = partialMaxRowPtr_smem[0];
+        }
+    }
+    // -------------------- Set 2 - to decibels kernels --------------------
+    __global__ void to_decibels_1d_hip_tensor(float *srcPtr,
+                                              uint srcStride,
+                                              float *dstPtr,
+                                              uint dstStride,
+                                              RpptImagePatchPtr srcDims,
+                                              double minRatio,
+                                              float multiplier,
+                                              float *inverseMagnitudeTensor)
+    {
+        int id_x = (hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x) * 8;
+        int id_z = hipBlockIdx_z * hipBlockDim_z + hipThreadIdx_z;
+        if (id_x >= srcDims[id_z].height)
+            return;
+        uint srcIdx = (id_z * srcStride) + id_x;
+        float inverseMagnitude = inverseMagnitudeTensor[id_z];
+        d_float8 src_f8, dst_f8;
+        rpp_hip_load8_and_unpack_to_float8(srcPtr + srcIdx, &src_f8);
+        to_decibels_hip_compute(&src_f8, &dst_f8, minRatio, multiplier, inverseMagnitude);
+        uint dstIdx = (id_z * dstStride) + id_x;
+        rpp_hip_pack_float8_and_store8(dstPtr + dstIdx, &dst_f8);
+    }
+    __global__ void to_decibels_2d_hip_tensor(float *srcPtr,
+                                              uint2 srcStridesNH,
+                                              float *dstPtr,
+                                              uint2 dstStridesNH,
+                                              RpptImagePatchPtr srcDims,
+                                              double minRatio,
+                                              float multiplier,
+                                              float *inverseMagnitudeTensor)
+    {
+        int id_x = hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x;
+        int id_y = hipBlockIdx_y * hipBlockDim_y + hipThreadIdx_y;
+        int id_z = hipBlockIdx_z * hipBlockDim_z + hipThreadIdx_z;
+        if (id_x >= srcDims[id_z].width || id_y >= srcDims[id_z].height)
+            return;
+        uint srcIdx = (id_z * srcStridesNH.x) + (id_y * srcStridesNH.y) + id_x;
+        uint dstIdx = (id_z * dstStridesNH.x) + (id_y * dstStridesNH.y) + id_x;
+        float inverseMagnitude = inverseMagnitudeTensor[id_z];
+        dstPtr[dstIdx] = multiplier * log2(max(minRatio, (static_cast<double>(srcPtr[srcIdx]) * inverseMagnitude)));
+    }
+    // -------------------- Set 3 - to decibels kernels executor --------------------
+    RppStatus hip_exec_to_decibels_tensor(Rpp32f *srcPtr,
+                                          RpptDescPtr srcDescPtr,
+                                          Rpp32f *dstPtr,
+                                          RpptDescPtr dstDescPtr,
+                                          RpptImagePatchPtr srcDims,
+                                          Rpp32f cutOffDB,
+                                          Rpp32f multiplier,
+                                          Rpp32f referenceMagnitude,
+                                          rpp::Handle& handle)
+    {
+        Rpp32u numDims = srcDescPtr->numDims - 1;   // exclude batchSize from input dims
+        // Calculate the intermediate values needed for DB conversion
+        Rpp32f minRatio = std::pow(10, cutOffDB / multiplier);
+        if(!minRatio)
+            minRatio = std::nextafter(0.0f, 1.0f);
+        const Rpp32f log10Factor = 0.3010299956639812;      //1 / std::log(10);
+        multiplier *= log10Factor;
+        // calculate max in input if referenceMagnitude = 0
+        Rpp32f *partialMaxArr = handle.GetInitHandle()->mem.mgpu.scratchBufferHip.floatmem;
+        Rpp32s numBlocksPerSample = 0;
+        Rpp32s globalThreads_z = dstDescPtr->n;
+        // find the invReferenceMagnitude value
+        bool computeMax = (!referenceMagnitude);
+        if(computeMax)
+        {
+            if (numDims == 1)
+            {
+                numBlocksPerSample = ceil(static_cast<Rpp32f>((srcDescPtr->strides.nStride + 7) >> 3) / LOCAL_THREADS_X_1DIM);
+                hipLaunchKernelGGL(max_reduction_1d_hip_tensor,
+                                   dim3(numBlocksPerSample, 1, globalThreads_z),
+                                   dim3(LOCAL_THREADS_X_1DIM, LOCAL_THREADS_Y_1DIM, LOCAL_THREADS_Z_1DIM),
+,
+                                   handle.GetStream(),
+                                   srcPtr,
+                                   make_uint2(srcDescPtr->strides.nStride, 1),
+                                   srcDims,
+                                   partialMaxArr);
+            }
+            else if (numDims == 2)
+            {
+                Rpp32s gridDim_x = ceil(static_cast<Rpp32f>((srcDescPtr->strides.hStride)/LOCAL_THREADS_X));
+                Rpp32s gridDim_y = ceil(static_cast<Rpp32f>(srcDescPtr->h)/LOCAL_THREADS_Y);
+                Rpp32s gridDim_z = ceil(static_cast<Rpp32f>(globalThreads_z)/LOCAL_THREADS_Z);
+                numBlocksPerSample = gridDim_x * gridDim_y * gridDim_z;
+                hipLaunchKernelGGL(max_reduction_2d_hip_tensor,
+                                   dim3(gridDim_x, gridDim_y, gridDim_z),
+                                   dim3(LOCAL_THREADS_X, LOCAL_THREADS_Y, LOCAL_THREADS_Z),
+,
+                                   handle.GetStream(),
+                                   srcPtr,
+                                   make_uint2(srcDescPtr->strides.nStride, srcDescPtr->strides.hStride),
+                                   srcDims,
+                                   partialMaxArr);
+            }
+            hipStreamSynchronize(handle.GetStream());
+        }
+        Rpp32u blockSize = (computeMax) ? 256: 1;
+        Rpp32f *inverseMagnitudeTensor = partialMaxArr + globalThreads_z * numBlocksPerSample;
+        hipLaunchKernelGGL(inverse_magnitude_hip_tensor,
+                           dim3(1, 1,  globalThreads_z),
+                           dim3(blockSize, 1, 1),
+,
+                           handle.GetStream(),
+                           partialMaxArr,
+                           numBlocksPerSample,
+                           computeMax,
+                           inverseMagnitudeTensor);
+        hipStreamSynchronize(handle.GetStream());
+        // launch kernel for todecibels
+        if (numDims == 1)
+        {
+            Rpp32s globalThreads_x = (srcDescPtr->strides.nStride + 7) >> 3;
+            Rpp32s globalThreads_y = 1;
+            hipLaunchKernelGGL(to_decibels_1d_hip_tensor,
+                               dim3(ceil((Rpp32f)globalThreads_x/LOCAL_THREADS_X_1DIM), ceil((Rpp32f)globalThreads_y/LOCAL_THREADS_Y_1DIM), ceil((Rpp32f)globalThreads_z/LOCAL_THREADS_Z_1DIM)),
+                               dim3(LOCAL_THREADS_X_1DIM, LOCAL_THREADS_Y_1DIM, LOCAL_THREADS_Z_1DIM),
+,
+                               handle.GetStream(),
+                               srcPtr,
+                               srcDescPtr->strides.nStride,
+                               dstPtr,
+                               dstDescPtr->strides.nStride,
+                               srcDims,
+                               static_cast<double>(minRatio),
+                               multiplier,
+                               inverseMagnitudeTensor);
+        }
+        else if (numDims == 2)
+        {
+            Rpp32s globalThreads_x = srcDescPtr->strides.hStride;
+            Rpp32s globalThreads_y = srcDescPtr->h;
+            hipLaunchKernelGGL(to_decibels_2d_hip_tensor,
+                               dim3(ceil((Rpp32f)globalThreads_x/LOCAL_THREADS_X), ceil((Rpp32f)globalThreads_y/LOCAL_THREADS_Y), ceil((Rpp32f)globalThreads_z/LOCAL_THREADS_Z)),
+                               dim3(LOCAL_THREADS_X, LOCAL_THREADS_Y, LOCAL_THREADS_Z),
+,
+                               handle.GetStream(),
+                               srcPtr,
+                               make_uint2(srcDescPtr->strides.nStride, srcDescPtr->strides.hStride),
+                               dstPtr,
+                               make_uint2(dstDescPtr->strides.nStride, dstDescPtr->strides.hStride),
+                               srcDims,
+                               static_cast<double>(minRatio),
+                               multiplier,
+                               inverseMagnitudeTensor);
+        }
+        return RPP_SUCCESS;
+    }

0 comments on commit `f7a495a`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `f7a495a`

Commit

There are no files selected for viewing

0 comments on commit f7a495a

0 comments on commit `f7a495a`