RPP Jitter on HOST and HIP #384

Merged
merged 80 commits into from
Jul 24, 2024
37a5999
Jitter Tensor Kernel
snehaa8 Nov 21, 2022
3c4e534
Jitter HIP Kernel
fiona-gladwin Nov 22, 2022
6ab428d
Jitter Tensor Kernel
fiona-gladwin Nov 23, 2022
939b251
Merge branch 'sn/jitter_host' of https://github.com/fiona-gladwin/rpp…
fiona-gladwin Nov 23, 2022
5ee7fad
Jitter PKD3 to PLN3 version
fiona-gladwin Nov 23, 2022
5c5c972
Fix Jitter variations of HIP and HOST u8
fiona-gladwin Nov 23, 2022
60a82a3
Fix Jitter variations of HIP and HOST u8
snehaa8 Nov 23, 2022
6661cdc
Jitter Tensor HOST variations
snehaa8 Nov 24, 2022
1c01059
Fix Jitter HOST f16 variations
snehaa8 Nov 24, 2022
694a9f5
Cleanup and Optimize Jitter HOST AVX
snehaa8 Nov 25, 2022
2dd5065
Fix boundary pixels in Jitter HOST Kernel
snehaa8 Nov 25, 2022
4c58ae5
Fix bound compute
snehaa8 Nov 28, 2022
7aab8ff
Merge branch 'sn/jitter_host' of https://github.com/fiona-gladwin/rpp…
fiona-gladwin Nov 29, 2022
1af3135
Fix merge conflicts
fiona-gladwin Nov 29, 2022
04c1c41
Merge branch 'master' of https://github.com/GPUOpen-ProfessionalCompu…
snehaa8 Dec 19, 2022
2002bbf
Cleanup Jitter Implementation
snehaa8 Dec 19, 2022
664dc63
Additional cleanup
snehaa8 Dec 19, 2022
8a4ec95
Cleanup
snehaa8 Jan 5, 2023
a1f4213
License - updates to 2024 and consistency changes (#298)
r-abishek Jan 31, 2024
7096c1d
Test - Update README.md for test_suite (#299)
r-abishek Jan 31, 2024
07a5f66
Bump rocm-docs-core[api_reference] from 0.33.0 to 0.33.1 in /docs/sph…
dependabot[bot] Feb 6, 2024
a5e5679
Bump rocm-docs-core[api_reference] from 0.33.1 to 0.33.2 in /docs/sph…
dependabot[bot] Feb 7, 2024
e8aa6b2
Update doc codeowners (#303)
samjwu Feb 8, 2024
a921332
Documentation - Bump rocm-docs-core[api_reference] from 0.33.2 to 0.3…
dependabot[bot] Feb 9, 2024
30bed4e
Test suite - upgrade 5 qa perf (#305)
kiritigowda Feb 9, 2024
5c423ab
RPP Color Temperature on HOST and HIP (#271)
r-abishek Feb 9, 2024
df6e2c9
RPP Voxel 3D Tensor Add/Subtract scalar on HOST and HIP (#272)
r-abishek Feb 9, 2024
a4ed137
RPP Magnitude on HOST and HIP (#278)
r-abishek Feb 14, 2024
1976cbf
Bump rocm-docs-core[api_reference] from 0.34.0 to 0.34.2 in /docs/sph…
dependabot[bot] Feb 16, 2024
ec8f2f0
RPP Tensor Audio Support - Down Mixing (#296)
r-abishek Feb 16, 2024
29a5c82
RPP Voxel 3D Tensor Multiply scalar on HOST and HIP (#306)
r-abishek Feb 16, 2024
98a3c82
Test Suite Bugfix (#307)
r-abishek Feb 16, 2024
9a2305d
merge latest changes
snehaa8 Feb 19, 2024
46cd668
Add HOST test suite support
snehaa8 Feb 20, 2024
12ce987
fix output corruption
snehaa8 Mar 4, 2024
c33af22
Bump rocm-docs-core[api_reference] from 0.35.0 to 0.35.1 in /docs/sph…
dependabot[bot] Mar 6, 2024
14f6334
Bump rocm-docs-core[api_reference] from 0.35.1 to 0.36.0 in /docs/sph…
dependabot[bot] Mar 12, 2024
95c3272
Merge branch 'master' into develop
kiritigowda Mar 12, 2024
641f653
Docs - Bump rocm-docs-core[api_reference] from 0.36.0 to 0.37.0 in /d…
dependabot[bot] Mar 20, 2024
5568573
Link cleanup (#326)
LisaDelaney Mar 20, 2024
a6749ba
Update notes
LisaDelaney Mar 20, 2024
a255906
Docs - Bump rocm-docs-core[api_reference] from 0.37.0 to 0.37.1 in /d…
dependabot[bot] Mar 22, 2024
d3df761
RPP Voxel Flip on HIP and HOST (#285)
r-abishek Mar 23, 2024
ebecb42
RPP Vignette Tensor on HOST and HIP (#311)
r-abishek Mar 23, 2024
fc1410b
Bump rocm-docs-core[api_reference] from 0.37.1 to 0.38.0 in /docs/sph…
dependabot[bot] Mar 27, 2024
3ebd7c3
RPP Tensor Audio Support - Resample (#310)
r-abishek Apr 3, 2024
76f31df
Docs - Missing input and output images for Doxygen (#331)
r-abishek Apr 3, 2024
b83f910
Scratch buffers rename for HOST and HIP (#324)
r-abishek Apr 3, 2024
ebeb131
Update CMakeLists.txt
kiritigowda Apr 3, 2024
6930465
RPP BitwiseAND and BitwiseOR Tensor on HOST and HIP (#318)
r-abishek Apr 9, 2024
1d4e4aa
Merge latest changes
snehaa8 Apr 11, 2024
f9a7063
Merge remote-tracking branch 'TOT/develop' into HEAD
sampath1117 May 6, 2024
77e14ef
Minor common-fixes for HIP (#345)
r-abishek May 7, 2024
34f3f6d
Readme Updates: --usecase=rocm (#349)
kiritigowda May 8, 2024
ab52683
RPP Tensor Audio Support - Spectrogram (#312)
r-abishek May 8, 2024
ee0d6fe
Update CHANGELOG.md (#352)
r-abishek May 8, 2024
2decd32
RPP Tensor Audio Support - Slice (#325)
r-abishek May 8, 2024
30ce1d6
RPP Tensor Audio Support - MelFilterBank (#332)
r-abishek May 8, 2024
64ae74f
RPP Tensor Normalize ND on HOST and HIP (#335)
r-abishek May 9, 2024
1a3015c
SWDEV-459739 - Remove the package obsolete setting (#353)
raramakr May 9, 2024
4cb8d4b
Audio support merge commit fixes (#354)
r-abishek May 9, 2024
1527846
Merge remote-tracking branch 'abishek/develop' into sn/jitter_host
sampath1117 May 16, 2024
5ac794c
Revert unnecessary merge changes
sampath1117 May 16, 2024
d786a61
minor change
sampath1117 May 16, 2024
6997b60
Address review comments
sampath1117 May 16, 2024
38b8042
Address review comments
sampath1117 May 17, 2024
6fcd3f7
convert SSE functions to AVX
HazarathKumarM May 29, 2024
96c9dbf
Merge remote-tracking branch 'abishek/develop' into sn/jitter_host
HazarathKumarM May 30, 2024
20c868d
Merge remote-tracking branch 'abishek/develop' into sn/jitter_host
HazarathKumarM Jun 18, 2024
8faca28
Resolve Review comments
HazarathKumarM Jun 18, 2024
182e0ec
Resolve review comments
HazarathKumarM Jun 19, 2024
89a14d1
Add jitter in non QA case list and random Outputs list
HazarathKumarM Jun 19, 2024
ae374fd
Merge pull request #234 from sampath1117/sn/jitter_host
r-abishek Jun 19, 2024
e44e52b
Add Warp Affine test case in test suite
HazarathKumarM Jun 21, 2024
7139d7d
Merge pull request #282 from sampath1117/sn/jitter_host
r-abishek Jun 24, 2024
eb0e854
Merge branch 'develop' into ar/opt_jitter
r-abishek Jun 24, 2024
1f07509
Merge branch 'develop' into ar/opt_jitter
kiritigowda Jun 24, 2024
6e8cec3
Merge branch 'develop' into ar/opt_jitter
r-abishek Jul 16, 2024
ef33997
Merge branch 'develop' into ar/opt_jitter
kiritigowda Jul 22, 2024
6406641
Merge branch 'develop' into ar/opt_jitter
r-abishek Jul 23, 2024
44 changes: 44 additions & 0 deletions include/rppt_tensor_effects_augmentations.h
@@ -465,6 +465,50 @@ RppStatus rppt_vignette_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t d
RppStatus rppt_vignette_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *vignetteIntensityTensor, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/******************** jitter ********************/

/*! \brief Jitter augmentation on HOST backend for a NCHW/NHWC layout tensor
* \details The jitter augmentation adds a jitter effect for a batch of RGB(3 channel) / greyscale(1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img150x150.png Sample Input
* \image html effects_augmentations_jitter_img150x150.png Sample Output
* \param [in] srcPtr source tensor in HOST memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HOST memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] kernelSizeTensor kernel size values for jitter calculation (kernelSize = 3/5/7 for optimal use)
* \param [in] seed seed value for random number generation
* \param [in] roiTensorPtrSrc ROI data in HOST memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_jitter_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32u *kernelSizeTensor, Rpp32u seed, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);

#ifdef GPU_SUPPORT
/*! \brief Jitter augmentation on HIP backend for a NCHW/NHWC layout tensor
* \details The jitter augmentation adds a jitter effect for a batch of RGB(3 channel) / greyscale(1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img150x150.png Sample Input
* \image html effects_augmentations_jitter_img150x150.png Sample Output
* \param [in] srcPtr source tensor in HIP memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HIP memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] kernelSizeTensor kernel size values for jitter calculation (kernelSize = 3/5/7 for optimal use)
* \param [in] seed seed value for random number generation
* \param [in] roiTensorPtrSrc ROI data in HIP memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_jitter_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32u *kernelSizeTensor, Rpp32u seed, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Gaussian noise augmentation on HOST backend
* \details This function adds gaussian noise to a batch of 4D tensors.
* Support added for u8 -> u8, f32 -> f32 datatypes.
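
The header comments above describe jitter only in terms of its parameters. As a rough illustration of the effect, here is a minimal single-channel sketch in plain C++. It assumes that each output pixel copies a randomly chosen source pixel from a `kernelSize` x `kernelSize` window around it, with coordinates clamped at the image edge. The function name `jitterSketch`, the half-window `bound = (kernelSize - 1) / 2`, and the use of `std::mt19937` in place of RPP's xorwow generator are all assumptions made for illustration; none of this is the RPP implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, not part of RPP: jitter a single-channel, row-major image.
std::vector<std::uint8_t> jitterSketch(const std::vector<std::uint8_t> &src,
                                       int width, int height,
                                       int kernelSize, unsigned seed)
{
    std::vector<std::uint8_t> dst(src.size());
    std::mt19937 rng(seed);                                  // stand-in for RPP's xorwow generator
    std::uniform_int_distribution<int> pick(0, kernelSize - 1);
    const int bound = (kernelSize - 1) / 2;                  // assumed half-window size

    for (int row = 0; row < height; ++row)
    {
        for (int col = 0; col < width; ++col)
        {
            // Pick a neighbour inside the window, then clamp it to the image.
            int srcRow = row + pick(rng) - bound;
            int srcCol = col + pick(rng) - bound;
            srcRow = std::max(std::min(srcRow, height - 1), 0);
            srcCol = std::max(std::min(srcCol, width - 1), 0);
            dst[row * width + col] = src[srcRow * width + srcCol];
        }
    }
    return dst;
}
```

Under these assumptions, a fixed `seed` makes the output reproducible, and `kernelSize = 1` leaves the image unchanged because the window then contains only the pixel itself.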
19 changes: 19 additions & 0 deletions src/include/cpu/rpp_cpu_common.hpp
@@ -6111,6 +6111,25 @@ inline void compute_separable_horizontal_resample(Rpp32f *inputPtr, T *outputPtr
}
}

inline void compute_jitter_src_loc_avx(__m256i *pxXorwowStateX, __m256i *pxXorwowStateCounter, __m256 &pRow, __m256 &pCol, __m256 &pKernelSize, __m256 &pBound, __m256 &pHeightLimit, __m256 &pWidthLimit, __m256 &pStride, __m256 &pChannel, Rpp32s *srcLoc)
{
__m256 pRngX = rpp_host_rng_xorwow_8_f32_avx(pxXorwowStateX, pxXorwowStateCounter);
__m256 pRngY = rpp_host_rng_xorwow_8_f32_avx(pxXorwowStateX, pxXorwowStateCounter);
__m256 pX = _mm256_mul_ps(pRngX, pKernelSize);
__m256 pY = _mm256_mul_ps(pRngY, pKernelSize);
pX = _mm256_max_ps(_mm256_min_ps(_mm256_floor_ps(_mm256_add_ps(pRow, _mm256_sub_ps(pX, pBound))), pHeightLimit), avx_p0);
pY = _mm256_max_ps(_mm256_min_ps(_mm256_floor_ps(_mm256_add_ps(pCol, _mm256_sub_ps(pY, pBound))), pWidthLimit), avx_p0);
__m256i pxSrcLoc = _mm256_cvtps_epi32(_mm256_fmadd_ps(pX, pStride, _mm256_mul_ps(pY, pChannel)));
_mm256_storeu_si256((__m256i*) srcLoc, pxSrcLoc);
}

inline void compute_jitter_src_loc(RpptXorwowStateBoxMuller *xorwowState, Rpp32s row, Rpp32s col, Rpp32s kSize, Rpp32s heightLimit, Rpp32s widthLimit, Rpp32s stride, Rpp32s bound, Rpp32s channels, Rpp32s &loc)
{
Rpp32u heightIncrement = rpp_host_rng_xorwow_f32(xorwowState) * kSize;
Rpp32u widthIncrement = rpp_host_rng_xorwow_f32(xorwowState) * kSize;
loc = std::max(std::min(static_cast<int>(row + heightIncrement - bound), heightLimit), 0) * stride;
loc += std::max(std::min(static_cast<int>(col + widthIncrement - bound), (widthLimit - 1)), 0) * channels;
}
inline void compute_sum_16_host(__m256i *p, __m256i *pSum)
{
pSum[0] = _mm256_add_epi32(_mm256_add_epi32(p[0], p[1]), pSum[0]); //add 16 values to 8
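
The two `compute_jitter_src_loc` helpers in this hunk share one piece of arithmetic: scale a uniform random draw by the kernel size, shift it by `bound`, clamp the resulting row and column, and fold them into a flat element offset using the row stride and the channel count. The sketch below restates that arithmetic as a standalone function so that it can be checked by hand. The name `jitterSrcOffset` is hypothetical, the two random draws are passed in explicitly instead of coming from the xorwow state, and both limits are assumed to be the largest valid index (the scalar helper in the diff subtracts 1 from `widthLimit`, which this sketch does not reproduce).

```cpp
#include <algorithm>

// Hypothetical scalar helper mirroring the clamp-then-offset arithmetic in the diff.
// rngY and rngX stand in for two uniform draws in [0, 1).
int jitterSrcOffset(float rngY, float rngX, int row, int col, int kernelSize, int bound,
                    int heightLimit, int widthLimit, int stride, int channels)
{
    // Random displacement inside the window, relative to the current pixel.
    int srcRow = row + static_cast<int>(rngY * kernelSize) - bound;
    int srcCol = col + static_cast<int>(rngX * kernelSize) - bound;

    // Clamp to the assumed valid index range [0, limit].
    srcRow = std::max(std::min(srcRow, heightLimit), 0);
    srcCol = std::max(std::min(srcCol, widthLimit), 0);

    // Flat offset: rows advance by stride elements, columns by channels elements.
    return srcRow * stride + srcCol * channels;
}
```

For a packed 3-channel image that is 4 pixels wide (`stride = 12`, `channels = 3`) with `kernelSize = 3` and `bound = 1`, the pixel at row 2, column 1 with draws 0.5 and 0.9 maps to source row 2, column 2, which is offset 30. Draws that would step outside the image at a corner are clamped back onto the border pixel.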
76 changes: 76 additions & 0 deletions src/include/cpu/rpp_cpu_simd.hpp
@@ -3859,6 +3859,20 @@ inline void rpp_resize_nn_load_u8pkd3(Rpp8u *srcRowPtrsForInterp, Rpp32s *loc, _
p = _mm_shuffle_epi8(px[0], xmm_pkd_mask); // Shuffle to obtain 4 RGB [R01|G01|B01|R11|G11|B11|R21|G21|B21|R31|G31|B31|00|00|00|00]
}

template<typename T>
inline void rpp_resize_nn_extract_pkd3_avx(T *srcRowPtrsForInterp, Rpp32s *loc, __m256i &p)
{
p = _mm256_setr_epi8(*(srcRowPtrsForInterp + loc[0]), *(srcRowPtrsForInterp + loc[0] + 1), *(srcRowPtrsForInterp + loc[0] + 2),
*(srcRowPtrsForInterp + loc[1]), *(srcRowPtrsForInterp + loc[1] + 1), *(srcRowPtrsForInterp + loc[1] + 2),
*(srcRowPtrsForInterp + loc[2]), *(srcRowPtrsForInterp + loc[2] + 1), *(srcRowPtrsForInterp + loc[2] + 2),
*(srcRowPtrsForInterp + loc[3]), *(srcRowPtrsForInterp + loc[3] + 1), *(srcRowPtrsForInterp + loc[3] + 2),
*(srcRowPtrsForInterp + loc[4]), *(srcRowPtrsForInterp + loc[4] + 1), *(srcRowPtrsForInterp + loc[4] + 2),
*(srcRowPtrsForInterp + loc[5]), *(srcRowPtrsForInterp + loc[5] + 1), *(srcRowPtrsForInterp + loc[5] + 2),
*(srcRowPtrsForInterp + loc[6]), *(srcRowPtrsForInterp + loc[6] + 1), *(srcRowPtrsForInterp + loc[6] + 2),
*(srcRowPtrsForInterp + loc[7]), *(srcRowPtrsForInterp + loc[7] + 1), *(srcRowPtrsForInterp + loc[7] + 2),
0, 0, 0, 0, 0, 0, 0, 0);
}

inline void rpp_resize_nn_load_u8pln1(Rpp8u *srcRowPtrsForInterp, Rpp32s *loc, __m128i &p)
{
__m128i px[4];
@@ -3871,6 +3885,16 @@ inline void rpp_resize_nn_load_u8pln1(Rpp8u *srcRowPtrsForInterp, Rpp32s *loc, _
p = _mm_unpacklo_epi8(px[0], px[1]); // unpack to obtain [R01|R11|R21|R31|00|00|00|00|00|00|00|00|00|00|00|00]
}

template<typename T>
inline void rpp_resize_nn_extract_pln1_avx(T *srcRowPtrsForInterp, Rpp32s *loc, __m256i &p)
{
p = _mm256_setr_epi8(*(srcRowPtrsForInterp + loc[0]), *(srcRowPtrsForInterp + loc[1]),
*(srcRowPtrsForInterp + loc[2]), *(srcRowPtrsForInterp + loc[3]),
*(srcRowPtrsForInterp + loc[4]), *(srcRowPtrsForInterp + loc[5]),
*(srcRowPtrsForInterp + loc[6]), *(srcRowPtrsForInterp + loc[7]),
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
}

inline void rpp_resize_nn_load_f32pkd3_to_f32pln3(Rpp32f *srcRowPtrsForInterp, Rpp32s *loc, __m128 *p)
{
p[0] = _mm_loadu_ps(srcRowPtrsForInterp + loc[0]); // LOC0 load [R01|G01|B01|R02] - Need RGB 01
@@ -3880,6 +3904,42 @@ inline void rpp_resize_nn_load_f32pkd3_to_f32pln3(Rpp32f *srcRowPtrsForInterp, R
_MM_TRANSPOSE4_PS(p[0], p[1], p[2], pTemp); // Transpose to obtain RGB in each vector
}

inline void rpp_resize_nn_load_f32pkd3_to_f32pln3_avx(Rpp32f *srcRowPtrsForInterp, Rpp32s *loc, __m256 *p)
{
__m128 p128[8];
p128[0] = _mm_loadu_ps(srcRowPtrsForInterp + loc[0]);
p128[1] = _mm_loadu_ps(srcRowPtrsForInterp + loc[1]);
p128[2] = _mm_loadu_ps(srcRowPtrsForInterp + loc[2]);
p128[3] = _mm_loadu_ps(srcRowPtrsForInterp + loc[3]);
_MM_TRANSPOSE4_PS(p128[0], p128[1], p128[2], p128[3]);
p128[4] = _mm_loadu_ps(srcRowPtrsForInterp + loc[4]);
p128[5] = _mm_loadu_ps(srcRowPtrsForInterp + loc[5]);
p128[6] = _mm_loadu_ps(srcRowPtrsForInterp + loc[6]);
p128[7] = _mm_loadu_ps(srcRowPtrsForInterp + loc[7]);
_MM_TRANSPOSE4_PS(p128[4], p128[5], p128[6], p128[7]);
p[0] = _mm256_setr_m128(p128[0], p128[4]);
p[1] = _mm256_setr_m128(p128[1], p128[5]);
p[2] = _mm256_setr_m128(p128[2], p128[6]);
}

inline void rpp_resize_nn_load_f16pkd3_to_f32pln3_avx(Rpp16f *srcRowPtrsForInterp, Rpp32s *loc, __m256 *p)
{
p[0] = _mm256_setr_ps((Rpp32f)*(srcRowPtrsForInterp + loc[0]), (Rpp32f)*(srcRowPtrsForInterp + loc[1]),
(Rpp32f)*(srcRowPtrsForInterp + loc[2]), (Rpp32f)*(srcRowPtrsForInterp + loc[3]),
(Rpp32f)*(srcRowPtrsForInterp + loc[4]), (Rpp32f)*(srcRowPtrsForInterp + loc[5]),
(Rpp32f)*(srcRowPtrsForInterp + loc[6]), (Rpp32f)*(srcRowPtrsForInterp + loc[7]));

p[1] = _mm256_setr_ps((Rpp32f)*(srcRowPtrsForInterp + loc[0] + 1), (Rpp32f)*(srcRowPtrsForInterp + loc[1] + 1),
(Rpp32f)*(srcRowPtrsForInterp + loc[2] + 1), (Rpp32f)*(srcRowPtrsForInterp + loc[3] + 1),
(Rpp32f)*(srcRowPtrsForInterp + loc[4] + 1), (Rpp32f)*(srcRowPtrsForInterp + loc[5] + 1),
(Rpp32f)*(srcRowPtrsForInterp + loc[6] + 1), (Rpp32f)*(srcRowPtrsForInterp + loc[7] + 1));

p[2] = _mm256_setr_ps((Rpp32f)*(srcRowPtrsForInterp + loc[0] + 2), (Rpp32f)*(srcRowPtrsForInterp + loc[1] + 2),
(Rpp32f)*(srcRowPtrsForInterp + loc[2] + 2), (Rpp32f)*(srcRowPtrsForInterp + loc[3] + 2),
(Rpp32f)*(srcRowPtrsForInterp + loc[4] + 2), (Rpp32f)*(srcRowPtrsForInterp + loc[5] + 2),
(Rpp32f)*(srcRowPtrsForInterp + loc[6] + 2), (Rpp32f)*(srcRowPtrsForInterp + loc[7] + 2));
}

inline void rpp_resize_nn_load_f32pln1(Rpp32f *srcRowPtrsForInterp, Rpp32s *loc, __m128 &p)
{
__m128 pTemp[4];
@@ -3892,6 +3952,22 @@ inline void rpp_resize_nn_load_f32pln1(Rpp32f *srcRowPtrsForInterp, Rpp32s *loc,
p = _mm_unpacklo_ps(pTemp[0], pTemp[1]); // Unpack to obtain [R01|R11|R21|R31]
}

inline void rpp_resize_nn_load_f32pln1_avx(Rpp32f *srcRowPtrsForInterp, Rpp32s *loc, __m256 &p)
{
p = _mm256_setr_ps(*(srcRowPtrsForInterp + loc[0]), *(srcRowPtrsForInterp + loc[1]),
*(srcRowPtrsForInterp + loc[2]), *(srcRowPtrsForInterp + loc[3]),
*(srcRowPtrsForInterp + loc[4]), *(srcRowPtrsForInterp + loc[5]),
*(srcRowPtrsForInterp + loc[6]), *(srcRowPtrsForInterp + loc[7]));
}

inline void rpp_resize_nn_load_f16pln1_avx(Rpp16f *srcRowPtrsForInterp, Rpp32s *loc, __m256 &p)
{
p = _mm256_setr_ps((Rpp32f)*(srcRowPtrsForInterp + loc[0]), (Rpp32f)*(srcRowPtrsForInterp + loc[1]),
(Rpp32f)*(srcRowPtrsForInterp + loc[2]), (Rpp32f)*(srcRowPtrsForInterp + loc[3]),
(Rpp32f)*(srcRowPtrsForInterp + loc[4]), (Rpp32f)*(srcRowPtrsForInterp + loc[5]),
(Rpp32f)*(srcRowPtrsForInterp + loc[6]), (Rpp32f)*(srcRowPtrsForInterp + loc[7]));
}

inline void rpp_resize_nn_load_i8pkd3(Rpp8s *srcRowPtrsForInterp, Rpp32s *loc, __m128i &p)
{
__m128i px[4];
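
The new `*_pkd3_to_f32pln3_avx` loaders in this file gather eight pixels from arbitrary offsets in a packed RGB buffer and return them as three planar vectors, one per channel. A scalar sketch of the same data movement may make the layout easier to follow. The names `Planar8` and `gatherPkd3ToPln3` are hypothetical and plain arrays stand in for the `__m256` registers.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar equivalent of the 8-pixel PKD3 -> PLN3 gather.
struct Planar8
{
    std::array<float, 8> r, g, b;   // stand-ins for p[0], p[1], p[2]
};

Planar8 gatherPkd3ToPln3(const float *src, const int *loc)
{
    Planar8 out{};
    for (std::size_t i = 0; i < 8; ++i)
    {
        // loc[i] is the element offset of pixel i's first channel in the packed buffer.
        out.r[i] = src[loc[i]];
        out.g[i] = src[loc[i] + 1];
        out.b[i] = src[loc[i] + 2];
    }
    return out;
}
```

One difference is visible from the diff itself: the f32 AVX loader uses `_mm_loadu_ps`, which reads four floats from each offset and discards the fourth in the transpose. The element after each gathered pixel therefore has to be readable memory, whereas the scalar sketch touches only three elements per pixel.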
3 changes: 2 additions & 1 deletion src/include/hip/rpp_hip_common.hpp
@@ -1944,7 +1944,8 @@ __device__ __forceinline__ float rpp_hip_rng_xorwow_f32(T *xorwowState)
return outFloat - 1; // return 0 <= outFloat < 1
}

- __device__ __forceinline__ void rpp_hip_rng_8_xorwow_f32(RpptXorwowState *xorwowState, d_float8 *randomNumbersPtr_f8)
+ template<typename T>
+ __device__ __forceinline__ void rpp_hip_rng_8_xorwow_f32(T *xorwowState, d_float8 *randomNumbersPtr_f8)
{
randomNumbersPtr_f8->f1[0] = rpp_hip_rng_xorwow_f32(xorwowState);
randomNumbersPtr_f8->f1[1] = rpp_hip_rng_xorwow_f32(xorwowState);
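
The hunk above shows only the tail of `rpp_hip_rng_xorwow_f32`, which returns `outFloat - 1` to produce a value in [0, 1). For readers unfamiliar with that pattern, the following host-side sketch shows one common way to build such a generator: Marsaglia's xorwow update, followed by placing 23 random bits into the mantissa of a float in [1, 2) and subtracting 1. The struct `XorwowSketch`, the function names, and the exact bit manipulation are assumptions for illustration; the RPP state layout and internal steps may differ.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical state layout: five xorshift words plus a Weyl counter.
struct XorwowSketch
{
    std::uint32_t x[5];
    std::uint32_t counter;
};

inline std::uint32_t xorwowNextU32(XorwowSketch &s)
{
    std::uint32_t t = s.x[4];
    const std::uint32_t s0 = s.x[0];
    s.x[4] = s.x[3];                 // shift the state words down by one
    s.x[3] = s.x[2];
    s.x[2] = s.x[1];
    s.x[1] = s0;
    t ^= t >> 2;
    t ^= t << 1;
    t ^= s0 ^ (s0 << 4);
    s.x[0] = t;
    s.counter += 362437u;            // Weyl sequence increment
    return t + s.counter;
}

inline float xorwowNextF32(XorwowSketch &s)
{
    // Keep 23 random bits as the mantissa and set the exponent of 1.0f,
    // giving a float in [1, 2); subtracting 1 maps it to [0, 1).
    const std::uint32_t bits = (xorwowNextU32(s) >> 9) | 0x3F800000u;
    float outFloat;
    std::memcpy(&outFloat, &bits, sizeof outFloat);
    return outFloat - 1.0f;
}
```

Since the sketch takes its state by reference and never inspects its type beyond the six words, templating the 8-wide wrapper on the state type, as this hunk does, lets the same routine serve more than one state struct.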
1 change: 1 addition & 0 deletions src/modules/cpu/host_tensor_effects_augmentations.hpp
@@ -31,6 +31,7 @@ SOFTWARE.
#include "kernel/noise_shot.hpp"
#include "kernel/noise_gaussian.hpp"
#include "kernel/non_linear_blend.hpp"
#include "kernel/jitter.hpp"
#include "kernel/glitch.hpp"
#include "kernel/water.hpp"
#include "kernel/ricap.hpp"