RPP Lens Correction on HOST and HIP #382

Merged
merged 126 commits into from
Jul 22, 2024
Changes from 118 commits
Commits
eddf955
Add Remap Tensor HOST and HIP implementation
HazarathKumarM Jan 17, 2024
c893816
Add testsuite support
snehaa8 Jan 17, 2024
2527978
Fix non layout toggle PKD3 HOST variants
snehaa8 Jan 18, 2024
ca0fde5
Update check condition while comparing QA outputs to support remap
snehaa8 Jan 18, 2024
408f2dd
Fixing naming of HIP tensor as per latest format
snehaa8 Jan 19, 2024
a1f4213
License - updates to 2024 and consistency changes (#298)
r-abishek Jan 31, 2024
7096c1d
Test - Update README.md for test_suite (#299)
r-abishek Jan 31, 2024
07a5f66
Bump rocm-docs-core[api_reference] from 0.33.0 to 0.33.1 in /docs/sph…
dependabot[bot] Feb 6, 2024
e93c591
Fix mismatch between HIP and HOST
snehaa8 Feb 7, 2024
a5e5679
Bump rocm-docs-core[api_reference] from 0.33.1 to 0.33.2 in /docs/sph…
dependabot[bot] Feb 7, 2024
30a6fa0
Change typecast into reinterpret_cast
snehaa8 Feb 8, 2024
3c4b6a4
Modify PLN3 reference output
snehaa8 Feb 8, 2024
0c46d84
Merge branch 'master' of https://github.com/ROCm/rpp into sn/remap
snehaa8 Feb 8, 2024
40073fa
Merge branch 'develop' of https://github.com/ROCm/rpp into sn/remap
snehaa8 Feb 8, 2024
43ed4d5
Update reference outputs
snehaa8 Feb 8, 2024
e8aa6b2
Update doc codeowners (#303)
samjwu Feb 8, 2024
a921332
Documentation - Bump rocm-docs-core[api_reference] from 0.33.2 to 0.3…
dependabot[bot] Feb 9, 2024
54d16d1
Fix PLN3 output corruption of remap
snehaa8 Feb 9, 2024
0f83b1d
Cleanup comments
snehaa8 Feb 9, 2024
30bed4e
Test suite - upgrade 5 qa perf (#305)
kiritigowda Feb 9, 2024
5c423ab
RPP Color Temperature on HOST and HIP (#271)
r-abishek Feb 9, 2024
df6e2c9
RPP Voxel 3D Tensor Add/Subtract scalar on HOST and HIP (#272)
r-abishek Feb 9, 2024
61f56e7
Merge branch 'develop' into sn/remap
r-abishek Feb 13, 2024
a4ed137
RPP Magnitude on HOST and HIP (#278)
r-abishek Feb 14, 2024
1976cbf
Bump rocm-docs-core[api_reference] from 0.34.0 to 0.34.2 in /docs/sph…
dependabot[bot] Feb 16, 2024
ec8f2f0
RPP Tensor Audio Support - Down Mixing (#296)
r-abishek Feb 16, 2024
29a5c82
RPP Voxel 3D Tensor Multiply scalar on HOST and HIP (#306)
r-abishek Feb 16, 2024
98a3c82
Test Suite Bugfix (#307)
r-abishek Feb 16, 2024
c455555
Revert commit "Fix mismatch between HIP and HOST"
snehaa8 Feb 20, 2024
c750beb
Merge branch 'sn/remap' of https://github.com/snehaa8/rpp into sn/remap
snehaa8 Feb 20, 2024
3a5579b
Modify roi input format
snehaa8 Feb 22, 2024
a5937b0
Merge branch 'develop' into sn/remap
snehaa8 Feb 22, 2024
cfebb41
Modify HOST to use numThreads for openMP parallelization
snehaa8 Feb 22, 2024
b69456e
Add tableDescPtr into doxygen docs
snehaa8 Feb 22, 2024
a6a0fd3
Initial commit - Lens correction HOST and HIP support
snehaa8 Feb 23, 2024
608225b
Bump rocm-docs-core[api_reference] from 0.34.2 to 0.35.0 in /docs/sph…
dependabot[bot] Feb 23, 2024
a7ef385
RPP Reduction - Tensor min and Tensor max on HOST and HIP (#260)
r-abishek Feb 24, 2024
473cde4
CI - Update precheckin.groovy
kiritigowda Feb 24, 2024
6c0e0fc
Modify HIP as per latest format
snehaa8 Feb 27, 2024
ceab8c4
Fix corruption in HIP outputs
snehaa8 Feb 28, 2024
4bacfed
Merge branch 'develop' into sn/lens_correction
snehaa8 Feb 28, 2024
eb82c08
Cleanup
snehaa8 Feb 28, 2024
6d34304
Merge branch 'sn/lens_correction' of https://github.com/snehaa8/rpp i…
snehaa8 Feb 28, 2024
368c0d5
Cleanup
snehaa8 Feb 29, 2024
91e56fa
Merge branch 'develop' into sn/remap
snehaa8 Feb 29, 2024
778ae0a
Merge branch 'sn/remap' of https://github.com/snehaa8/rpp into sn/len…
snehaa8 Feb 29, 2024
83ac929
Cleanup
snehaa8 Feb 29, 2024
c33af22
Bump rocm-docs-core[api_reference] from 0.35.0 to 0.35.1 in /docs/sph…
dependabot[bot] Mar 6, 2024
14f6334
Bump rocm-docs-core[api_reference] from 0.35.1 to 0.36.0 in /docs/sph…
dependabot[bot] Mar 12, 2024
95c3272
Merge branch 'master' into develop
kiritigowda Mar 12, 2024
641f653
Docs - Bump rocm-docs-core[api_reference] from 0.36.0 to 0.37.0 in /d…
dependabot[bot] Mar 20, 2024
5568573
Link cleanup (#326)
LisaDelaney Mar 20, 2024
a6749ba
Update notes
LisaDelaney Mar 20, 2024
a255906
Docs - Bump rocm-docs-core[api_reference] from 0.37.0 to 0.37.1 in /d…
dependabot[bot] Mar 22, 2024
d3df761
RPP Voxel Flip on HIP and HOST (#285)
r-abishek Mar 23, 2024
ebecb42
RPP Vignette Tensor on HOST and HIP (#311)
r-abishek Mar 23, 2024
f9519be
added missing outputs for image augmentations
sampath1117 Mar 25, 2024
d309411
added gif for voxel input and outputs
sampath1117 Mar 25, 2024
918a297
Merge branch 'master' into sn/remap
snehaa8 Mar 26, 2024
46467c8
modified the output images for water, resize_crop_mirror and resize_m…
sampath1117 Mar 26, 2024
e992ba0
Merge branch 'master' into sn/lens_correction
snehaa8 Mar 26, 2024
2f6ba34
Merge pull request #253 from sampath1117/sr/doxygen_outputs
r-abishek Mar 27, 2024
0f95723
Merge branch 'ar/doxygen_update_4' of https://github.com/r-abishek/rp…
snehaa8 Mar 28, 2024
58c6b1b
Add doc outputs for remap
snehaa8 Mar 28, 2024
1147bfe
Update CMakeLists.txt
kiritigowda Apr 12, 2024
352fb22
Merge branch 'develop' into sn/remap
snehaa8 Apr 16, 2024
8927da7
Merge branch 'develop' into sn/lens_correction
snehaa8 Apr 16, 2024
bd6a6c3
Address review comments
snehaa8 Apr 18, 2024
2b80df8
Revert changes in common file
snehaa8 Apr 18, 2024
5e3fc7a
Bump rocm-docs-core[api_reference] from 0.38.1 to 1.0.0 in /docs/sphi…
dependabot[bot] Apr 18, 2024
1bcb8dc
Merge branch 'sn/remap' into sn/lens_correction
snehaa8 Apr 22, 2024
4438fe2
Modify buffer used in HIP as per latest changes
snehaa8 Apr 22, 2024
8f7b2d5
Modify location of input images
snehaa8 Apr 23, 2024
75b8a3a
Add doxygen outputs
snehaa8 Apr 23, 2024
6c4e179
Fix doxygen comments
snehaa8 Apr 23, 2024
4536723
Fix datatype of remap tables in doxygen comments
snehaa8 Apr 23, 2024
50435f1
Merge pull request #233 from snehaa8/sn/remap
r-abishek Apr 23, 2024
b74a4dc
Revert removal of nearbyintf
snehaa8 Apr 24, 2024
b96a7e3
Cleanup
snehaa8 Apr 24, 2024
860749c
Merge pull request #266 from snehaa8/sn/remap
r-abishek Apr 25, 2024
77e14ef
Minor common-fixes for HIP (#345)
r-abishek May 7, 2024
e365141
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/opt_remap
r-abishek May 7, 2024
49e12b1
merge fix
r-abishek May 7, 2024
34f3f6d
Readme Updates: --usecase=rocm (#349)
kiritigowda May 8, 2024
ab52683
RPP Tensor Audio Support - Spectrogram (#312)
r-abishek May 8, 2024
ee0d6fe
Update CHANGELOG.md (#352)
r-abishek May 8, 2024
2decd32
RPP Tensor Audio Support - Slice (#325)
r-abishek May 8, 2024
30ce1d6
RPP Tensor Audio Support - MelFilterBank (#332)
r-abishek May 8, 2024
64ae74f
RPP Tensor Normalize ND on HOST and HIP (#335)
r-abishek May 9, 2024
1a3015c
SWDEV-459739 - Remove the package obsolete setting (#353)
raramakr May 9, 2024
4cb8d4b
Audio support merge commit fixes (#354)
r-abishek May 9, 2024
e7b6f9b
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/opt_remap
r-abishek May 14, 2024
c8ce20a
Merge branch 'develop' into sn/lens_correction
sampath1117 May 15, 2024
6cf0c23
remove d_float14 union and device synchronize()
sampath1117 May 15, 2024
bbd8fcc
removed unncessary change added for d_float9 union
sampath1117 May 15, 2024
1a2a582
fix build issue with HIP test suite
sampath1117 May 16, 2024
96f7020
vectorized the remap tables compute kernel
sampath1117 May 16, 2024
d4b6a41
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/opt_remap
r-abishek May 28, 2024
e6d04f5
Merge branch 'develop' into ar/opt_remap
kiritigowda May 29, 2024
a8ede4d
removed unnecesesary inline helper functions
sampath1117 May 31, 2024
69aed52
Merge pull request #275 from sampath1117/remap_pr_changes
r-abishek May 31, 2024
d02a4c3
removed further unncessary inline helpers
sampath1117 Jun 4, 2024
14d5c8f
Merge pull request #276 from sampath1117/sr/remap_changes
r-abishek Jun 4, 2024
6eea0c6
Merge branch 'develop' into ar/opt_remap
r-abishek Jun 4, 2024
ac6b0c7
Update rppt_tensor_geometric_augmentations.h
r-abishek Jun 4, 2024
fcb1939
Update remap.hpp
r-abishek Jun 4, 2024
34ee032
Merge branch 'ar/opt_remap' of https://github.com/r-abishek/rpp into …
r-abishek Jun 5, 2024
6312343
modified comments for better readability
sampath1117 Jun 5, 2024
6d99d6d
Merge branch 'ar/opt_lens_correction' into sn/lens_correction
r-abishek Jun 6, 2024
ee72679
removed stream synchronize in hip kernel
sampath1117 Jun 11, 2024
2901667
consolidated the code using fmadd operations
sampath1117 Jun 17, 2024
28bd358
minor changes
sampath1117 Jun 17, 2024
a36a79a
Merge branch 'develop' into sn/lens_correction
sampath1117 Jun 17, 2024
a5952e4
Merge pull request #239 from snehaa8/sn/lens_correction
r-abishek Jun 17, 2024
620b9ca
Add init for lens_correction
r-abishek Jun 17, 2024
cdbfbde
Change params to pinned mem, and use init_lens_correction
r-abishek Jun 17, 2024
12b416d
Merge branch 'develop' into ar/opt_lens_correction
r-abishek Jun 24, 2024
08b14d6
Merge branch 'develop' into ar/opt_lens_correction
kiritigowda Jun 24, 2024
8bbc4b5
Merge branch 'develop' into sr/lens_correction_pr_changes
sampath1117 Jul 16, 2024
98ba67c
bux fix for help in test suite
sampath1117 Jul 16, 2024
fd61f24
initialized inverse matrix values to 0 before computing inverse
sampath1117 Jul 16, 2024
47c5170
Merge branch 'develop' into ar/opt_lens_correction
r-abishek Jul 16, 2024
9d75924
Merge branch 'ar/opt_lens_correction' of https://github.com/r-abishek…
r-abishek Jul 16, 2024
3c8eb28
Avoid showing transpose changes
r-abishek Jul 16, 2024
a61c03d
Merge pull request #291 from sampath1117/sr/lens_correction_pr_changes
r-abishek Jul 16, 2024
1695917
Merge branch 'develop' into ar/opt_lens_correction
kiritigowda Jul 22, 2024
Binary file added docs/data/doxygenInputs/lens_img640x480.png
52 changes: 52 additions & 0 deletions include/rppt_tensor_geometric_augmentations.h
@@ -634,6 +634,58 @@ RppStatus rppt_remap_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstP
RppStatus rppt_remap_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *rowRemapTable, Rpp32f *colRemapTable, RpptDescPtr tableDescPtr, RpptInterpolationType interpolationType, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Lens correction transformation on HOST backend for an NCHW/NHWC layout tensor
* \details Performs lens correction on RGB (3 channel) / greyscale (1 channel) images with an NHWC/NCHW tensor layout to compensate for barrel lens distortion.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html lens_img640x480.png Sample Input
* \image html geometric_augmentations_lens_correction_img_640x480.png Sample Output
* \param [in] srcPtr source tensor in HOST memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HOST memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] rowRemapTable Rpp32f row numbers in HOST memory for every pixel in the input batch of images (1D tensor of size width * height * batchSize)
* \param [in] colRemapTable Rpp32f column numbers in HOST memory for every pixel in the input batch of images (1D tensor of size width * height * batchSize)
* \param [in] tableDescPtr table tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = F32, layout = NHWC, c = 1)
* \param [in] cameraMatrixTensor camera intrinsic parameters required to compute the lens corrected image (1D tensor of size 9 * batchSize; one row-major 3x3 matrix per image, with fx at index 0, u0 at index 2, fy at index 4 and v0 at index 5)
* \param [in] distortionCoeffsTensor distortion coefficients required to compute the lens corrected image (1D tensor of size 8 * batchSize; per image, indices 0, 1, 4, 5, 6, 7 are radial terms and indices 2, 3 are tangential terms)
* \param [in] roiTensorPtrSrc ROI data in HOST memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
* \ingroup group_tensor_geometric
*/
RppStatus rppt_lens_correction_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *rowRemapTable, Rpp32f *colRemapTable, RpptDescPtr tableDescPtr, Rpp32f *cameraMatrixTensor, Rpp32f *distortionCoeffsTensor, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);

#ifdef GPU_SUPPORT
/*! \brief Lens correction transformation on HIP backend for an NCHW/NHWC layout tensor
* \details Performs lens correction on RGB (3 channel) / greyscale (1 channel) images with an NHWC/NCHW tensor layout to compensate for barrel lens distortion.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html lens_img640x480.png Sample Input
* \image html geometric_augmentations_lens_correction_img_640x480.png Sample Output
* \param [in] srcPtr source tensor in HIP memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HIP memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] rowRemapTable Rpp32f row numbers in HIP memory for every pixel in the input batch of images (1D tensor of size width * height * batchSize)
* \param [in] colRemapTable Rpp32f column numbers in HIP memory for every pixel in the input batch of images (1D tensor of size width * height * batchSize)
* \param [in] tableDescPtr table tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = F32, layout = NHWC, c = 1)
* \param [in] cameraMatrixTensor camera intrinsic parameters required to compute the lens corrected image (1D tensor of size 9 * batchSize)
* \param [in] distortionCoeffsTensor distortion coefficients required to compute the lens corrected image (1D tensor of size 8 * batchSize)
* \param [in] roiTensorPtrSrc ROI data in HIP memory, for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
* \ingroup group_tensor_geometric
*/
RppStatus rppt_lens_correction_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *rowRemapTable, Rpp32f *colRemapTable, RpptDescPtr tableDescPtr, Rpp32f *cameraMatrixTensor, Rpp32f *distortionCoeffsTensor, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! @}
*/
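The two parameter tensors documented above are flat arrays of 9 and 8 floats per image. The following is a minimal sketch of how a caller might fill them, assuming a zero-skew pinhole camera. `CameraParams` and `pack_lens_params` are hypothetical names used only for illustration and are not part of RPP. The index positions follow how the host kernel in this PR reads the buffers; the coefficient names k1..k6 and p1, p2 are an assumption borrowed from common camera-calibration notation.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of RPP): intrinsics for one image.
struct CameraParams
{
    float fx, fy;               // focal lengths in pixels
    float u0, v0;               // principal point in pixels
    std::array<float, 8> dist;  // assumed order: k1, k2, p1, p2, k3, k4, k5, k6
};

// Packs per-image parameters into flat buffers:
// 9 floats per image (row-major 3x3 camera matrix) and 8 floats per image.
inline void pack_lens_params(const std::vector<CameraParams> &params,
                             std::vector<float> &cameraMatrixTensor,
                             std::vector<float> &distortionCoeffsTensor)
{
    cameraMatrixTensor.assign(params.size() * 9, 0.0f);
    distortionCoeffsTensor.assign(params.size() * 8, 0.0f);
    for (std::size_t n = 0; n < params.size(); n++)
    {
        float *K = cameraMatrixTensor.data() + n * 9;
        K[0] = params[n].fx;    // read as fx by the host kernel
        K[2] = params[n].u0;    // read as u0
        K[4] = params[n].fy;    // read as fy
        K[5] = params[n].v0;    // read as v0
        K[8] = 1.0f;            // zero-skew pinhole assumption
        for (std::size_t k = 0; k < 8; k++)
            distortionCoeffsTensor[n * 8 + k] = params[n].dist[k];
    }
}
```

The resulting `cameraMatrixTensor.data()` and `distortionCoeffsTensor.data()` pointers would then be passed as the `Rpp32f *` arguments of the declarations above.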

18 changes: 17 additions & 1 deletion src/include/hip/rpp_hip_common.hpp
@@ -55,7 +55,7 @@ typedef union { float f1[5];
typedef union { float f1[6]; float2 f2[3]; } d_float6;
typedef union { float f1[7]; } d_float7;
typedef union { float f1[8]; float2 f2[4]; float4 f4[2]; } d_float8;
typedef union { float f1[9]; } d_float9;
typedef union { float f1[9]; float3 f3[3]; } d_float9;
typedef union { float f1[12]; float4 f4[3]; } d_float12;
typedef union { float f1[16]; float4 f4[4]; d_float8 f8[2]; } d_float16;
typedef union { float f1[24]; float2 f2[12]; float3 f3[8]; float4 f4[6]; d_float8 f8[3]; } d_float24;
@@ -1776,6 +1776,22 @@ __device__ __forceinline__ void rpp_hip_math_multiply24_const(d_float24 *src_f24
dst_f24->f4[5] = src_f24->f4[5] * multiplier_f4;
}

// d_float8 divide

__device__ __forceinline__ void rpp_hip_math_divide8(d_float8 *src1Ptr_f8, d_float8 *src2Ptr_f8, d_float8 *dstPtr_f8)
{
dstPtr_f8->f4[0] = src1Ptr_f8->f4[0] / src2Ptr_f8->f4[0];
dstPtr_f8->f4[1] = src1Ptr_f8->f4[1] / src2Ptr_f8->f4[1];
}

// d_float8 divide with constant (computes divisor_f4 / src, i.e. the constant is the numerator)

__device__ __forceinline__ void rpp_hip_math_divide8_const(d_float8 *src_f8, d_float8 *dst_f8, float4 divisor_f4)
{
dst_f8->f4[0] = divisor_f4 / src_f8->f4[0];
dst_f8->f4[1] = divisor_f4 / src_f8->f4[1];
}

// d_float8 bitwiseAND

__device__ __forceinline__ void rpp_hip_math_bitwiseAnd8(d_float8 *src1_f8, d_float8 *src2_f8, d_float8 *dst_f8)
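The two divide helpers added in this hunk differ in operand order: `rpp_hip_math_divide8` computes `src1 / src2`, while `rpp_hip_math_divide8_const` as written computes `divisor_f4 / src`. A host-side sketch of both semantics follows. `Float8`, `divide8` and `divide8_const` are hypothetical stand-ins, and the constant is modelled as a single scalar on the assumption that all four lanes of `divisor_f4` hold the same value.

```cpp
#include <array>
#include <cstddef>

// Host-side stand-in for d_float8 (assumption: a plain array of 8 floats;
// the real type is a union over float / float2 / float4 views).
using Float8 = std::array<float, 8>;

// Mirrors rpp_hip_math_divide8: element-wise src1 / src2.
inline Float8 divide8(const Float8 &src1, const Float8 &src2)
{
    Float8 dst{};
    for (std::size_t i = 0; i < dst.size(); i++)
        dst[i] = src1[i] / src2[i];
    return dst;
}

// Mirrors rpp_hip_math_divide8_const as written in the diff:
// the constant is divided by each element, not the other way round.
inline Float8 divide8_const(const Float8 &src, float constant)
{
    Float8 dst{};
    for (std::size_t i = 0; i < dst.size(); i++)
        dst[i] = constant / src[i];
    return dst;
}
```

With a constant of 1, `divide8_const` yields element-wise reciprocals, which is the form a per-pixel `1 / z` normalization needs.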
1 change: 1 addition & 0 deletions src/modules/cpu/host_tensor_geometric_augmentations.hpp
@@ -35,6 +35,7 @@ SOFTWARE.
#include "kernel/warp_affine.hpp"
#include "kernel/phase.hpp"
#include "kernel/slice.hpp"
#include "kernel/lens_correction.hpp"
#include "kernel/crop_and_patch.hpp"
#include "kernel/flip_voxel.hpp"

177 changes: 177 additions & 0 deletions src/modules/cpu/kernel/lens_correction.hpp
@@ -0,0 +1,177 @@
/*
MIT License

Copyright (c) 2019 - 2024 Advanced Micro Devices, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
*/

#include "rppdefs.h"
#include "rpp_cpu_simd.hpp"
#include "rpp_cpu_common.hpp"
#include <omp.h>

// Compute the inverse of a row-major 3x3 matrix; invMat is left unmodified when mat is singular (det == 0)
inline void get_inverse(float *mat, float *invMat)
{
float det = mat[0] * (mat[4] * mat[8] - mat[7] * mat[5]) - mat[1] * (mat[3] * mat[8] - mat[5] * mat[6]) + mat[2] * (mat[3] * mat[7] - mat[4] * mat[6]);
if(det != 0)
{
float invDet = 1 / det;
invMat[0] = (mat[4] * mat[8] - mat[7] * mat[5]) * invDet;
invMat[1] = (mat[2] * mat[7] - mat[1] * mat[8]) * invDet;
invMat[2] = (mat[1] * mat[5] - mat[2] * mat[4]) * invDet;
invMat[3] = (mat[5] * mat[6] - mat[3] * mat[8]) * invDet;
invMat[4] = (mat[0] * mat[8] - mat[2] * mat[6]) * invDet;
invMat[5] = (mat[3] * mat[2] - mat[0] * mat[5]) * invDet;
invMat[6] = (mat[3] * mat[7] - mat[6] * mat[4]) * invDet;
invMat[7] = (mat[6] * mat[1] - mat[0] * mat[7]) * invDet;
invMat[8] = (mat[0] * mat[4] - mat[3] * mat[1]) * invDet;
}
}
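One way to sanity-check a 3x3 inverse such as the one computed above is to multiply the matrix by its inverse and compare the product with the identity. The sketch below does this for an assumed zero-skew pinhole matrix, whose inverse has a simple closed form. `multiply3x3`, `is_identity3x3` and `pinhole_inverse` are hypothetical helpers, not RPP functions.

```cpp
#include <cmath>

// Row-major 3x3 product: out = a * b.
inline void multiply3x3(const float *a, const float *b, float *out)
{
    for (int r = 0; r < 3; r++)
    {
        for (int c = 0; c < 3; c++)
        {
            float sum = 0.0f;
            for (int k = 0; k < 3; k++)
                sum += a[r * 3 + k] * b[k * 3 + c];
            out[r * 3 + c] = sum;
        }
    }
}

// Returns true when the row-major 3x3 matrix m equals the identity within tol.
inline bool is_identity3x3(const float *m, float tol)
{
    for (int i = 0; i < 9; i++)
    {
        float expected = (i % 4 == 0) ? 1.0f : 0.0f;   // diagonal indices are 0, 4, 8
        if (std::fabs(m[i] - expected) > tol)
            return false;
    }
    return true;
}

// Closed-form inverse of an assumed zero-skew pinhole matrix
// [fx 0 u0; 0 fy v0; 0 0 1], written row-major into inv.
inline void pinhole_inverse(float fx, float fy, float u0, float v0, float *inv)
{
    inv[0] = 1.0f / fx; inv[1] = 0.0f;      inv[2] = -u0 / fx;
    inv[3] = 0.0f;      inv[4] = 1.0f / fy; inv[5] = -v0 / fy;
    inv[6] = 0.0f;      inv[7] = 0.0f;      inv[8] = 1.0f;
}
```

The same product check applies to the output of the general adjugate-based routine, provided the determinant is non-zero.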

inline void compute_lens_correction_remap_tables_host_tensor(RpptDescPtr srcDescPtr,
Rpp32f *rowRemapTable,
Rpp32f *colRemapTable,
RpptDescPtr tableDescPtr,
Rpp32f *cameraMatrixTensor,
Rpp32f *distortionCoeffsTensor,
RpptROIPtr roiTensorPtrSrc,
rpp::Handle& handle)
{
Rpp32u numThreads = handle.GetNumThreads();
omp_set_dynamic(0);
#pragma omp parallel for num_threads(numThreads)
for(int batchCount = 0; batchCount < srcDescPtr->n; batchCount++)
{
Rpp32f *rowRemapTableTemp, *colRemapTableTemp;
rowRemapTableTemp = rowRemapTable + batchCount * tableDescPtr->strides.nStride;
colRemapTableTemp = colRemapTable + batchCount * tableDescPtr->strides.nStride;

// cameraMatrix is a row-major 3x3 matrix (9 floats per image) and distortionCoeffs holds 8 floats per image, so step by 9 and 8 respectively to reach this batch entry
Rpp32f *cameraMatrix = cameraMatrixTensor + batchCount * 9;
Rpp32f *distortionCoeffs = distortionCoeffsTensor + batchCount * 8;
Rpp32s height = roiTensorPtrSrc[batchCount].xywhROI.roiHeight;
Rpp32s width = roiTensorPtrSrc[batchCount].xywhROI.roiWidth;
Rpp32u alignedLength = width & ~7;
Rpp32s vectorIncrement = 8;

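Rpp32f invCameraMatrix[9] = {0};    // zero-initialized so a singular camera matrix (det == 0) does not leave indeterminate values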
get_inverse(cameraMatrix, invCameraMatrix);
Rpp32f *invMat = &invCameraMatrix[0];

// Get radial and tangential distortion coefficients
Rpp32f rCoeff[6] = { distortionCoeffs[0], distortionCoeffs[1], distortionCoeffs[4], distortionCoeffs[5], distortionCoeffs[6], distortionCoeffs[7] };
Rpp32f tCoeff[2] = { distortionCoeffs[2], distortionCoeffs[3] };

__m256 pRCoeff[6], pTCoeff[2];
pRCoeff[0] = _mm256_set1_ps(rCoeff[0]);
pRCoeff[1] = _mm256_set1_ps(rCoeff[1]);
pRCoeff[2] = _mm256_set1_ps(rCoeff[2]);
pRCoeff[3] = _mm256_set1_ps(rCoeff[3]);
pRCoeff[4] = _mm256_set1_ps(rCoeff[4]);
pRCoeff[5] = _mm256_set1_ps(rCoeff[5]);
pTCoeff[0] = _mm256_set1_ps(tCoeff[0]);
pTCoeff[1] = _mm256_set1_ps(tCoeff[1]);

Rpp32f u0 = cameraMatrix[2], v0 = cameraMatrix[5];
Rpp32f fx = cameraMatrix[0], fy = cameraMatrix[4];
__m256 pFx, pFy, pU0, pV0;
pFx = _mm256_set1_ps(fx);
pFy = _mm256_set1_ps(fy);
pU0 = _mm256_set1_ps(u0);
pV0 = _mm256_set1_ps(v0);

__m256 pInvMat0, pInvMat3, pInvMat6;
pInvMat0 = _mm256_set1_ps(invMat[0]);
pInvMat3 = _mm256_set1_ps(invMat[3]);
pInvMat6 = _mm256_set1_ps(invMat[6]);

__m256 pXCameraInit, pYCameraInit, pZCameraInit;
__m256 pXCameraIncrement, pYCameraIncrement, pZCameraIncrement;
pXCameraInit = _mm256_mul_ps(avx_pDstLocInit, pInvMat0);
pYCameraInit = _mm256_mul_ps(avx_pDstLocInit, pInvMat3);
pZCameraInit = _mm256_mul_ps(avx_pDstLocInit, pInvMat6);
pXCameraIncrement = _mm256_mul_ps(pInvMat0, avx_p8);
pYCameraIncrement = _mm256_mul_ps(pInvMat3, avx_p8);
pZCameraIncrement = _mm256_mul_ps(pInvMat6, avx_p8);
for(int i = 0; i < height; i++)
{
Rpp32f *rowRemapTableRow = rowRemapTableTemp + i * tableDescPtr->strides.hStride;
Rpp32f *colRemapTableRow = colRemapTableTemp + i * tableDescPtr->strides.hStride;
Rpp32f xCamera = i * invMat[1] + invMat[2];
Rpp32f yCamera = i * invMat[4] + invMat[5];
Rpp32f zCamera = i * invMat[7] + invMat[8];
__m256 pXCamera = _mm256_add_ps(_mm256_set1_ps(xCamera), pXCameraInit);
__m256 pYCamera = _mm256_add_ps(_mm256_set1_ps(yCamera), pYCameraInit);
__m256 pZCamera = _mm256_add_ps(_mm256_set1_ps(zCamera), pZCameraInit);
int vectorLoopCount = 0;
for(; vectorLoopCount < alignedLength; vectorLoopCount += vectorIncrement)
{
// float z = 1./zCamera, x = xCamera*z, y = yCamera*z;
__m256 pZ = _mm256_div_ps(avx_p1, pZCamera);
__m256 pX = _mm256_mul_ps(pXCamera, pZ);
__m256 pY = _mm256_mul_ps(pYCamera, pZ);

// float xSquare = x*x, ySquare = y*y, r2 = xSquare + ySquare;
__m256 pXSquare = _mm256_mul_ps(pX, pX);
__m256 pYSquare = _mm256_mul_ps(pY, pY);
__m256 pR2 = _mm256_add_ps(pXSquare, pYSquare);

// float xyMul2 = 2*x*y;
__m256 p2xy = _mm256_mul_ps(avx_p2, _mm256_mul_ps(pX, pY));

// float kr = std::fmaf(std::fmaf(std::fmaf(rCoeff[2], r2, rCoeff[1]), r2, rCoeff[0]), r2, 1) / std::fmaf(std::fmaf(std::fmaf(rCoeff[5], r2, rCoeff[4]), r2, rCoeff[3]), r2, 1);
__m256 pNum = _mm256_fmadd_ps(_mm256_fmadd_ps(_mm256_fmadd_ps(pRCoeff[2], pR2, pRCoeff[1]), pR2, pRCoeff[0]), pR2, avx_p1);
__m256 pDen = _mm256_fmadd_ps(_mm256_fmadd_ps(_mm256_fmadd_ps(pRCoeff[5], pR2, pRCoeff[4]), pR2, pRCoeff[3]), pR2, avx_p1);
__m256 pKR = _mm256_div_ps(pNum, pDen);

// float colLoc = std::fmaf(fx, (std::fmaf(tCoeff[1], (std::fmaf(2, xSquare, r2)), std::fmaf(x, kr, (tCoeff[0] * xyMul2)))), u0);
__m256 pColLoc = _mm256_fmadd_ps(pFx, _mm256_fmadd_ps(pTCoeff[1], _mm256_fmadd_ps(avx_p2, pXSquare, pR2), _mm256_fmadd_ps(pX, pKR, _mm256_mul_ps(pTCoeff[0], p2xy))), pU0);

// float rowLoc = std::fmaf(fy, (std::fmaf(tCoeff[0], (std::fmaf(2, ySquare, r2)), std::fmaf(y, kr, (tCoeff[1] * xyMul2)))), v0);
__m256 pRowLoc = _mm256_fmadd_ps(pFy, _mm256_fmadd_ps(pTCoeff[0], _mm256_fmadd_ps(avx_p2, pYSquare, pR2), _mm256_fmadd_ps(pY, pKR, _mm256_mul_ps(pTCoeff[1], p2xy))), pV0);

_mm256_storeu_ps(rowRemapTableRow, pRowLoc);
_mm256_storeu_ps(colRemapTableRow, pColLoc);
rowRemapTableRow += vectorIncrement;
colRemapTableRow += vectorIncrement;

// xCamera += invMat[0], yCamera += invMat[3], zCamera += invMat[6]
pXCamera = _mm256_add_ps(pXCamera, pXCameraIncrement);
pYCamera = _mm256_add_ps(pYCamera, pYCameraIncrement);
pZCamera = _mm256_add_ps(pZCamera, pZCameraIncrement);
}
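// Advance the scalar camera coordinates past the columns already handled by the vector loop
xCamera += vectorLoopCount * invMat[0];
yCamera += vectorLoopCount * invMat[3];
zCamera += vectorLoopCount * invMat[6];
for(; vectorLoopCount < width; vectorLoopCount++)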
{
Rpp32f z = 1./zCamera, x = xCamera * z, y = yCamera * z;
Rpp32f xSquare = x * x, ySquare = y * y, r2 = xSquare + ySquare;
Rpp32f xyMul2 = 2 * x * y;
Rpp32f kr = std::fmaf(std::fmaf(std::fmaf(rCoeff[2], r2, rCoeff[1]), r2, rCoeff[0]), r2, 1) / std::fmaf(std::fmaf(std::fmaf(rCoeff[5], r2, rCoeff[4]), r2, rCoeff[3]), r2, 1);
Rpp32f colLoc = std::fmaf(fx, (std::fmaf(tCoeff[1], (std::fmaf(2, xSquare, r2)), std::fmaf(x, kr, (tCoeff[0] * xyMul2)))), u0);
Rpp32f rowLoc = std::fmaf(fy, (std::fmaf(tCoeff[0], (std::fmaf(2, ySquare, r2)), std::fmaf(y, kr, (tCoeff[1] * xyMul2)))), v0);
*rowRemapTableRow++ = rowLoc;
*colRemapTableRow++ = colLoc;
xCamera += invMat[0];
yCamera += invMat[3];
zCamera += invMat[6];
}
}
}
}
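The per-pixel arithmetic in the kernel above can be written as a plain scalar function, which is convenient as a reference when checking the vectorized path. This is a sketch under stated assumptions: `SrcLoc` and `lens_correction_src_loc` are hypothetical names, and the function evaluates each pixel directly from its column and row instead of using the kernel's running increments. The term layout (a rational radial factor plus two tangential terms) appears to match the widely used rational camera distortion model, but that correspondence is not stated in the PR.

```cpp
#include <cmath>

// Hypothetical result type (not part of RPP): source location for one destination pixel.
struct SrcLoc { float row; float col; };

// Scalar reference for one destination pixel (col, row).
// invMat: row-major inverse camera matrix (9 floats).
// dist:   8 coefficients laid out as the kernel reads them:
//         indices 0, 1, 4 form the radial numerator, 5, 6, 7 the radial
//         denominator, and 2, 3 are the tangential terms.
inline SrcLoc lens_correction_src_loc(int col, int row, const float *invMat, const float *dist,
                                      float fx, float fy, float u0, float v0)
{
    // Back-project the pixel into normalized camera coordinates
    float xCamera = col * invMat[0] + row * invMat[1] + invMat[2];
    float yCamera = col * invMat[3] + row * invMat[4] + invMat[5];
    float zCamera = col * invMat[6] + row * invMat[7] + invMat[8];
    float z = 1.0f / zCamera, x = xCamera * z, y = yCamera * z;

    float xSquare = x * x, ySquare = y * y, r2 = xSquare + ySquare;
    float xyMul2 = 2.0f * x * y;

    // Rational radial factor, evaluated with Horner's scheme in r2
    float num = 1.0f + r2 * (dist[0] + r2 * (dist[1] + r2 * dist[4]));
    float den = 1.0f + r2 * (dist[5] + r2 * (dist[6] + r2 * dist[7]));
    float kr = num / den;

    // Add tangential terms and project back to pixel coordinates
    SrcLoc loc;
    loc.col = fx * (x * kr + dist[2] * xyMul2 + dist[3] * (r2 + 2.0f * xSquare)) + u0;
    loc.row = fy * (y * kr + dist[2] * (r2 + 2.0f * ySquare) + dist[3] * xyMul2) + v0;
    return loc;
}
```

With all eight coefficients set to zero the function reduces to an identity mapping, which gives a quick check that the inverse camera matrix and the intrinsics agree.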
1 change: 1 addition & 0 deletions src/modules/hip/hip_tensor_geometric_augmentations.hpp
@@ -35,6 +35,7 @@ SOFTWARE.
#include "kernel/resize_crop_mirror.hpp"
#include "kernel/phase.hpp"
#include "kernel/slice.hpp"
#include "kernel/lens_correction.hpp"
#include "kernel/crop_and_patch.hpp"
#include "kernel/flip_voxel.hpp"
