WIP - Resize Bilinear AVX512 Trial #128

snehaa8 · 2023-05-25T08:27:55Z

Includes AVX512 optimizations for u8 datatype for all layout variants.

Does not include the new test suite changes.

Includes optimization of:
- PLN load
- PKD store

r-abishek · 2023-06-01T01:44:25Z

src/include/cpu/rpp_cpu_simd.hpp

+{
+    __m512 px[5];
+    __m512i shuffle = _mm512_set_epi32(15,11,7,3,14,10,6,2,13,9,5,1,12,8,4,0);
+    __m512i index = _mm512_set_epi32(15,11,7,3,14,13,12,10,9,8,6,5,4,2,1,0);


Spaces after commas

r-abishek · 2023-06-01T01:49:12Z

src/include/cpu/rpp_cpu_simd.hpp

+    __m512i index = _mm512_set_epi32(15,11,7,3,14,13,12,10,9,8,6,5,4,2,1,0);
+    p[0] = _mm512_permutexvar_ps(shuffle, p[0]);
+    p[1] = _mm512_permutexvar_ps(shuffle, p[1]);
+    p[2] = _mm512_permutexvar_ps(shuffle, p[2]);


Let's improve readability here: add end-of-line comments, as in many of the other SSE/AVX helpers, and use more descriptive variable names than shuffle and index so that any reader can tell what they represent (arrangements in pln3 or pkd3).

snehaa8 · Jun 6, 2023

Yes, added these for a few of the newly added routines.
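As a rough illustration of the naming and end-of-line comment style requested above, here is a portable scalar model of `_mm512_permutexvar_ps` applied to the two index tables from the hunk. The names `kTransposeQuads`, `kPackFirstThree`, and `permuteLanes` are hypothetical and do not come from the PR, and the pln3/pkd3 interpretation in the comments is an assumption about intent rather than something the diff confirms. The tables are written lane 0 first, which is the reverse of the `_mm512_set_epi32` argument order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Lanes16 = std::array<float, 16>;
using Index16 = std::array<int, 16>;

// Hypothetical names. Lane 0 is listed first (reverse of _mm512_set_epi32 order).
// From _mm512_set_epi32(15,11,7,3,14,10,6,2,13,9,5,1,12,8,4,0):
static const Index16 kTransposeQuads = {0, 4, 8, 12,    // lanes 0-3:   first element of each group of four
                                        1, 5, 9, 13,    // lanes 4-7:   second element of each group
                                        2, 6, 10, 14,   // lanes 8-11:  third element of each group
                                        3, 7, 11, 15};  // lanes 12-15: fourth element of each group

// From _mm512_set_epi32(15,11,7,3,14,13,12,10,9,8,6,5,4,2,1,0):
static const Index16 kPackFirstThree = {0, 1, 2,        // lanes 0-2:   first three of group 0 (assumed R, G, B)
                                        4, 5, 6,        // lanes 3-5:   first three of group 1
                                        8, 9, 10,       // lanes 6-8:   first three of group 2
                                        12, 13, 14,     // lanes 9-11:  first three of group 3
                                        3, 7, 11, 15};  // lanes 12-15: leftover fourth elements moved to the end

// Scalar model of _mm512_permutexvar_ps(idx, src): dst[i] = src[idx[i]].
inline Lanes16 permuteLanes(const Index16 &idx, const Lanes16 &src)
{
    Lanes16 dst{};
    for (std::size_t i = 0; i < dst.size(); i++)
    {
        assert(idx[i] >= 0 && idx[i] < 16);    // the intrinsic uses only the low 4 bits of each index
        dst[i] = src[static_cast<std::size_t>(idx[i])];
    }
    return dst;
}
```

Writing each table lane 0 first, with one comment per group of lanes, makes the rearrangement readable without decoding the reversed argument list of `_mm512_set_epi32`.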

r-abishek · 2023-06-01T02:29:01Z

src/include/cpu/rpp_cpu_simd.hpp

+    __m512i indices1 = _mm512_set_epi32(loc[15],loc[14],loc[13],loc[12],loc[11],loc[10],loc[9],loc[8],
+                                       loc[7],loc[6],loc[5],loc[4],loc[3],loc[2],loc[1],loc[0]);
+    p[0] = _mm512_i32gather_ps(indices1, srcRowPtrsForInterp[0], sizeof(int32_t));
+    __m512i indices2 = _mm512_set_epi32(loc[15]+1,loc[14]+1,loc[13]+1,loc[12]+1,loc[11]+1,loc[10]+1,loc[9]+1,loc[8]+1,


Spaces before and after operators

snehaa8 · Jun 6, 2023

Fixed these.

r-abishek · 2023-06-01T02:29:40Z

src/include/cpu/rpp_cpu_simd.hpp

+                                       loc[7]+1,loc[6]+1,loc[5]+1,loc[4]+1,loc[3]+1,loc[2]+1,loc[1]+1,loc[0]+1);
+    p[1] = _mm512_i32gather_ps(indices2, srcRowPtrsForInterp[0], sizeof(int32_t));
+    p[2] = _mm512_i32gather_ps(indices1, srcRowPtrsForInterp[1], sizeof(int32_t));
+    p[3] = _mm512_i32gather_ps(indices2, srcRowPtrsForInterp[1], sizeof(int32_t));


Do we really need these gathers? They are pretty expensive.

snehaa8 · Jun 6, 2023

Gathers are expensive, but overall they gave a slight improvement compared with load, transpose, and insert.
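For readers unfamiliar with the pattern under discussion, the four gathers in the hunk fetch the four neighbours each output pixel needs for bilinear interpolation: `loc[i]` and `loc[i] + 1` from each of two source rows. The sketch below models that in scalar C++ so it runs on any machine. `gatherLanes`, `BilinearTaps`, `gatherBilinearTaps`, and `blendBilinear` are hypothetical names, and the blend step with weights `wx` and `wy` is the textbook formula, assumed here because the excerpt does not show how the PR combines the taps.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Lanes16 = std::array<float, 16>;
using Index16 = std::array<int, 16>;

// Scalar model of _mm512_i32gather_ps(indices, base, sizeof(float)): dst[i] = base[indices[i]].
inline Lanes16 gatherLanes(const Index16 &indices, const float *base)
{
    Lanes16 dst{};
    for (std::size_t i = 0; i < dst.size(); i++)
    {
        assert(indices[i] >= 0);    // caller must keep every index inside the row
        dst[i] = base[indices[i]];
    }
    return dst;
}

// The four neighbours of each of 16 output pixels (p[0]..p[3] in the hunk).
struct BilinearTaps
{
    Lanes16 topLeft, topRight, bottomLeft, bottomRight;
};

inline BilinearTaps gatherBilinearTaps(const Index16 &loc, const float *row0, const float *row1)
{
    Index16 locNext{};
    for (std::size_t i = 0; i < loc.size(); i++)
        locNext[i] = loc[i] + 1;    // right-hand neighbour of each source location

    return {gatherLanes(loc, row0), gatherLanes(locNext, row0),
            gatherLanes(loc, row1), gatherLanes(locNext, row1)};
}

// Textbook bilinear blend (an assumption, not taken from the PR); wx and wy lie in [0, 1].
inline float blendBilinear(float tl, float tr, float bl, float br, float wx, float wy)
{
    float top = tl + wx * (tr - tl);
    float bottom = bl + wx * (br - bl);
    return top + wy * (bottom - top);
}
```

Each gather replaces 16 scalar loads with one instruction, which is the trade-off weighed in the comment above against a contiguous load followed by transpose and insert steps.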

snehaa8 added 10 commits May 16, 2023 11:35

Initial commit - PLN1 u8 resize bilinear AVX512

b159825

Fix increment issue for AVX512

b003781

Fix PLN variants for U8

c34e8ab

Implement PKD3->PKD3 for U8

ece459f

Optimize AVX512 PLN non toggle variant

421935e

Implement toggle variants for U8

3367c33

Optimized store routine for U8 PKD3

b71d387

Fix output issue with F32 PLN3->PKD3

0509a0e

Optimize F32 loads and stores

daef537

Includes optimization of - PLN load - PKD store

Optimize U8 PKD3 store

f0cb55e

r-abishek reviewed Jun 1, 2023

snehaa8 added 5 commits June 2, 2023 17:33

Add comments and fix spacing

04f28d4

Implement F16 datatype variants

08e1cd0

Implement I8 datatype variants

e633db1

Optimization with set instructions for load

74b2881

Revert using setr intructions for pkd3 load

aaf5e37

r-abishek changed the title ~~Resize Bilinear avx512~~ WIP - Resize Bilinear AVX512 Trial Aug 16, 2024

snehaa8 commented May 25, 2023

r-abishek Jun 1, 2023

r-abishek Jun 1, 2023

snehaa8 Jun 6, 2023

r-abishek Jun 1, 2023

snehaa8 Jun 6, 2023

r-abishek Jun 1, 2023

snehaa8 Jun 6, 2023
