WIP - Resize Bilinear AVX512 Trial #128
base: master
Includes optimization of:
- PLN load
- PKD store
src/include/cpu/rpp_cpu_simd.hpp
Outdated
{
__m512 px[5];
__m512i shuffle = _mm512_set_epi32(15,11,7,3,14,10,6,2,13,9,5,1,12,8,4,0);
__m512i index = _mm512_set_epi32(15,11,7,3,14,13,12,10,9,8,6,5,4,2,1,0);
Add spaces after commas.
src/include/cpu/rpp_cpu_simd.hpp
Outdated
__m512i index = _mm512_set_epi32(15,11,7,3,14,13,12,10,9,8,6,5,4,2,1,0);
p[0] = _mm512_permutexvar_ps(shuffle, p[0]);
p[1] = _mm512_permutexvar_ps(shuffle, p[1]);
p[2] = _mm512_permutexvar_ps(shuffle, p[2]);
Let's improve readability here with end-of-line comments, as in many of the other SSE/AVX helpers, and with more descriptive variable names than shuffle and index, so that any reader can understand what they represent (arrangements in pln3 or pkd3).
Yes, added these for a few of the newly added routines.
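To make the lane arrangement discussed above easier to follow, here is a minimal scalar sketch of what the `shuffle` constant does when passed to `_mm512_permutexvar_ps`. It is not the PR's code: the helper names (`setEpi32Model`, `permutexvarModel`, `applyShuffleFromReview`) are hypothetical, and the functions only model the intrinsics' documented lane semantics so the behavior can be checked without AVX512 hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Lanes = std::array<int, 16>;
using Vec16 = std::array<float, 16>;

// Scalar model of _mm512_set_epi32: the arguments are listed from lane 15
// down to lane 0, so the LAST argument ends up in lane 0.
Lanes setEpi32Model(const Lanes &highToLow)
{
    Lanes lanes{};
    for (std::size_t i = 0; i < 16; i++)
        lanes[i] = highToLow[15 - i];
    return lanes;
}

// Scalar model of _mm512_permutexvar_ps(idx, a): dst[i] = a[idx[i] & 15].
Vec16 permutexvarModel(const Lanes &idx, const Vec16 &a)
{
    Vec16 dst{};
    for (std::size_t i = 0; i < 16; i++)
        dst[i] = a[static_cast<std::size_t>(idx[i] & 15)];
    return dst;
}

// Hypothetical helper: applies the "shuffle" constant from the reviewed snippet.
Vec16 applyShuffleFromReview(const Vec16 &a)
{
    const Lanes highToLow = {15, 11, 7, 3, 14, 10, 6, 2, 13, 9, 5, 1, 12, 8, 4, 0};
    const Lanes shuffle = setEpi32Model(highToLow); // lanes 0..15 = 0,4,8,12,1,5,9,13,...
    return permutexvarModel(shuffle, a);
}
```

Under this model, output lane `i` reads input lane `4 * (i % 4) + i / 4`, which is a 4x4 transpose of the sixteen floats. An end-of-line comment stating that ordering next to the constant is the kind of annotation the review asks for.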
src/include/cpu/rpp_cpu_simd.hpp
Outdated
__m512i indices1 = _mm512_set_epi32(loc[15],loc[14],loc[13],loc[12],loc[11],loc[10],loc[9],loc[8],
loc[7],loc[6],loc[5],loc[4],loc[3],loc[2],loc[1],loc[0]);
p[0] = _mm512_i32gather_ps(indices1, srcRowPtrsForInterp[0], sizeof(int32_t));
__m512i indices2 = _mm512_set_epi32(loc[15]+1,loc[14]+1,loc[13]+1,loc[12]+1,loc[11]+1,loc[10]+1,loc[9]+1,loc[8]+1,
Add spaces before and after operators.
Fixed these
loc[7]+1,loc[6]+1,loc[5]+1,loc[4]+1,loc[3]+1,loc[2]+1,loc[1]+1,loc[0]+1);
p[1] = _mm512_i32gather_ps(indices2, srcRowPtrsForInterp[0], sizeof(int32_t));
p[2] = _mm512_i32gather_ps(indices1, srcRowPtrsForInterp[1], sizeof(int32_t));
p[3] = _mm512_i32gather_ps(indices2, srcRowPtrsForInterp[1], sizeof(int32_t));
Do we really need these gathers? They are pretty expensive.
Gathers are expensive, but overall they gave a slight improvement compared with load, transpose, and insert.
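For readers following the gather discussion, the sketch below models in scalar C++ what the four `_mm512_i32gather_ps` calls in the snippet fetch: for each of the 16 lanes, the two horizontally adjacent pixels from each of the two source rows. It assumes the rows hold `float` data and that every `loc[i] + 1` is in bounds. The names `gatherModel` and `gatherBilinearNeighbors` are hypothetical and do not appear in the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Pixels16 = std::array<float, 16>;
using Index16 = std::array<std::int32_t, 16>;

// Scalar model of _mm512_i32gather_ps(vindex, base, scale):
// lane i loads a float from the byte address base + vindex[i] * scale.
Pixels16 gatherModel(const Index16 &vindex, const void *base, int scale)
{
    Pixels16 dst{};
    const unsigned char *bytes = static_cast<const unsigned char *>(base);
    for (std::size_t i = 0; i < 16; i++)
        std::memcpy(&dst[i], bytes + static_cast<std::ptrdiff_t>(vindex[i]) * scale, sizeof(float));
    return dst;
}

// Hypothetical helper mirroring p[0]..p[3] in the reviewed snippet:
// p[0] = row0[loc], p[1] = row0[loc + 1], p[2] = row1[loc], p[3] = row1[loc + 1].
std::array<Pixels16, 4> gatherBilinearNeighbors(const Index16 &loc, const float *row0, const float *row1)
{
    Index16 locPlusOne{};
    for (std::size_t i = 0; i < 16; i++)
        locPlusOne[i] = loc[i] + 1;

    const int scale = static_cast<int>(sizeof(float)); // 4 bytes, same value as sizeof(int32_t)
    std::array<Pixels16, 4> p{};
    p[0] = gatherModel(loc, row0, scale);
    p[1] = gatherModel(locPlusOne, row0, scale);
    p[2] = gatherModel(loc, row1, scale);
    p[3] = gatherModel(locPlusOne, row1, scale);
    return p;
}
```

Each gather touches up to 16 separate addresses, which is why the reviewer flags it as expensive. The alternative named in the reply (load, transpose, and insert) trades those scattered reads for extra shuffle work, so which one wins likely depends on the access pattern and the target CPU.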
Includes AVX512 optimizations for u8 datatype for all layout variants.
Doesn't include the new test suite change.