-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootstrap the new SIMD module #255
Bootstrap the new SIMD module #255
Conversation
By using `_mm_set_epi16` we can set the value using a single instruction instead of three. With this change, we can get rid of the MMX instructions `_mm_setzero_si64` and `_mm_set_pi16`. Refs: scality#222
First, add some useful instruction set-specific definitions. Two types: - RegisterType: underlying representation of the future Register class. - MaskType: same as RegisterType, but for the Mask class. Three useful constants: - INSTRUCTION_SET: useful for discrimination. - ALIGNMENT: useful for memory allocation. - REG_BITSZ: useful for iteration. Refs: scality#222
This first test is pretty dumb, but it will be useful to bootstrap and setup the CircleCI pipeline. Refs: scality#222
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR. There are small things to change.
// We required AVX2 because we relies on some instructions (such as | ||
// `_mm256_add_epi16` and others) that aren't available in the first version of | ||
// AVX. | ||
#if defined(__AVX__) && defined(__AVX2__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is defined(__AVX2__)
enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's enough but I prefer to be explicit: here we want AVX BUT because of QuadIron uses of SIMD we required AVX2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think we could simplify it, also the comment as we need AVX2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but I think explicit is better than implicit (and removing AVX doesn't really make the code more or less readable)
src/simd_nf4.h
Outdated
@@ -52,19 +54,18 @@ static inline aint128 m128i_to_uint128(m128i v) | |||
inline aint128 expand16(uint16_t* arr, int n) | |||
{ | |||
// since n <= 4 | |||
uint16_t _arr[4] __attribute__((aligned(ALIGN_SIZE))) = {0, 0, 0, 0}; | |||
uint16_t _arr[4] __attribute__((aligned(simd::ALIGNMENT))) = {0, 0, 0, 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is in simd
namespace so simd::
is not necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I'll fix this 😃
That way, it can be ran in parallel with the test suite. Refs: scality#251
7ce4d38
to
83e2bec
Compare
Now, we are sure that every codepath is covered and we won't silently break one of them. Refs: scality#222, scality#251
d141979
to
35ad3a8
Compare
With this allocator we can use the STL container, such as std::vector, and be sure that we can use aligned load instruction when necessary. Refs: scality#222
The new implementation is more C++-like (can be used with STL container) and more robust (overflow handling, strict aliasing rule, …). Refs: scality#222
35ad3a8
to
ba0ec36
Compare
This PR just lay the foundation for the new SIMD module, the biggest part will come in the next PR.
That being said, here we defines the basic constants and helpers, as well as a custom allocator.