During decoding, `decode_avx2` returns a 32-byte value along with a 32-bit mask indicating which bytes are valid (i.e. not decoded from whitespace). Currently, the valid bytes are packed together using a simple loop over the bytes. There are likely more efficient ways to do this. (On AVX-512 you would use the VPCOMPRESSB instruction, but that is not available here.)
You can emulate VPCOMPRESSB with PSHUFB and a lookup table of shuffle control masks indexed by the bitmask.
Here's a library for that purpose: https://github.com/lemire/simdprune/. It includes variants for different memory budgets (indexing directly by a 16-bit mask needs a 1 MiB table if unoptimized: 65,536 entries of 16 bytes each).
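To make the suggestion concrete, here is a minimal sketch of the table-driven approach. It is not simdprune's API and not this project's code. All names (`shuffle_table`, `init_shuffle_table`, `compress16`, `compress32`) are made up for illustration. It assumes GCC or Clang, and it splits the mask into 8-bit pieces so the table is 2 KiB instead of 1 MiB. Because PSHUFB only shuffles within 128 bits, the 32-byte value is handled as two 16-byte halves. On non-x86 targets a scalar stand-in for PSHUFB is used so the sketch still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: for each 8-bit mask, the indices of its set bits,
   packed to the front. Unused slots hold 0x80, which PSHUFB maps to zero. */
static uint8_t shuffle_table[256][8];
static uint8_t shuffle_count[256]; /* number of set bits in each mask */

static void init_shuffle_table(void) {
    for (int mask = 0; mask < 256; mask++) {
        int n = 0;
        for (int bit = 0; bit < 8; bit++)
            if (mask & (1 << bit)) shuffle_table[mask][n++] = (uint8_t)bit;
        shuffle_count[mask] = (uint8_t)n;
        while (n < 8) shuffle_table[mask][n++] = 0x80;
    }
}

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
/* One PSHUFB: dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 15]. */
__attribute__((target("ssse3")))
static void shuffle16(const uint8_t *src, const uint8_t *ctrl, uint8_t *dst) {
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    __m128i c = _mm_loadu_si128((const __m128i *)ctrl);
    _mm_storeu_si128((__m128i *)dst, _mm_shuffle_epi8(v, c));
}
#else
/* Scalar model of PSHUFB for targets without SSSE3. */
static void shuffle16(const uint8_t *src, const uint8_t *ctrl, uint8_t *dst) {
    for (int i = 0; i < 16; i++)
        dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
}
#endif

/* Packs the bytes of in[0..16) whose mask bit is set to the front of out.
   Always writes 16 bytes starting at out; returns the number of valid bytes. */
static size_t compress16(const uint8_t *in, uint16_t mask, uint8_t *out) {
    assert(shuffle_count[255] == 8 && "call init_shuffle_table() first");
    uint8_t lo = (uint8_t)(mask & 0xFF), hi = (uint8_t)(mask >> 8);
    uint8_t ctrl[16], packed[16];
    memcpy(ctrl, shuffle_table[lo], 8);
    for (int i = 0; i < 8; i++)          /* high half reads bytes 8..15 */
        ctrl[8 + i] = shuffle_table[hi][i] | 8;
    shuffle16(in, ctrl, packed);
    size_t n_lo = shuffle_count[lo];
    memcpy(out, packed, 8);              /* low group, zero-padded */
    memcpy(out + n_lo, packed + 8, 8);   /* high group overwrites the padding */
    return n_lo + shuffle_count[hi];
}

/* Packs the valid bytes of a 32-byte block; out must hold 32 bytes. */
static size_t compress32(const uint8_t *in, uint32_t mask, uint8_t *out) {
    size_t n = compress16(in, (uint16_t)(mask & 0xFFFF), out);
    n += compress16(in + 16, (uint16_t)(mask >> 16), out + n);
    return n;
}
```

Call `init_shuffle_table()` once, then `compress32(block, mask, out)` returns how many bytes at the start of `out` are valid. Bytes past that count are scratch and may be overwritten. Whether this beats the byte loop in practice would need benchmarking, since the two table lookups and overlapping stores per 16 bytes have their own cost.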