[avx2] Optimize byte packing during decoding #2

jethrogb · 2020-04-22T13:02:08Z

During decoding, decode_avx2 returns a 32-byte value along with a 32-bit mask indicating which bytes are valid (i.e. not decoded from whitespace). Currently, these are packed using a simple loop over the bytes. There are likely more efficient ways to do this. (On AVX-512, you'd use the VPCOMPRESSB instruction, but that's not available here.)
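For reference, the scalar packing step described here could look roughly like the sketch below. It is an illustration, not the crate's actual code: the function name `pack_scalar`, its signature, and the convention that bit `i` of the mask describes byte `i` are all assumptions.

```rust
/// Hypothetical baseline: copy every byte whose mask bit is set to the
/// front of `out`, preserving order, and return how many were written.
/// Assumes bit `i` of `valid` corresponds to `bytes[i]`.
fn pack_scalar(bytes: &[u8; 32], valid: u32, out: &mut [u8]) -> usize {
    let mut n = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if (valid >> i) & 1 == 1 {
            out[n] = b;
            n += 1;
        }
    }
    n
}
```

The data-dependent branch and the byte-at-a-time stores in this loop are what a shuffle-based approach would try to avoid.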

TheIronBorn · 2021-10-02T23:19:14Z

You can emulate VPCOMPRESS with PSHUFB and a lookup table of shuffle control masks indexed by the bitmask.

Here's a library for that purpose: https://github.com/lemire/simdprune/. It includes methods for various memory budgets (a 16-bit mask means a 1 MiB table if unoptimized: 65536 entries of 16 bytes each).
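A minimal sketch of the idea for a single 16-byte lane, assuming the unoptimized 65536-entry table. The names `shuffle_control`, `build_table`, and `compress16` are made up for illustration and are not simdprune's API. The table is built at run time here for brevity, and a scalar fallback with the same semantics as PSHUFB is used when SSSE3 is not available.

```rust
/// Control vector for one 16-bit validity mask: the indices of the set
/// bits in ascending order, padded with 0x80 (PSHUFB writes zero for
/// any control byte whose high bit is set).
fn shuffle_control(mask: u16) -> [u8; 16] {
    let mut ctrl = [0x80u8; 16];
    let mut n = 0;
    for i in 0..16u32 {
        if (mask >> i) & 1 == 1 {
            ctrl[n] = i as u8;
            n += 1;
        }
    }
    ctrl
}

/// Unoptimized table: 65536 entries * 16 bytes = 1 MiB.
fn build_table() -> Vec<[u8; 16]> {
    (0..=u16::MAX).map(shuffle_control).collect()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn shuffle_ssse3(src: &[u8; 16], ctrl: &[u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::*;
    let mut out = [0u8; 16];
    unsafe {
        let v = _mm_loadu_si128(src.as_ptr() as *const __m128i);
        let c = _mm_loadu_si128(ctrl.as_ptr() as *const __m128i);
        // PSHUFB: out[i] = v[c[i] & 15], or 0 if c[i] has its high bit set.
        let r = _mm_shuffle_epi8(v, c);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    }
    out
}

/// Move the valid bytes of `src` to the front; returns (bytes, count).
/// Bytes past `count` are zero.
fn compress16(table: &[[u8; 16]], src: &[u8; 16], mask: u16) -> ([u8; 16], usize) {
    let ctrl = table[mask as usize];
    let n = mask.count_ones() as usize;
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            return (unsafe { shuffle_ssse3(src, &ctrl) }, n);
        }
    }
    // Scalar fallback that mirrors the PSHUFB semantics.
    let mut out = [0u8; 16];
    for (o, &c) in out.iter_mut().zip(ctrl.iter()) {
        if c & 0x80 == 0 {
            *o = src[(c & 0x0f) as usize];
        }
    }
    (out, n)
}
```

The smaller-table variants trade the single lookup for a few extra instructions; this sketch does not attempt them.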

jethrogb · 2021-10-04T21:30:45Z

That looks like a good direction to explore. Note that PSHUFB works on 16 bytes, so you'd still need to re-pack the output of 2 PSHUFB calls.
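One possible way to do that re-pack, sketched under assumptions: compress each 16-byte half on its own, store the low half, then store the high half at an offset equal to the number of valid bytes in the low half, so the second store overwrites the first store's padding. The names `compress16` and `pack32` are hypothetical, and `compress16` is a scalar stand-in for the PSHUFB step so that the sketch stays portable.

```rust
/// Scalar stand-in for one PSHUFB-based compress of a 16-byte lane:
/// valid bytes move to the front, the rest of the lane is zero.
fn compress16(src: &[u8], mask: u16) -> ([u8; 16], usize) {
    let mut out = [0u8; 16];
    let mut n = 0;
    for i in 0..16 {
        if (mask >> i) & 1 == 1 {
            out[n] = src[i];
            n += 1;
        }
    }
    (out, n)
}

/// Pack the valid bytes of a 32-byte block into `out` using two
/// full-width, overlapping 16-byte stores. `out` needs at least 32
/// bytes; only the first `n_lo + n_hi` bytes are meaningful afterwards.
fn pack32(data: &[u8; 32], mask: u32, out: &mut [u8]) -> usize {
    assert!(out.len() >= 32);
    let (lo, n_lo) = compress16(&data[..16], mask as u16);
    let (hi, n_hi) = compress16(&data[16..], (mask >> 16) as u16);
    out[..16].copy_from_slice(&lo);
    // The second store starts right after the valid low bytes.
    out[n_lo..n_lo + 16].copy_from_slice(&hi);
    n_lo + n_hi
}
```

The cost of this design is that the output buffer must have slack beyond the packed length, since each store always writes a full 16 bytes.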
