SIMD Extract MSbs Intrinsic #4466

Open
Barinzaya wants to merge 1 commit into master

Barinzaya (Contributor)

This PR adds a simd_extract_msbs intrinsic, along with an extract_msbs re-export to core:simd. This intrinsic extracts the most significant bit of each element of a #simd vector and packs them into a bit_set. This behavior is similar to the SSE/AVX movemask intrinsic/movmsk instruction. This intrinsic is defined similarly to:

simd_extract_msbs :: proc(a: #simd[N]T) -> bit_set[0..<N] where type_is_integer(T) || type_is_boolean(T)

For example, this code:

v := #simd[8]i32 { -1, -2, +3, +4, -5, +6, +7, -8 }
fmt.println(simd.extract_msbs(v))

will print:

bit_set[0..=7]{0, 1, 4, 7}

since elements 0, 1, 4, and 7 have their most significant bits set (due to being negative).
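
To make the semantics concrete, here is a minimal scalar sketch of what the intrinsic computes for this example. `extract_msbs_scalar` is a hypothetical stand-in written for illustration only (fixed to 8 lanes of `i32`, using a plain array instead of a `#simd` vector); it is not part of this PR.

```odin
package main

import "core:fmt"

// Hypothetical scalar stand-in for the proposed intrinsic,
// fixed to 8 lanes of i32 and operating on a plain array.
extract_msbs_scalar :: proc(a: [8]i32) -> (result: bit_set[0..<8]) {
	for x, i in a {
		// For a signed integer, the most significant bit is set
		// exactly when the value is negative.
		if x < 0 {
			result += {i}
		}
	}
	return
}

main :: proc() {
	v := [8]i32{-1, -2, 3, 4, -5, 6, 7, -8}
	msbs := extract_msbs_scalar(v)
	fmt.println(msbs)

	// Lanes 0, 1, 4, and 7 are negative, so four bits are set.
	assert(card(msbs) == 4)
	assert(0 in msbs && 1 in msbs && 4 in msbs && 7 in msbs)
}
```

The real intrinsic would do this in a single operation on the vector rather than looping over lanes.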

This intrinsic is particularly useful in conjunction with lane-wise comparison masks. I've found a few use cases so far:

  1. Counting matches. This can be useful on its own or in conjunction with masked_compress_store, for instance, where card(extract_msbs(v)) can be used to determine how many elements will be written. Examples:
count_range :: proc (src: []#simd[16]f32, lower, upper: f32) -> (result: int) {
	// Counts the number of values in `src` that are in the range `[lower, upper]`.
	for v in src {
		mask := simd.lanes_ge(v, auto_cast lower) & simd.lanes_le(v, auto_cast upper)
		result += card(simd.extract_msbs(mask))
	}
	return
}

filter_range :: proc (dst: ^[dynamic]f32, src: []#simd[16]f32, lower, upper: f32) {
	// Appends to `dst` the values of `src` that are in the range `[lower, upper]`.
	for v in src {
		mask := simd.lanes_ge(v, auto_cast lower) & simd.lanes_le(v, auto_cast upper)
		old_len := len(dst)
		non_zero_resize(dst, old_len + card(simd.extract_msbs(mask)))
		#no_bounds_check { simd.masked_compress_store(&dst[old_len], v, mask) }
	}
}

This use case can also be handled with -simd_reduce_add_ordered(mask), since each matching lane of a comparison mask has all of its bits set (-1 when read as a signed integer), though I find extract_msbs clearer in intent.
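
As a rough illustration of why the two approaches agree, the sketch below models a comparison mask as a plain array of `i32` lanes, assuming each matching lane is -1 (all bits set) and each non-matching lane is 0. The mask values and the helper names `count_by_sum` and `count_by_msbs` are made up for the example.

```odin
package main

import "core:fmt"

// Hypothetical helper: counts matches by summing the lanes and negating,
// which mirrors the -simd_reduce_add_ordered(mask) approach.
count_by_sum :: proc(mask: [8]i32) -> int {
	sum: i32
	for lane in mask {
		sum += lane
	}
	return int(-sum)
}

// Hypothetical helper: counts matches by checking each lane's most
// significant bit, which mirrors card(extract_msbs(mask)).
count_by_msbs :: proc(mask: [8]i32) -> (count: int) {
	for lane in mask {
		if lane < 0 {
			count += 1
		}
	}
	return
}

main :: proc() {
	// Assumed mask layout: -1 for a match, 0 otherwise.
	mask := [8]i32{-1, 0, 0, -1, -1, 0, 0, 0}
	fmt.println(count_by_sum(mask), count_by_msbs(mask))
	assert(count_by_sum(mask) == count_by_msbs(mask))
}
```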

  2. Identifying which elements matched, e.g. for searching or looping over them. Examples:
first_range :: proc (src: []#simd[16]f32, lower, upper: f32) -> (at: int, found: bool) {
	// Returns the (flattened) index of the first value in `src` that is in the range `[lower, upper]`.
	for v, i in src {
		matches := simd.extract_msbs(simd.lanes_ge(v, auto_cast lower) & simd.lanes_le(v, auto_cast upper))
		for j in matches {
			at = 16*i + j
			found = true
			return
		}
	}
	return
}

histogram_range :: proc (bins: []int, src: []#simd[16]f32, lower, upper: f32) {
	// Generates histogram bin counts for the values in `src` in the range `[lower, upper)`.
	// bins[0] will be the number of values that fall in the first 1/Nth of the range,
	// bins[1] will be the number of values that fall in the second 1/Nth of the range, etc.
	span := (upper - lower) / f32(len(bins))
	for v in src {
		matches := simd.extract_msbs(simd.lanes_ge(v, auto_cast lower) & simd.lanes_lt(v, auto_cast upper))

		phases := (v - auto_cast lower) / auto_cast span
		indices := cast(#simd[16]i32)phases

		for j in matches {
			// Can't be done in parallel, since several lanes may fall into the same bin
			i := int(simd.extract(indices, j))
			bins[i] += 1
		}
	}
	return
}

  3. In more specific cases, the bit_set itself can also be directly useful for doing boolean logic en masse. Condensing the original checks to bit-masks allows the bit-masks themselves to be used in SIMD vectors to do large amounts of boolean logic at once.
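
A small sketch of the simplest form of this idea: once two checks have each been condensed to a bit_set, a single set operation combines the results for every lane at once. The `Lanes` type alias and the mask values are hypothetical, standing in for results that would come from extract_msbs; packing such bit-masks back into SIMD vectors is not shown here.

```odin
package main

import "core:fmt"

// Hypothetical alias for the result type of an 8-lane extract_msbs.
Lanes :: bit_set[0..<8]

main :: proc() {
	// Made-up results of two independent lane-wise checks.
	in_range := Lanes{0, 1, 4, 7}
	is_valid := Lanes{0, 2, 4, 5}

	both       := in_range & is_valid // lanes passing both checks
	either     := in_range | is_valid // lanes passing at least one check
	range_only := in_range - is_valid // lanes passing only the first check

	fmt.println(both, either, range_only)

	assert(both == Lanes{0, 4})
	assert(either == Lanes{0, 1, 2, 4, 5, 7})
	assert(range_only == Lanes{1, 7})
}
```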

@Barinzaya Barinzaya force-pushed the simd_extract_msbs branch 2 times, most recently from 9d2e6f5 to 699d3ca Compare November 18, 2024 01:06