Generic-only new operations #5

Open
wants to merge 6 commits into
base: master
63 changes: 63 additions & 0 deletions g3doc/quick_reference.md
Expand Up @@ -1182,6 +1182,15 @@ equivalent to, and potentially more efficient than, `And(m, IsNaN(v));` etc.
<code>M **MaskedIsNaN**(M m, V v)</code>: returns mask indicating whether
`v[i]` is "not a number" (unordered) or `false` if `m[i]` is false.

* `V`: `{f}` \
<code>M **MaskedIsInf**(M m, V v)</code>: returns mask indicating whether
`v[i]` is positive or negative infinity or `false` if `m[i]` is false.

* `V`: `{f}` \
<code>M **MaskedIsFinite**(M m, V v)</code>: returns mask indicating whether
`v[i]` is neither NaN nor infinity, i.e. normal, subnormal or zero, or
`false` if `m[i]` is false. Equivalent to `And(m, IsFinite(v))`.
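
As a rough illustration of the two masked predicates above, the following scalar reference model spells out the per-lane behavior. It is plain C++ rather than Highway code: masks are modeled as `std::vector<bool>`, and the `...Ref` function names are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Scalar model of MaskedIsInf: lane i is true only if m[i] is true and v[i]
// is positive or negative infinity.
std::vector<bool> MaskedIsInfRef(const std::vector<bool>& m,
                                 const std::vector<float>& v) {
  assert(m.size() == v.size());
  std::vector<bool> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = m[i] && std::isinf(v[i]);
  }
  return out;
}

// Scalar model of MaskedIsFinite: lane i is true only if m[i] is true and
// v[i] is neither NaN nor infinity.
std::vector<bool> MaskedIsFiniteRef(const std::vector<bool>& m,
                                    const std::vector<float>& v) {
  assert(m.size() == v.size());
  std::vector<bool> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = m[i] && std::isfinite(v[i]);
  }
  return out;
}
```

Note that a masked-off lane yields `false` for both predicates, so `MaskedIsFinite` is not the lane-wise negation of `MaskedIsInf`.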

### Logical

* `V`: `{u,i}` \
Expand Down Expand Up @@ -2024,6 +2033,17 @@ obtain the `D` that describes the return type.
<code>Vec&lt;D&gt; **DemoteTo**(D, V v)</code>: narrows float to half (for
bf16, it is unspecified whether this truncates or rounds).

* `V`,`D`: (`f64,i32`), (`f32,f16`) \
<code>Vec&lt;D&gt; **DemoteCeilTo**(D, V v)</code>: rounds `v[i]` up (towards
positive infinity), then demotes the result to the half-width integer or
float type `D`.

* `V`,`D`: (`f64,i32`), (`f32,f16`) \
<code>Vec&lt;D&gt; **DemoteFloorTo**(D, V v)</code>: rounds `v[i]` down
(towards negative infinity), then demotes the result to the half-width
integer or float type `D`.

* <code>Vec&lt;D&gt; **MaskedDemoteTo**(M m, D d, V v)</code>: returns `v[i]`
demoted to `D` where `m[i]` is true, and zero otherwise.
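
The rounding demotions above can be sketched with a scalar model for the `f64` to `i32` case. This is only an illustration under stated assumptions: the `...Ref` names are hypothetical, every rounded input is assumed to fit in `int32_t`, and the plain demotion inside `MaskedDemoteToRef` is assumed to truncate towards zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of DemoteCeilTo (f64 -> i32): round up, then narrow.
// Assumes each rounded value fits in int32_t.
std::vector<std::int32_t> DemoteCeilToRef(const std::vector<double>& v) {
  std::vector<std::int32_t> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = static_cast<std::int32_t>(std::ceil(v[i]));
  }
  return out;
}

// Scalar model of DemoteFloorTo (f64 -> i32): round down, then narrow.
std::vector<std::int32_t> DemoteFloorToRef(const std::vector<double>& v) {
  std::vector<std::int32_t> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = static_cast<std::int32_t>(std::floor(v[i]));
  }
  return out;
}

// Scalar model of MaskedDemoteTo (f64 -> i32): demote where m[i] is true,
// zero elsewhere. The cast truncates towards zero (an assumption here).
std::vector<std::int32_t> MaskedDemoteToRef(const std::vector<bool>& m,
                                            const std::vector<double>& v) {
  assert(m.size() == v.size());
  std::vector<std::int32_t> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = m[i] ? static_cast<std::int32_t>(v[i]) : 0;
  }
  return out;
}
```

For negative inputs the two rounding modes differ from truncation in opposite directions, which is the main reason to pick one explicitly.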

#### Single vector promotion

These functions promote a half vector to a full vector. To obtain halves, use
Expand All @@ -2050,6 +2070,27 @@ These functions promote a half vector to a full vector. To obtain halves, use
integer. Returns an implementation-defined value if the input exceeds the
destination range.

* `V`: `f`, `D`:`{u,i,f}`\
<code>Vec&lt;D&gt; **PromoteCeilTo**(D, V part)</code>: rounds `part[i]`
up and converts the rounded value to a signed or unsigned integer.
Returns an implementation-defined value if the input exceeds the
destination range.

* `V`: `f`, `D`:`{u,i,f}`\
<code>Vec&lt;D&gt; **PromoteFloorTo**(D, V part)</code>: rounds `part[i]`
down and converts the rounded value to a signed or unsigned integer.
Returns an implementation-defined value if the input exceeds the
destination range.

* `V`: `f`, `D`:`{u,i,f}`\
<code>Vec&lt;D&gt; **PromoteToNearestInt**(D, V part)</code>: rounds
`part[i]` towards the nearest integer, with ties to even, and converts the
rounded value to a signed or unsigned integer. Returns an
implementation-defined value if the input exceeds the destination range.

* <code>Vec&lt;D&gt; **MaskedPromoteTo**(M m, D d, V v)</code>: returns `v[i]`
widened to `D` where `m[i]` is true, and zero otherwise.
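
To make the three rounding modes concrete, here is a scalar sketch for the `f32` to `i64` case, plus a masked `i16` to `i32` promotion. The `...Ref` names are hypothetical, inputs are assumed to be in range, and `std::nearbyint` is assumed to run under the default round-to-nearest-even floating-point environment.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of PromoteCeilTo (f32 -> i64): round up, then widen.
std::vector<std::int64_t> PromoteCeilToRef(const std::vector<float>& v) {
  std::vector<std::int64_t> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = static_cast<std::int64_t>(std::ceil(v[i]));
  }
  return out;
}

// Scalar model of PromoteFloorTo (f32 -> i64): round down, then widen.
std::vector<std::int64_t> PromoteFloorToRef(const std::vector<float>& v) {
  std::vector<std::int64_t> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = static_cast<std::int64_t>(std::floor(v[i]));
  }
  return out;
}

// Scalar model of PromoteToNearestInt (f32 -> i64). std::nearbyint rounds
// ties to even when the rounding mode is the default FE_TONEAREST.
std::vector<std::int64_t> PromoteToNearestIntRef(const std::vector<float>& v) {
  std::vector<std::int64_t> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = static_cast<std::int64_t>(std::nearbyint(v[i]));
  }
  return out;
}

// Scalar model of MaskedPromoteTo (i16 -> i32): widen where m[i] is true,
// zero elsewhere.
std::vector<std::int32_t> MaskedPromoteToRef(
    const std::vector<bool>& m, const std::vector<std::int16_t>& v) {
  assert(m.size() == v.size());
  std::vector<std::int32_t> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = m[i] ? static_cast<std::int32_t>(v[i]) : 0;
  }
  return out;
}
```

The ties-to-even rule is visible on inputs such as `0.5`, `1.5` and `2.5`, which round to `0`, `2` and `2` respectively.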

The following may be more convenient or efficient than also calling `LowerHalf`
/ `UpperHalf`:

Expand Down Expand Up @@ -2309,6 +2350,12 @@ Ops in this section are only available if `HWY_TARGET != HWY_SCALAR`:
`InterleaveOdd(d, a, b)` is usually more efficient than `OddEven(b,
DupOdd(a))`.

* <code>V **MaskedInterleaveEven**(M m, V a, V b)</code>: performs the same
operation as `InterleaveEven`, but returns zero in lanes where `m[i]` is
false.

* <code>V **MaskedInterleaveOdd**(M m, V a, V b)</code>: performs the same
operation as `InterleaveOdd`, but returns zero in lanes where `m[i]` is
false.
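
A scalar sketch of the two masked interleaves may help when checking expected lane values. It assumes `InterleaveEven` yields `{a[0], b[0], a[2], b[2], ...}` and `InterleaveOdd` yields `{a[1], b[1], a[3], b[3], ...}`, which matches the expectations in the accompanying tests. The `...Ref` names and the use of `std::vector` are hypothetical and only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of MaskedInterleaveEven: even lanes of a and b, interleaved,
// with masked-off lanes set to zero. Requires an even lane count.
std::vector<int> MaskedInterleaveEvenRef(const std::vector<bool>& m,
                                         const std::vector<int>& a,
                                         const std::vector<int>& b) {
  assert(a.size() == b.size() && a.size() == m.size());
  assert(a.size() % 2 == 0);
  std::vector<int> out(a.size(), 0);
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (!m[i]) continue;  // masked-off lanes stay zero
    out[i] = (i % 2 == 0) ? a[i] : b[i - 1];
  }
  return out;
}

// Scalar model of MaskedInterleaveOdd: odd lanes of a and b, interleaved,
// with masked-off lanes set to zero. Requires an even lane count.
std::vector<int> MaskedInterleaveOddRef(const std::vector<bool>& m,
                                        const std::vector<int>& a,
                                        const std::vector<int>& b) {
  assert(a.size() == b.size() && a.size() == m.size());
  assert(a.size() % 2 == 0);
  std::vector<int> out(a.size(), 0);
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (!m[i]) continue;  // masked-off lanes stay zero
    out[i] = (i % 2 == 0) ? a[i + 1] : b[i];
  }
  return out;
}
```

With `a = {0, 2, 4, 6}`, `b = {1, 3, 5, 7}` and a mask covering the first three lanes, the even variant gives `{0, 1, 4, 0}` and the odd variant gives `{2, 3, 6, 0}`.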

#### Zip

* `Ret`: `MakeWide<T>`; `V`: `{u,i}{8,16,32}` \
Expand Down Expand Up @@ -2465,6 +2512,22 @@ The following `ReverseN` must not be called if `Lanes(D()) < N`:
must be in the range `[0, 2 * Lanes(d))` but need not be unique. The index
type `TI` must be an integer of the same size as `TFromD<D>`.

* <code>V **MaskedTableLookupLanesOr**(V no, M m, V a, unspecified)</code>:
returns the result of `TableLookupLanes(a, unspecified)` where `m[i]` is
true, and `no[i]` where `m[i]` is false.

* <code>V **MaskedTableLookupLanes**(M m, V a, unspecified)</code>: returns
the result of `TableLookupLanes(a, unspecified)` where `m[i]` is true, and
zero where `m[i]` is false.

* <code>V **MaskedTwoTablesLookupLanesOr**(D d, M m, V a, V b, unspecified)</code>:
returns the result of `TwoTablesLookupLanes(d, a, b, unspecified)` where
`m[i]` is true, and `a[i]` where `m[i]` is false.

* <code>V **MaskedTwoTablesLookupLanes**(D d, M m, V a, V b, unspecified)</code>:
returns the result of `TwoTablesLookupLanes(d, a, b, unspecified)` where
`m[i]` is true, and zero where `m[i]` is false.
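
The masked table lookups above can be modeled in scalar code as follows. This is a sketch under assumptions: the `...Ref` names are hypothetical, indices are plain `std::size_t` values rather than Highway's opaque index type, and the two-table lookup is assumed to select `a[idx]` for `idx < N` and `b[idx - N]` otherwise. Indices are validated in every lane, masked or not, because the lookup is computed for all lanes before the mask is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of MaskedTableLookupLanesOr: a[idx[i]] where m[i] is true,
// no[i] elsewhere.
std::vector<int> MaskedTableLookupLanesOrRef(
    const std::vector<int>& no, const std::vector<bool>& m,
    const std::vector<int>& a, const std::vector<std::size_t>& idx) {
  const std::size_t n = a.size();
  assert(no.size() == n && m.size() == n && idx.size() == n);
  std::vector<int> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    assert(idx[i] < n);  // indices must be valid in all lanes
    out[i] = m[i] ? a[idx[i]] : no[i];
  }
  return out;
}

// Scalar model of MaskedTableLookupLanes: same, with zero as the fallback.
std::vector<int> MaskedTableLookupLanesRef(
    const std::vector<bool>& m, const std::vector<int>& a,
    const std::vector<std::size_t>& idx) {
  return MaskedTableLookupLanesOrRef(std::vector<int>(a.size(), 0), m, a, idx);
}

// Scalar model of MaskedTwoTablesLookupLanesOr: looks up in the concatenation
// of a and b where m[i] is true, and falls back to a[i] elsewhere.
std::vector<int> MaskedTwoTablesLookupLanesOrRef(
    const std::vector<bool>& m, const std::vector<int>& a,
    const std::vector<int>& b, const std::vector<std::size_t>& idx) {
  const std::size_t n = a.size();
  assert(b.size() == n && m.size() == n && idx.size() == n);
  std::vector<int> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    assert(idx[i] < 2 * n);  // valid range is [0, 2 * n)
    const int looked_up = idx[i] < n ? a[idx[i]] : b[idx[i] - n];
    out[i] = m[i] ? looked_up : a[i];
  }
  return out;
}
```

The single-table `Or` variant takes an explicit `no` vector, whereas the two-table `Or` variant reuses `a` as the fallback, so the two are not interchangeable.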

* <code>V **Per4LaneBlockShuffle**&lt;size_t kIdx3, size_t kIdx2, size_t
kIdx1, size_t kIdx0&gt;(V v)</code> does a per 4-lane block shuffle of `v`
if `Lanes(DFromV<V>())` is greater than or equal to 4 or a shuffle of the
Expand Down
151 changes: 151 additions & 0 deletions hwy/ops/generic_ops-inl.h
Expand Up @@ -518,6 +518,24 @@ HWY_API V InterleaveEven(V a, V b) {
}
#endif

// ------------------------------ MaskedInterleaveEven

#if HWY_TARGET != HWY_SCALAR || HWY_IDE
template <class V, class M>
HWY_API V MaskedInterleaveEven(M m, V a, V b) {
return IfThenElseZero(m, InterleaveEven(DFromV<V>(), a, b));
}
#endif

// ------------------------------ MaskedInterleaveOdd

#if HWY_TARGET != HWY_SCALAR || HWY_IDE
template <class V, class M>
HWY_API V MaskedInterleaveOdd(M m, V a, V b) {
return IfThenElseZero(m, InterleaveOdd(DFromV<V>(), a, b));
}
#endif

// ------------------------------ MinMagnitude/MaxMagnitude

#if (defined(HWY_NATIVE_FLOAT_MIN_MAX_MAGNITUDE) == defined(HWY_TARGET_TOGGLE))
Expand Down Expand Up @@ -661,6 +679,25 @@ HWY_API V MaskedSatSubOr(V no, M m, V a, V b) {
}
#endif // HWY_NATIVE_MASKED_ARITH

// ------------------------------ MaskedIsInf/MaskedIsFinite
#if (defined(HWY_NATIVE_MASKED_IS_INF) == defined(HWY_TARGET_TOGGLE))
#ifdef HWY_NATIVE_MASKED_IS_INF
#undef HWY_NATIVE_MASKED_IS_INF
#else
#define HWY_NATIVE_MASKED_IS_INF
#endif

template <class V, class M, class D = DFromV<V>>
HWY_API MFromD<D> MaskedIsInf(const M m, const V v) {
return And(m, IsInf(v));
}

template <class V, class M, class D = DFromV<V>>
HWY_API MFromD<D> MaskedIsFinite(const M m, const V v) {
return And(m, IsFinite(v));
}
#endif // HWY_NATIVE_MASKED_IS_INF

#if (defined(HWY_NATIVE_ZERO_MASKED_ARITH) == defined(HWY_TARGET_TOGGLE))
#ifdef HWY_NATIVE_ZERO_MASKED_ARITH
#undef HWY_NATIVE_ZERO_MASKED_ARITH
Expand Down Expand Up @@ -3501,6 +3538,19 @@ HWY_API VFromD<D> DemoteTo(D df16, VFromD<Rebind<float, D>> v) {

#endif // HWY_NATIVE_F16C

// ------------------------------ PromoteTo F16->I
#if HWY_HAVE_FLOAT16 || HWY_IDE
template <class D, HWY_IF_NOT_FLOAT_D(D), HWY_IF_T_SIZE_D(D, sizeof(float))>
HWY_API VFromD<D> PromoteTo(D d, VFromD<Rebind<float16_t, D>> v) {
return ConvertTo(d, PromoteTo(Rebind<float, D>(), v));
}

template <class D, HWY_IF_NOT_FLOAT_D(D), HWY_IF_T_SIZE_GT_D(D, sizeof(float))>
HWY_API VFromD<D> PromoteTo(D d, VFromD<Rebind<float16_t, D>> v) {
return PromoteTo(d, PromoteTo(Rebind<float, D>(), v));
}
#endif

// ------------------------------ F64->F16 DemoteTo
#if (defined(HWY_NATIVE_DEMOTE_F64_TO_F16) == defined(HWY_TARGET_TOGGLE))
#ifdef HWY_NATIVE_DEMOTE_F64_TO_F16
Expand Down Expand Up @@ -3634,6 +3684,53 @@ HWY_API VFromD<D> ReorderDemote2To(D dbf16, VFromD<Repartition<float, D>> a,

#endif // HWY_NATIVE_DEMOTE_F32_TO_BF16

// ------------------------------ DemoteTo (Alternate Rounding)
#if (defined(HWY_NATIVE_DEMOTE_CEIL_TO) == defined(HWY_TARGET_TOGGLE))
#ifdef HWY_NATIVE_DEMOTE_CEIL_TO
#undef HWY_NATIVE_DEMOTE_CEIL_TO
#else
#define HWY_NATIVE_DEMOTE_CEIL_TO
#endif

#if HWY_HAVE_FLOAT64
template <class D32, HWY_IF_UI32_D(D32)>
HWY_API VFromD<D32> DemoteCeilTo(D32 d32, VFromD<Rebind<double, D32>> v) {
return DemoteTo(d32, Ceil(v));
}
#endif // HWY_HAVE_FLOAT64

#if HWY_HAVE_FLOAT16
template <class D16, HWY_IF_F16_D(D16)>
HWY_API VFromD<D16> DemoteCeilTo(D16 d16, VFromD<Rebind<float, D16>> v) {
return DemoteTo(d16, Ceil(v));
}
#endif // HWY_HAVE_FLOAT16

#endif // HWY_NATIVE_DEMOTE_CEIL_TO

#if (defined(HWY_NATIVE_DEMOTE_FLOOR_TO) == defined(HWY_TARGET_TOGGLE))
#ifdef HWY_NATIVE_DEMOTE_FLOOR_TO
#undef HWY_NATIVE_DEMOTE_FLOOR_TO
#else
#define HWY_NATIVE_DEMOTE_FLOOR_TO
#endif

#if HWY_HAVE_FLOAT64
template <class D32, HWY_IF_UI32_D(D32)>
HWY_API VFromD<D32> DemoteFloorTo(D32 d32, VFromD<Rebind<double, D32>> v) {
return DemoteTo(d32, Floor(v));
}
#endif // HWY_HAVE_FLOAT64

#if HWY_HAVE_FLOAT16
template <class D16, HWY_IF_F16_D(D16)>
HWY_API VFromD<D16> DemoteFloorTo(D16 d16, VFromD<Rebind<float, D16>> v) {
return DemoteTo(d16, Floor(v));
}
#endif // HWY_HAVE_FLOAT16

#endif // HWY_NATIVE_DEMOTE_FLOOR_TO

// ------------------------------ PromoteInRangeTo
#if (defined(HWY_NATIVE_F32_TO_UI64_PROMOTE_IN_RANGE_TO) == \
defined(HWY_TARGET_TOGGLE))
Expand Down Expand Up @@ -3765,6 +3862,24 @@ HWY_API VFromD<D> PromoteInRangeOddTo(D d, V v) {
}
#endif // HWY_TARGET != HWY_SCALAR

// ------------------------------ PromoteCeilTo
template <class DTo, class V, HWY_IF_FLOAT_V(V)>
HWY_API Vec<DTo> PromoteCeilTo(DTo d, V v) {
return PromoteTo(d, Ceil(v));
}

// ------------------------------ PromoteFloorTo
template <class DTo, class V, HWY_IF_FLOAT_V(V)>
HWY_API Vec<DTo> PromoteFloorTo(DTo d, V v) {
return PromoteTo(d, Floor(v));
}

// ------------------------------ PromoteToNearestInt
template <class DTo, class V, HWY_IF_FLOAT_V(V)>
HWY_API Vec<DTo> PromoteToNearestInt(DTo d, V v) {
return PromoteTo(d, Round(v));
}

// ------------------------------ SumsOf2

#if HWY_TARGET != HWY_SCALAR || HWY_IDE
Expand Down Expand Up @@ -4781,6 +4896,18 @@ HWY_API VFromD<D> MaskedConvertTo(M m, D d, V v) {
return IfThenElseZero(m, ConvertTo(d, v));
}

// ------------------------------ MaskedPromoteTo
template <class D, class V, class M>
HWY_API VFromD<D> MaskedPromoteTo(M m, D d, V v) {
return IfThenElseZero(m, PromoteTo(d, v));
}

// ------------------------------ MaskedDemoteTo
template <class D, class V, class M>
HWY_API VFromD<D> MaskedDemoteTo(M m, D d, V v) {
return IfThenElseZero(m, DemoteTo(d, v));
}

// ------------------------------ Integer division
#if (defined(HWY_NATIVE_INT_DIV) == defined(HWY_TARGET_TOGGLE))
#ifdef HWY_NATIVE_INT_DIV
Expand Down Expand Up @@ -6890,6 +7017,30 @@ HWY_API V ReverseBits(V v) {
}
#endif // HWY_NATIVE_REVERSE_BITS_UI16_32_64

// ------------------------------ MaskedTableLookupLanesOr
template <class V, class M>
HWY_API V MaskedTableLookupLanesOr(V no, M m, V a, IndicesFromD<DFromV<V>> idx) {
return IfThenElse(m, TableLookupLanes(a, idx), no);
}

// ------------------------------ MaskedTableLookupLanes
template <class V, class M>
HWY_API V MaskedTableLookupLanes(M m, V a, IndicesFromD<DFromV<V>> idx) {
return IfThenElseZero(m, TableLookupLanes(a, idx));
}

// ------------------------------ MaskedTwoTablesLookupLanesOr
template <class D, class V, class M>
HWY_API V MaskedTwoTablesLookupLanesOr(D d, M m, V a, V b, IndicesFromD<D> idx) {
return IfThenElse(m, TwoTablesLookupLanes(d, a, b, idx), a);
}

// ------------------------------ MaskedTwoTablesLookupLanes
template <class D, class V, class M>
HWY_API V MaskedTwoTablesLookupLanes(D d, M m, V a, V b, IndicesFromD<D> idx) {
return IfThenElseZero(m, TwoTablesLookupLanes(d, a, b, idx));
}

// ------------------------------ Per4LaneBlockShuffle

#if (defined(HWY_NATIVE_PER4LANEBLKSHUF_DUP32) == defined(HWY_TARGET_TOGGLE))
Expand Down
60 changes: 60 additions & 0 deletions hwy/tests/blockwise_test.cc
Expand Up @@ -274,12 +274,72 @@ struct TestInterleaveOdd {
}
};

struct TestMaskedInterleaveEven {
template <class T, class D>
HWY_NOINLINE void operator()(T /*unused*/, D d) {
const size_t N = Lanes(d);
const MFromD<D> first_3 = FirstN(d, 3);
auto even_lanes = AllocateAligned<T>(N);
auto odd_lanes = AllocateAligned<T>(N);
auto expected = AllocateAligned<T>(N);
HWY_ASSERT(even_lanes && odd_lanes && expected);
for (size_t i = 0; i < N; ++i) {
even_lanes[i] = ConvertScalarTo<T>(2 * i + 0);
odd_lanes[i] = ConvertScalarTo<T>(2 * i + 1);
}
const auto even = Load(d, even_lanes.get());
const auto odd = Load(d, odd_lanes.get());

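// InterleaveEven yields {even[0], odd[0], even[2], odd[2], ...}, which is
// 2 * i - (i & 1) for this input; lanes outside the mask must be zero.
for (size_t i = 0; i < N; ++i) {
if (i < 3) {
expected[i] = ConvertScalarTo<T>(2 * i - (i & 1));
} else {
expected[i] = ConvertScalarTo<T>(0);
}
}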

HWY_ASSERT_VEC_EQ(d, expected.get(),
MaskedInterleaveEven(first_3, even, odd));
}
};

struct TestMaskedInterleaveOdd {
template <class T, class D>
HWY_NOINLINE void operator()(T /*unused*/, D d) {
const size_t N = Lanes(d);
const MFromD<D> first_3 = FirstN(d, 3);
auto even_lanes = AllocateAligned<T>(N);
auto odd_lanes = AllocateAligned<T>(N);
auto expected = AllocateAligned<T>(N);
HWY_ASSERT(even_lanes && odd_lanes && expected);
for (size_t i = 0; i < N; ++i) {
even_lanes[i] = ConvertScalarTo<T>(2 * i + 0);
odd_lanes[i] = ConvertScalarTo<T>(2 * i + 1);
}
const auto even = Load(d, even_lanes.get());
const auto odd = Load(d, odd_lanes.get());

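// InterleaveOdd yields {even[1], odd[1], even[3], odd[3], ...}, which is
// 2 * i - (i & 1) + 2 for this input; lanes outside the mask must be zero.
for (size_t i = 0; i < N; ++i) {
if (i < 3) {
expected[i] = ConvertScalarTo<T>((2 * i) - (i & 1) + 2);
} else {
expected[i] = ConvertScalarTo<T>(0);
}
}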

HWY_ASSERT_VEC_EQ(d, expected.get(),
MaskedInterleaveOdd(first_3, even, odd));
}
};

HWY_NOINLINE void TestAllInterleave() {
// Not DemoteVectors because this cannot be supported by HWY_SCALAR.
ForAllTypes(ForShrinkableVectors<TestInterleaveLower>());
ForAllTypes(ForShrinkableVectors<TestInterleaveUpper>());
ForAllTypes(ForShrinkableVectors<TestInterleaveEven>());
ForAllTypes(ForShrinkableVectors<TestInterleaveOdd>());
ForAllTypes(ForShrinkableVectors<TestMaskedInterleaveEven>());
ForAllTypes(ForShrinkableVectors<TestMaskedInterleaveOdd>());
}

struct TestZipLower {
Expand Down