The upcoming Intel AVX10.2 instruction set extension adds support for conversions from F16 to the E5M2 (BF8) and E4M3 (HF8) 8-bit floating-point types, along with conversions from E4M3 (HF8) to F16.
The E5M2 and E4M3 floating-point formats are described in the Open Compute Project 8-bit Floating Point Specification (OFP8), which can be found at https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-12-01-pdf-1.
The E5M2 (BF8) format has 1 sign bit, 5 exponent bits, and 2 mantissa bits. Its bit representation is equivalent to the upper 8 bits of a hwy::float16_t (16-bit IEEE 754 half-precision) value, analogous to how the bit representation of hwy::bfloat16_t is equivalent to the upper 16 bits of a 32-bit IEEE 754 single-precision value.
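Because of this shared layout, a scalar conversion between the two representations is just a byte shift. A minimal sketch (not Highway API; function names are made up for illustration, and the AVX10.2 instructions perform round-to-nearest-even rather than the truncation shown first):

```cpp
#include <stdint.h>

// Truncating demotion: E5M2 is the upper byte of an IEEE 754 binary16.
uint8_t F16BitsToE5M2Truncate(uint16_t f16_bits) {
  return static_cast<uint8_t>(f16_bits >> 8);
}

// Round-to-nearest-even variant; NaN inputs are forced to stay NaN so that
// truncation of the mantissa cannot turn them into an E5M2 infinity.
uint8_t F16BitsToE5M2Rne(uint16_t f16_bits) {
  if ((f16_bits & 0x7C00) == 0x7C00 && (f16_bits & 0x03FF) != 0) {
    return static_cast<uint8_t>((f16_bits >> 8) | 0x02);  // keep it NaN
  }
  const uint16_t rounding_bias = 0x007F + ((f16_bits >> 8) & 1);
  return static_cast<uint8_t>((f16_bits + rounding_bias) >> 8);
}

// The reverse direction is exact: every E5M2 value is representable in F16.
uint16_t E5M2BitsToF16Bits(uint8_t e5m2_bits) {
  return static_cast<uint16_t>(e5m2_bits) << 8;
}
```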
The E4M3 (HF8) format has 1 sign bit, 4 exponent bits, and 3 mantissa bits. It has no infinities and only 2 NaN bit representations (0x7F and 0xFF). Unlike most floating-point formats, which reserve the largest exponent for infinities and NaNs, E4M3 treats non-NaN values with the largest exponent as normal values whose absolute value is between 256 and 448.
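To make those special cases concrete, here is a minimal scalar decoder sketch (not Highway API; the function name is hypothetical):

```cpp
#include <math.h>
#include <stdint.h>

// Illustrative E4M3 decoder: bias 7, no infinities, NaN only at 0x7F/0xFF,
// and the largest exponent field (0b1111) encoding normal values 256..448.
float E4M3ToFloat(uint8_t bits) {
  const float sign = (bits & 0x80) ? -1.0f : 1.0f;
  const uint32_t exp = (bits >> 3) & 0xF;
  const uint32_t mant = bits & 0x7;
  if ((bits & 0x7F) == 0x7F) return NAN;  // the only NaN encodings
  if (exp == 0) {                         // subnormals: mant * 2^-9
    return sign * ldexpf(static_cast<float>(mant), -9);
  }
  // Normals, including exp == 15: (1 + mant/8) * 2^(exp - 7).
  // For exp == 15 this yields 256, 288, ..., 448 (mant 7 is the NaN slot).
  return sign * ldexpf(8.0f + static_cast<float>(mant),
                       static_cast<int>(exp) - 10);
}
```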
The AVX10.2 VCVTNE2PH2BF8 instruction converts two F16 vectors to an E5M2 (BF8) vector, and the AVX10.2 VCVTNEPH2HF8 instruction converts an F16 vector to an E4M3 (HF8) vector.
The AVX10.2 VCVTHF82PH instruction converts an E4M3 (HF8) vector to an F16 vector.
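If Highway were to expose these, one possible shape (purely hypothetical: hwy::float8_e4m3_t is a placeholder type name that Highway does not define today, reusing the existing DemoteTo conventions) might be:

```cpp
#include "hwy/highway.h"

namespace hn = hwy::HWY_NAMESPACE;

// Hypothetical sketch only; assumes count is a multiple of the lane count.
void DemoteF16ToHF8(const hwy::float16_t* HWY_RESTRICT in,
                    hwy::float8_e4m3_t* HWY_RESTRICT out, size_t count) {
  const hn::ScalableTag<hwy::float16_t> d16;
  const hn::Rebind<hwy::float8_e4m3_t, decltype(d16)> d8;  // hypothetical
  for (size_t i = 0; i < count; i += hn::Lanes(d16)) {
    const auto v = hn::Load(d16, in + i);
    // On AVX10.2 this could lower to VCVTNEPH2HF8.
    hn::Store(hn::DemoteTo(d8, v), d8, out + i);
  }
}
```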
Arm has already added the FP8 extension to AArch64, which adds support for conversions from the F16/F32 floating-point types to E5M2/E4M3, along with conversions from E5M2/E4M3 to F16/BF16.
FYI we are using a hybrid of e5 and e4 called SFP, which gives m3 for larger numbers and m2 for smaller ones, while retaining ~24 bit dynamic range. It is also fast to convert (code) to bf16 via two permutex2var, enabling fast FMA into f32 via _mm512_dpbf16_ps. It also avoids having to choose between the two formats.
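For reference, the two-permutex2var decode pattern could look roughly like the following AVX-512 sketch. It is not the actual SFP decoder (that lives in the linked code); it assumes, purely for illustration, an 8-bit code consisting of a sign bit plus a 7-bit magnitude, with the 128 bf16 magnitude values precomputed across four zmm table registers:

```cpp
#include <immintrin.h>

// Illustrative sketch only (AVX-512BW); table register contents omitted.
__m512i DecodeCodesToBF16(__m256i codes, __m512i lut0, __m512i lut1,
                          __m512i lut2, __m512i lut3) {
  const __m512i words = _mm512_cvtepu8_epi16(codes);  // 32 codes -> 16-bit
  const __m512i mag = _mm512_and_si512(words, _mm512_set1_epi16(0x7F));
  // Two permutex2var lookups cover table entries 0..63 and 64..127.
  const __m512i lo = _mm512_permutex2var_epi16(lut0, mag, lut1);
  const __m512i hi = _mm512_permutex2var_epi16(lut2, mag, lut3);
  const __mmask32 use_hi =
      _mm512_test_epi16_mask(mag, _mm512_set1_epi16(0x40));
  const __m512i bf16 = _mm512_mask_blend_epi16(use_hi, lo, hi);
  // Re-attach the sign as the bf16 sign bit; the results can then feed
  // _mm512_dpbf16_ps for FMA into f32 accumulators.
  const __m512i sign = _mm512_slli_epi16(_mm512_srli_epi16(words, 7), 15);
  return _mm512_or_si512(bf16, sign);
}
```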
By contrast, conversions to f16 seem less useful, given the lack of precision and (last I checked) low-throughput f16 <-> f32 conversions on Intel.
Do you have a use case for these specific conversions to f16?
I agree E5/E4 are currently used in ML frameworks.
Are we aware of any current or future Highway users that would use these conversions themselves, or to interoperate with something else that does?