The upcoming Intel AVX10.2 instruction set extension adds support for conversions from F16 to the E5M2 (BF8) and E4M3 (HF8) 8-bit floating-point types, along with conversions from E4M3 (HF8) to F16.
The E5M2 and E4M3 floating-point formats are described in the Open Compute Project 8-bit Floating Point Specification (OFP8), which can be found at https://www.opencompute.org/documents/ocp-8-bit-floating-point-specification-ofp8-revision-1-0-2023-12-01-pdf-1.
The E5M2 (BF8) format has 1 sign bit, 5 exponent bits, and 2 mantissa bits. Its bit representation is equivalent to the upper 8 bits of a hwy::float16_t (16-bit IEEE 754 half-precision) value, analogous to how the bit representation of hwy::bfloat16_t is equivalent to the upper 16 bits of a 32-bit IEEE 754 single-precision value.
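Because of this shared layout, a scalar conversion between the two representations is just a byte shift. A minimal sketch (not Highway API; function names are made up for illustration, and the AVX10.2 instructions perform round-to-nearest-even rather than the truncation shown first):

```cpp
#include <stdint.h>

// Truncating demotion: E5M2 is the upper byte of an IEEE 754 binary16.
uint8_t F16BitsToE5M2Truncate(uint16_t f16_bits) {
  return static_cast<uint8_t>(f16_bits >> 8);
}

// Round-to-nearest-even variant; NaN inputs are forced to stay NaN so that
// truncation of the mantissa cannot turn them into an E5M2 infinity.
uint8_t F16BitsToE5M2Rne(uint16_t f16_bits) {
  if ((f16_bits & 0x7C00) == 0x7C00 && (f16_bits & 0x03FF) != 0) {
    return static_cast<uint8_t>((f16_bits >> 8) | 0x02);  // keep it NaN
  }
  const uint16_t rounding_bias = 0x007F + ((f16_bits >> 8) & 1);
  return static_cast<uint8_t>((f16_bits + rounding_bias) >> 8);
}

// The reverse direction is exact: every E5M2 value is representable in F16.
uint16_t E5M2BitsToF16Bits(uint8_t e5m2_bits) {
  return static_cast<uint16_t>(e5m2_bits) << 8;
}
```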
The E4M3 (HF8) format has 1 sign bit, 4 exponent bits, and 3 mantissa bits. It has no infinities and only 2 NaN bit representations (0x7F and 0xFF). Unlike most floating-point formats, which reserve the largest exponent for infinities and NaNs, E4M3 treats non-NaN values with the largest exponent as normal values whose absolute value is between 256 and 448.
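To make those special cases concrete, here is a minimal scalar decoder sketch (not Highway API; the function name is hypothetical):

```cpp
#include <math.h>
#include <stdint.h>

// Illustrative E4M3 decoder: bias 7, no infinities, NaN only at 0x7F/0xFF,
// and the largest exponent field (0b1111) encoding normal values 256..448.
float E4M3ToFloat(uint8_t bits) {
  const float sign = (bits & 0x80) ? -1.0f : 1.0f;
  const uint32_t exp = (bits >> 3) & 0xF;
  const uint32_t mant = bits & 0x7;
  if ((bits & 0x7F) == 0x7F) return NAN;  // the only NaN encodings
  if (exp == 0) {                         // subnormals: mant * 2^-9
    return sign * ldexpf(static_cast<float>(mant), -9);
  }
  // Normals, including exp == 15: (1 + mant/8) * 2^(exp - 7).
  // For exp == 15 this yields 256, 288, ..., 448 (mant 7 is the NaN slot).
  return sign * ldexpf(8.0f + static_cast<float>(mant),
                       static_cast<int>(exp) - 10);
}
```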
The AVX10.2 VCVTNE2PH2BF8 instruction converts two F16 vectors to an E5M2 (BF8) vector, and the AVX10.2 VCVTNEPH2HF8 instruction converts an F16 vector to an E4M3 (HF8) vector.
The AVX10.2 VCVTHF82PH instruction converts an E4M3 (HF8) vector to an F16 vector.
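If Highway were to expose these, one possible shape (purely hypothetical: hwy::float8_e4m3_t is a placeholder type name that Highway does not define today, reusing the existing DemoteTo conventions) might be:

```cpp
#include "hwy/highway.h"

namespace hn = hwy::HWY_NAMESPACE;

// Hypothetical sketch only; assumes count is a multiple of the lane count.
void DemoteF16ToHF8(const hwy::float16_t* HWY_RESTRICT in,
                    hwy::float8_e4m3_t* HWY_RESTRICT out, size_t count) {
  const hn::ScalableTag<hwy::float16_t> d16;
  const hn::Rebind<hwy::float8_e4m3_t, decltype(d16)> d8;  // hypothetical
  for (size_t i = 0; i < count; i += hn::Lanes(d16)) {
    const auto v = hn::Load(d16, in + i);
    // On AVX10.2 this could lower to VCVTNEPH2HF8.
    hn::Store(hn::DemoteTo(d8, v), d8, out + i);
  }
}
```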
Arm has already added the FP8 extension to AArch64, which adds support for conversions from the F16/F32 floating-point types to E5M2/E4M3, along with conversions from E5M2/E4M3 to F16/BF16.
FYI we are using a hybrid of e5 and e4 called SFP, which gives m3 for larger numbers and m2 for smaller ones, while retaining ~24 bit dynamic range. It is also fast to convert (code) to bf16 via two permutex2var, enabling fast FMA into f32 via _mm512_dpbf16_ps. It also avoids having to choose between the two formats.
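For reference, the two-permutex2var decode pattern could look roughly like the following AVX-512 sketch. It is not the actual SFP decoder (that lives in the linked code); it assumes, purely for illustration, an 8-bit code consisting of a sign bit plus a 7-bit magnitude, with the 128 bf16 magnitude values precomputed across four zmm table registers:

```cpp
#include <immintrin.h>

// Illustrative sketch only (AVX-512BW); table register contents omitted.
__m512i DecodeCodesToBF16(__m256i codes, __m512i lut0, __m512i lut1,
                          __m512i lut2, __m512i lut3) {
  const __m512i words = _mm512_cvtepu8_epi16(codes);  // 32 codes -> 16-bit
  const __m512i mag = _mm512_and_si512(words, _mm512_set1_epi16(0x7F));
  // Two permutex2var lookups cover table entries 0..63 and 64..127.
  const __m512i lo = _mm512_permutex2var_epi16(lut0, mag, lut1);
  const __m512i hi = _mm512_permutex2var_epi16(lut2, mag, lut3);
  const __mmask32 use_hi =
      _mm512_test_epi16_mask(mag, _mm512_set1_epi16(0x40));
  const __m512i bf16 = _mm512_mask_blend_epi16(use_hi, lo, hi);
  // Re-attach the sign as the bf16 sign bit; the results can then feed
  // _mm512_dpbf16_ps for FMA into f32 accumulators.
  const __m512i sign = _mm512_slli_epi16(_mm512_srli_epi16(words, 7), 15);
  return _mm512_or_si512(bf16, sign);
}
```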
By contrast, conversions to f16 seem less useful, given the lack of precision and (last I checked) low-throughput f16 <-> f32 conversions on Intel.
Do you have a use case for these specific conversions to f16?
I agree E5/E4 are currently used in ML frameworks.
Are we aware of any current or future Highway users that would use these conversions themselves, or to interoperate with something else that does?