[PyTorch] Experimental FP8 tensor class #452
Conversation
/te-ci
/te-ci
/te-ci pytorch
Force-pushed from de20156 to 4315115
Co-authored-by: Tim Moon <[email protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Przemyslaw Tredak <[email protected]> Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Force-pushed from 67f7cd3 to b6bfddb
/te-ci pytorch
/te-ci pytorch
/te-ci pytorch
handled outside this class. If a tensor is initialized with an FP8 metadata object, it extracts the information it needs so it isn't affected by later changes in the FP8 metadata (although its design does cause us to leak some subtle side-effects into FP8 metadata).
This doc is not really correct since we are holding a view to the meta, right?
Ops using the tensor class's __torch_dispatch__ are insensitive to external changes in the meta since we cache scale_inv. However, all bets are off when we extract _data and pass it to external ops like tex.fp8_gemm.
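Not from the PR itself, but a minimal sketch of the behavior described above (the class name, the uint8 payload, and the dequantize-on-dispatch logic are illustrative assumptions, not the PR's actual Float8Tensor):

```python
import torch
from torch.utils._pytree import tree_map

class CachingFP8Tensor(torch.Tensor):
    """Illustrative wrapper tensor that snapshots scale_inv at construction."""

    @staticmethod
    def __new__(cls, data, scale_inv):
        # Standard wrapper-subclass idiom: advertise the dequantized dtype.
        self = torch.Tensor._make_wrapper_subclass(
            cls, data.size(), dtype=torch.float32, device=data.device
        )
        self._data = data                    # raw quantized payload
        self._scale_inv = scale_inv.clone()  # cached copy, not a view of the meta
        return self

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        # Dequantize with the *cached* scale_inv, so mutating the external
        # FP8 metadata after construction does not affect dispatched ops.
        def unwrap(x):
            return x._data.float() * x._scale_inv if isinstance(x, cls) else x
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))

meta_scale_inv = torch.tensor([2.0])
t = CachingFP8Tensor(torch.ones(2, 2, dtype=torch.uint8), meta_scale_inv)
meta_scale_inv.fill_(100.0)  # external change to the metadata...
print((t + 0).unique())      # ...dispatched ops still see the cached 2.0
```

Anything that reads t._data directly and hands it to an external kernel (the tex.fp8_gemm case above) bypasses this cache entirely.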
Force-pushed from 8ff9e05 to dfcbcf1
Handle case where transpose cache is updated externally. Signed-off-by: Tim Moon <[email protected]>
Easier for multiple tensors to share, e.g. detached tensors. Signed-off-by: Tim Moon <[email protected]>
Force-pushed from 718d284 to 94848da
…10/TransformerEngine into float8tensor_experiments
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Approving as experimental. We will iterate upon this in the next release.
* Experimental FP8 tensor. Co-authored-by: Tim Moon <[email protected]>. Co-authored-by: Sudhakar Singh <[email protected]>. Co-authored-by: Przemyslaw Tredak <[email protected]>. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Add fp8 tensor to ci test. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Review comments and tests. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Minor changes. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Default to FP8 usage. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Fix docs. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Naming changes. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Minor fix. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Fix transpose caching. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Debug transpose caching: handle case where transpose cache is updated externally. Signed-off-by: Tim Moon <[email protected]>
* Rename FP8GlobalStateManager.with_fp8_parameters. Signed-off-by: Tim Moon <[email protected]>
* Remove Float8Tensor from import API. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Avoid caching FP8 transposes if not required. Signed-off-by: Tim Moon <[email protected]>
* Fix import error in FP8 tensor tests. Signed-off-by: Tim Moon <[email protected]>
* Fix transpose caching and checkpointing bug. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Improve caching and fix distopt case. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Update transformer_engine/pytorch/float8_tensor.py. Signed-off-by: Tim Moon <[email protected]>
* Remove recursive logic. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Fix cache reset bug. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Store FP8 attributes in dict: easier for multiple tensors to share, e.g. detached tensors. Signed-off-by: Tim Moon <[email protected]>
* Make sure scale_inv is 1D tensor. Signed-off-by: Tim Moon <[email protected]>
* Make sure scale_inv is 1D tensor. Signed-off-by: Tim Moon <[email protected]>
* Fixes and detach recipe. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Set default fp8 data type. Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
---------
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Sudhakar Singh <[email protected]>
Co-authored-by: Przemyslaw Tredak <[email protected]>
* full model training using optimizer with master weights, where the high precision copies of weights are already present in the optimizer.
How does this look in practice? If the model will be initialized directly with fp8 weights, how does the optimizer get high-precision copies?
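Not from the PR, but a minimal sketch of the master-weights flow the docstring is referring to, with a hypothetical quantize_to_fp8 helper standing in for the real FP8 cast (Apex's DistributedFusedAdam keeps FP32 master copies internally and implements roughly this loop in sharded form):

```python
import torch

def quantize_to_fp8(t):
    # Hypothetical stand-in: a real implementation would produce FP8 data
    # plus scale/scale_inv metadata (e.g. this PR's Float8Tensor).
    return t.half().float()

model = torch.nn.Linear(16, 16)  # params conceptually live in low precision

# The optimizer, not the model, owns the high-precision master copies.
masters = [p.detach().clone().float() for p in model.parameters()]
opt = torch.optim.Adam(masters, lr=1e-3)

def train_step(x, target):
    loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()
    with torch.no_grad():
        # Route grads from the model params to the masters, update the
        # masters, then refresh the model's params from the masters.
        for master, p in zip(masters, model.parameters()):
            master.grad = p.grad.float()
        opt.step()
        opt.zero_grad()
        for master, p in zip(masters, model.parameters()):
            p.copy_(quantize_to_fp8(master))
        model.zero_grad()

train_step(torch.randn(4, 16), torch.randn(4, 16))
```

So the model can be initialized directly with FP8 weights as long as the optimizer is seeded with (or already holds) the high-precision copies; the FP8 params are then a quantized mirror that gets refreshed after each step.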
This FP8 tensor class is based on the implementation at https://github.com/facebookexperimental/protoquant/tree/fp8_poc and is primarily oriented toward enabling efficient FP8 support in Apex's DistributedFusedAdam. See NVIDIA/NeMo#7469 and NVIDIA/NeMo#7565. CC @sudhakarsingh27 @ksivaman