early return by check tensor already casted or not #233
Conversation
As titled, it turns out we don't need to install additional hooks based on our TP + FP8 design. The only thing we need to do here is to be able to turn off activation casting, so that we can put the activation casting in the TP hooks. So renaming the flag to cast_activation instead and deleting the relevant tests.
@wanchaol has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
def forward(self, x):
-    # cast x to float8_e4m3fn if not using activation hooks
-    x_fp8 = x if self.use_activation_hooks else self.cast_to_float8_e4m3fn(x)
+    x_fp8 = self.cast_to_float8_e4m3fn(x) if self.cast_activation else x
If self.cast_activation is False, then what type is x? Would it be a DTensor whose local tensor is Float8Tensor?

I wonder if we need the bool at all. For example, the cast_to_float8_e4m3fn function could be idempotent in that if it is passed a Float8Tensor, then we return it early. The wrinkle here is that it could be a DTensor wrapping a Float8Tensor.
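A minimal sketch of what such an idempotent cast could look like (the helper name, the cast_fn parameter, and the direct _local_tensor access are illustrative assumptions, not the PR's final code):

```python
import torch
from torch.distributed._tensor import DTensor
from float8_experimental.float8_tensor import Float8Tensor

def idempotent_cast_to_float8_e4m3fn(x, cast_fn):
    # hypothetical helper: skip the cast if x is already a Float8Tensor,
    # or a DTensor whose local shard is already a Float8Tensor
    if isinstance(x, Float8Tensor):
        return x
    if isinstance(x, DTensor) and isinstance(x._local_tensor, Float8Tensor):
        return x
    return cast_fn(x)
```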
If it's DTensor(Float8Tensor), this would still be a no-op though, right, since the cast has already been done? I think it could be, if it were something like: if no Float8Tensor subclasses exist in the subclass hierarchy then do the cast, otherwise pass through.
If cast_activation is False, it would be DTensor(torch.Tensor[fp32/16]). We want to turn off activation casting because we want this casting to happen inside the TP pre-forward/forward hooks (as it needs to happen in a certain order, i.e. after from_local and before redistribute); see the changes in PR #234.
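A minimal sketch of that ordering constraint (this is not the actual PR #234 hook; the function name, cast_fn, and the placements are assumptions):

```python
import torch
from torch.distributed._tensor import DTensor, Replicate, Shard

def _float8_prepare_input(x, device_mesh, cast_fn, placements=(Shard(0),)):
    # 1) wrap the plain local tensor into a DTensor first (no cast yet)
    dt = DTensor.from_local(x, device_mesh, [Replicate()], run_check=False)
    # 2) cast the activation to float8 while it is a DTensor of high-precision data
    #    (cast_fn stands in for the fp8 e4m3 cast)
    dt_fp8 = cast_fn(dt)
    # 3) only then redistribute to the placements the TP plan expects,
    #    so any collective moves float8 data rather than fp32/bf16
    return dt_fp8.redistribute(device_mesh, list(placements))
```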
So if you are going to be using the TP strategies then you turn off activation casting: https://github.com/pytorch-labs/float8_experimental/pull/234/files#diff-0c0c016522783d31fb102a4088caaa2a64f783f7b5449559be4670a11fd5ed31R172.

Doesn't this mean that when line 76 gets used, x will be a DTensor(Float8Tensor), because x will be this input tensor: https://github.com/pytorch-labs/float8_experimental/pull/234/files#diff-ed26a8770aa85661ab521607c70b62b60123890c2518687d9f58f27e31200955R34?
updated according to our discussions :)
@wanchaol has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@@ -30,63 +34,43 @@ def forward(

    @staticmethod
    def backward(ctx, gradY):
        if tensor_already_casted_to_fp8(gradY):
For cast_to_float8_e5m2_bw, unfortunately I can't do a forward-only check and have to check the backward gradients to see if they're already casted, as the forward only has y but not grad_y.
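A minimal sketch of that pattern (the class name and the two helpers below are simplified stand-ins, not the repo's exact implementations): the forward is an identity, so the only place an early return on an already-casted tensor can happen is on the incoming gradient in backward.

```python
import torch

def tensor_already_casted_to_fp8(t: torch.Tensor) -> bool:
    # simplified stand-in for the predicate added in this PR
    # (the real one also handles a DTensor wrapping a Float8Tensor)
    return t.dtype in (torch.float8_e4m3fn, torch.float8_e5m2)

def cast_to_float8_e5m2(t: torch.Tensor) -> torch.Tensor:
    # hypothetical stand-in for the repo's e5m2 cast helper (no scaling here)
    return t.to(torch.float8_e5m2)

class NoopFwCastBw(torch.autograd.Function):
    # identity in forward; the fp8 cast (and the "already casted?" check)
    # can only happen in backward, where grad_y is available
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, gradY):
        if tensor_already_casted_to_fp8(gradY):
            # TP hooks already produced an fp8 gradient, don't cast twice
            return gradY
        return cast_to_float8_e5m2(gradY)
```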
if isinstance(tensor, Float8Tensor):
    return True
elif isinstance(tensor, DTensor) and isinstance(tensor._local_tensor, Float8Tensor):
    # TODO: shall we stick to public API and directly use tensor.to_local() here?
    return True
I wonder if in general, subclasses composing with other subclasses should have a generic way to determine the nested types
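One generic possibility (just a sketch; the helper name is made up, and this assumes the nested subclasses implement the __tensor_flatten__ protocol, as DTensor does):

```python
import torch

def contains_subclass(t: torch.Tensor, cls: type) -> bool:
    # hypothetical generic check: recursively walk nested tensor subclasses
    # via the __tensor_flatten__ protocol used by traceable subclasses
    if isinstance(t, cls):
        return True
    if hasattr(t, "__tensor_flatten__"):
        inner_attr_names, _ = t.__tensor_flatten__()
        return any(contains_subclass(getattr(t, name), cls) for name in inner_attr_names)
    return False
```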
Awesome! I love the ratio of red lines to green!
I think the fixture in conftest.py for xfailing the grad hooks can also be removed
Thanks!
@wanchaol has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
""" | ||
Create an nn.Linear with fp8 compute from a regular nn.Linear | ||
|
||
Args: | ||
mod (torch.nn.Linear): nn.Linear to convert | ||
emulate (bool): whether to emulate fp8 matmul logic in float32 | ||
use_activation_hooks (bool): whether to use activation hooks instead of inlining the casting logic | ||
cast_activation (bool): whether to use activation hooks instead of inlining the casting logic |
nit: This looks like it should be removed.
I think this was removed in the next PR.
As titled, it turns out we don't need to install additional hooks based on our TP + FP8 design. The only thing we need to do here is to be able to turn off activation casting, so that we can put the activation casting in the TP hooks.

So just check whether the tensor has already been casted to fp8 or not, since in TP we would cast the activation inside the DTensor's Float8Colwise/Rowwise styles instead.
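For context, a hedged sketch of how that would be wired up on the TP side (the import paths, style names, and mesh setup here are assumptions based on the PR #234 discussion, not verified against the repo; run under torchrun with a distributed process group):

```python
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import parallelize_module

# assumed import path for the Float8 colwise/rowwise styles referenced above
from float8_experimental.float8_tensor_parallel import (
    Float8ColwiseParallel,
    Float8RowwiseParallel,
)

class FFN(nn.Module):
    def __init__(self, dim=1024, hidden=4096):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w2(torch.relu(self.w1(x)))

mesh = init_device_mesh("cuda", (8,))
# the fp8 linears no longer cast their inputs themselves; the cast happens
# inside the TP pre-forward/forward hooks installed by these styles
model = parallelize_module(
    FFN().cuda(),
    mesh,
    {"w1": Float8ColwiseParallel(), "w2": Float8RowwiseParallel()},
)
```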