[PyTorch] Replace views with reshapes and update PyTorch autocast API #1250
Conversation
Force-pushed from 90f2f7e to f22e963.
Overall I agree that `reshape` is preferable to `view`. `sed 's/view/reshape/g'` is a little overkill though.
```diff
     data=tensor._data.reshape(*shape),
 )
-return tensor.view(*shape)
+return tensor.reshape(*shape)
```
This logic is for `Float8Tensor.view`, so using `reshape` could cause correctness problems:
```diff
-    data=tensor._data.reshape(*shape),
-)
-return tensor.reshape(*shape)
+    data=tensor._data.view(*shape),
+)
+return tensor.view(*shape)
```
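For context, here is a minimal sketch in plain PyTorch (not Transformer Engine code) of why the two calls are not interchangeable: `view` never copies and fails on incompatible strides, while `reshape` may silently return a copy, breaking callers that rely on `view`'s aliasing guarantee.

```python
import torch

x = torch.arange(6.0).reshape(2, 3)
xt = x.t()                      # non-contiguous transpose of x

v = x.view(3, 2)                # OK: x is contiguous, v shares storage with x
r = xt.reshape(6)               # OK: reshape copies because xt is non-contiguous
try:
    xt.view(6)                  # view cannot handle xt's strides
except RuntimeError as err:
    print("view failed:", err)

print(v.data_ptr() == x.data_ptr())   # True: view aliases the original storage
print(r.data_ptr() == xt.data_ptr())  # False: reshape had to make a copy
```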
I see. I have made the changes for Float8Tensor.
Force-pushed from 93ba06e to 2ac3da8.
Force-pushed from 2ac3da8 to 9001081.
Force-pushed from 873637c to 617e1de.
```diff
 if isinstance(grad, Float8Tensor):
     dgrad = Float8Tensor.make_like(
         grad,
-        data=grad._data.reshape(ctx.shape),
+        data=grad._data.view(ctx.shape),
     )
     return dgrad, None
-return grad.reshape(ctx.shape), None
+return grad.view(ctx.shape), None
```
This logic is for `Float8Tensor.reshape`:
```diff
 if isinstance(grad, Float8Tensor):
     dgrad = Float8Tensor.make_like(
         grad,
-        data=grad._data.view(ctx.shape),
+        data=grad._data.reshape(ctx.shape),
     )
     return dgrad, None
-return grad.view(ctx.shape), None
+return grad.reshape(ctx.shape), None
```
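To illustrate the point about the backward pass, here is a hypothetical, simplified stand-in for a reshape-style autograd function (plain PyTorch, not the actual `Float8Tensor` implementation): the incoming gradient is not guaranteed to be contiguous, so `view(ctx.shape)` could raise there, while `reshape` is safe.

```python
import torch

class _Reshape(torch.autograd.Function):
    @staticmethod
    def forward(ctx, tensor, shape):
        ctx.shape = tensor.shape          # remember original shape for backward
        return tensor.reshape(*shape)

    @staticmethod
    def backward(ctx, grad):
        # grad may be non-contiguous, so restore the original shape with
        # reshape rather than view.
        return grad.reshape(ctx.shape), None

x = torch.randn(4, 6, requires_grad=True)
y = _Reshape.apply(x, (6, 4))
g = torch.ones(4, 6).t()                  # deliberately non-contiguous gradient
y.backward(g)
print(x.grad.shape)                       # torch.Size([4, 6])
```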
```diff
 if isinstance(tensor, Float8Tensor):
     return Float8Tensor.make_like(
         tensor,
-        data=tensor._data.reshape(*shape),
+        data=tensor._data.view(*shape),
     )
-return tensor.reshape(*shape)
+return tensor.view(*shape)
```
This logic is for `Float8Tensor.reshape`:
```diff
 if isinstance(tensor, Float8Tensor):
     return Float8Tensor.make_like(
         tensor,
-        data=tensor._data.view(*shape),
+        data=tensor._data.reshape(*shape),
     )
-return tensor.view(*shape)
+return tensor.reshape(*shape)
```
Force-pushed from 274d4e3 to 99313bf.
/te-ci pytorch
Overall LGTM. `reshape` is generally preferable to `view`, although many of these changes are redundant since the tensors are already contiguous. Please sign your commits to pass the DCO check, and we'll merge if there are no concerning test failures.
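A quick illustration of why the redundant cases are harmless (plain PyTorch, assuming ordinary dense tensors): when the input is already contiguous, `reshape` does not copy and behaves exactly like `view`.

```python
import torch

x = torch.randn(4, 6)                 # contiguous tensor
y = x.reshape(24)

print(x.is_contiguous())              # True
print(y.data_ptr() == x.data_ptr())   # True: reshape returned a view, no copy made
```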
The changes to `transformer_engine/pytorch/attention.py` have snuck back in. They can be reverted with:
```sh
git checkout main -- transformer_engine/pytorch/attention.py
git commit -m "Revert changes to transformer_engine/pytorch/attention.py" --signoff
```
The commit history is mangled and there are still some unsigned commits. At this point it's better to squash your commits:
```sh
# Merge main branch
git checkout main
git pull origin main
git checkout fix_layernorm_fsdp
git merge main
# Squash all changes into a single commit
git reset main
git commit -a -m 'Use reshape instead of view and update PyTorch autocast API' --signoff
# Force-push to your GitHub
git push eljandoubi fix_layernorm_fsdp --force
```
Just in case, I've made a copy of your branch: https://github.com/timmoon10/TransformerEngine/tree/eljandoubi/fix_layernorm_fsdp
Force-pushed from 5cb5e2e to b53d398.
Signed-off-by: eljandoubi <[email protected]>
Force-pushed from b53d398 to 8ce50b3.
The commit history was still mangled (the original branch is at https://github.com/timmoon10/TransformerEngine/tree/eljandoubi/fix_layernorm_fsdp-20241018), so I've manually squashed the commits. However, bugs have crept back in (#1250 (comment), #1250 (comment), #1250 (comment)). At the moment I think this PR is riskier and more of a hassle than it is worth. If this is important for your use case, I suggest breaking this up into two more manageable PRs:
Description
Update `torch.get_autocast_gpu_dtype()` to `torch.get_autocast_dtype("cuda")` throughout the PyTorch code (see the sketch below).
Migrate from `torch.Tensor.view` to `torch.Tensor.reshape`. `reshape` works with non-contiguous tensors, copying data when necessary, which is handy for distributed training.
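A minimal sketch of the autocast accessor change, assuming a recent PyTorch release where the device-generic `torch.get_autocast_dtype` is available:

```python
import torch

# Old, GPU-specific accessor (deprecated in recent PyTorch releases):
# dtype = torch.get_autocast_gpu_dtype()

# New, device-generic accessor:
dtype = torch.get_autocast_dtype("cuda")
print(dtype)  # torch.float16 unless autocast was configured otherwise
```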
@timmoon10 @ksivaman @ptrendx @cyanguwa
Fixes #1247
Type of change