[PyTorch] Refactor parameter splitting in Linear and LayerNormLinear #590

timmoon10 · 2024-01-05T08:02:20Z

#533 reports that TransformerLayer doesn't work out of the box with tensor parallelism. The root cause is that the parameter-splitting logic (e.g. for QKV matrices) does not handle tensor parallelism. Another user also ran into trouble when setting parameters_split in Linear, because it currently expects each split name to end with exactly one underscore (so mysplit and my_split_ would both fail).

I think this is a good opportunity to refactor this logic:

Adjust the parameter split sizes as needed for tensor parallelism.
Generalize support for split names. To maintain backward compatibility, we now strip all trailing underscores before appending _weight or _bias, resulting in parameter names like q_weight, etc.
Separate the noop_cat operation so it is independent from the TE modules.
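To make the first two items concrete, here is a minimal sketch in plain Python. It is not the code from this PR: `split_param_names` and `local_split_sizes` are hypothetical helpers. It also assumes that tensor parallelism shards the split dimension evenly across `tp_size` ranks, which may not hold for every parallel mode.

```python
from typing import Dict, List


def split_param_names(split_name: str, use_bias: bool = True) -> List[str]:
    """Build parameter names for one split (hypothetical helper).

    All trailing underscores are stripped before the suffix is appended,
    so "q", "q_" and "q__" all map to "q_weight" / "q_bias".
    """
    base = split_name.rstrip("_")
    names = [f"{base}_weight"]
    if use_bias:
        names.append(f"{base}_bias")
    return names


def local_split_sizes(split_sizes: Dict[str, int], tp_size: int) -> Dict[str, int]:
    """Scale global split sizes down to per-rank sizes (hypothetical helper).

    Assumption: each split is sharded evenly across tp_size ranks.
    """
    local = {}
    for name, size in split_sizes.items():
        if size % tp_size != 0:
            raise ValueError(
                f"split {name!r} of size {size} is not divisible by tp_size={tp_size}"
            )
        local[name] = size // tp_size
    return local
```

With these helpers, a legacy name such as `q_` and a bare name such as `mysplit` go through the same path, and a QKV split of three equal blocks shrinks uniformly on each rank.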

Closes #533.


timmoon10 · 2024-01-05T08:02:32Z

/te-ci pytorch


timmoon10 · 2024-01-05T19:29:15Z

/te-ci pytorch


cyanguwa

Would these changes affect loading existing checkpoints?

timmoon10 · 2024-01-06T01:42:32Z

I don't think so. The resulting param names (q_weight, k_weight, v_weight) are unchanged, and I don't think the actual values in parameters_split are involved in checkpointing.
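As a rough illustration of that argument, the sketch below compares the parameter names produced by legacy-style split names (with a trailing underscore) and bare ones. `state_dict_keys` is a hypothetical helper, not part of Transformer Engine. The sketch assumes checkpoint keys depend only on these derived names, and it does not load a real checkpoint.

```python
from typing import List


def state_dict_keys(split_names: List[str], use_bias: bool = True) -> List[str]:
    """Derive parameter names from split names (hypothetical helper)."""
    keys = []
    for name in split_names:
        base = name.rstrip("_")  # strip all trailing underscores
        keys.append(f"{base}_weight")
        if use_bias:
            keys.append(f"{base}_bias")
    return keys


legacy = state_dict_keys(["q_", "k_", "v_"])   # old-style split names
generalized = state_dict_keys(["q", "k", "v"])  # names without underscores

# Both spellings yield identical parameter names, so under the stated
# assumption an existing checkpoint would still match.
assert legacy == generalized
```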

timmoon10 added 3 commits January 5, 2024 06:53

Refactor parameter split in Linear module

bbe08be

Remove module state from noop_cat. Support arbitrary names in parameter split. Handle tensor parallelism. Signed-off-by: Tim Moon <[email protected]>

Make noop_cat a standalone operation

e881095

Signed-off-by: Tim Moon <[email protected]>

Update parameter splits in LayerNormLinear

f6c8ca8

Signed-off-by: Tim Moon <[email protected]>

timmoon10 added the bug Something isn't working label Jan 5, 2024

timmoon10 requested review from cyanguwa and ksivaman January 5, 2024 08:02

timmoon10 mentioned this pull request Jan 5, 2024

multi-gpu example with >1 GPU crashes without fuse_qkv=True #533

Closed

timmoon10 added 2 commits January 5, 2024 19:28

Debug case without bias

fbc3615

Fix pylint complaints. Signed-off-by: Tim Moon <[email protected]>

Merge branch 'main' into parameters_split_refactor

49f7e71

Remove unused import

bcec137

Signed-off-by: Tim Moon <[email protected]>

cyanguwa approved these changes Jan 5, 2024


timmoon10 merged commit bb759ad into NVIDIA:main Jan 8, 2024

timmoon10 deleted the parameters_split_refactor branch January 8, 2024 19:40
