Remove FSDP wrapping from sub-models. #34452
base: main
Conversation
Thanks for fixing the issue @eljandoubi! Do you think there is a simpler way to handle this edge case @muellerzr?
src/transformers/trainer.py
Outdated
# Remove FSDP wrapping from sub-models.
self.model = extract_model_from_parallel(self.model, recursive=True)
You can use the `unwrap_model` function in transformers instead. Also, why do we need to set `recursive` to `True`? Please also leave a comment above, since this specific path is only there to make it work with `auto_find_batch_size`.
`unwrap_model` does not provide access to the `recursive` argument. Auto-wrap policies wrap submodules with FSDP, and `unwrap_model` is unable to remove them. You can test this on the toy example from the PyTorch FSDP tutorial for `rank=0` and `world_size=1`, then experiment with the line I provided in a notebook.
import functools
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from transformers.modeling_utils import unwrap_model

# Net and rank come from the toy example in the PyTorch FSDP tutorial (rank=0, world_size=1).
my_auto_wrap_policy = functools.partial(
    size_based_auto_wrap_policy, min_num_params=20000
)
torch.cuda.set_device(rank)
model = Net().to(rank)
print(model)
fsdp_model = FSDP(model, auto_wrap_policy=my_auto_wrap_policy)
print(fsdp_model)  # submodules get wrapped in their own FSDP units by the auto-wrap policy
unwrapped = unwrap_model(fsdp_model)
print(unwrapped)   # the nested FSDP wrappers around the submodules are still there
vs.
You need to re-instantiate `model` and `fsdp_model`:
from accelerate.utils import extract_model_from_parallel

model = Net().to(rank)
fsdp_model = FSDP(model, auto_wrap_policy=my_auto_wrap_policy)
extract_model = extract_model_from_parallel(fsdp_model, recursive=True)
print(extract_model)  # nested FSDP wrappers on the submodules are removed as well
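A quick way to confirm the difference (an illustrative check, not part of the original comment; it assumes the same toy `Net` and single-rank setup as above):

# After the recursive extraction, no module in the tree should still be an FSDP instance.
assert not any(isinstance(m, FSDP) for m in extract_model.modules())
# The same check on `unwrapped` from the first snippet would find FSDP instances,
# because the auto-wrap policy left wrapped submodules behind.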
I'm talking about this function in transformers. It uses `extract_model_from_parallel` under the hood, so it should be comparable.
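For context, the idea is roughly the following (a sketch only; the exact signature and guards of the transformers helper may differ between versions, so the function here is deliberately given a different name):

from accelerate.utils import extract_model_from_parallel

def unwrap_model_sketch(model, recursive=False):
    # Delegate to accelerate, which knows how to peel off DDP/FSDP containers;
    # recursive=True also walks the children and removes nested wrappers.
    return extract_model_from_parallel(model, recursive=recursive)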
Ah, I see.
@SunMarc @muellerzr Did you get a different result than I did?
Thanks for the fix, can you add a test in `tests/test_trainer.py` for this?
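As a rough idea of what such a test could check (a hypothetical standalone sketch, not the test added in this PR; it needs a GPU and a single-rank process group, and the `TinyNet` module and size threshold are made up for illustration):

import functools
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from accelerate.utils import extract_model_from_parallel

def test_recursive_fsdp_unwrapping():
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(512, 512)
            self.fc2 = nn.Linear(512, 10)

        def forward(self, x):
            return self.fc2(self.fc1(x))

    # Force the auto-wrap policy to wrap the submodules in their own FSDP units.
    policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1000)
    fsdp_model = FSDP(TinyNet().cuda(), auto_wrap_policy=policy)

    unwrapped = extract_model_from_parallel(fsdp_model, recursive=True)

    # The recursive extraction should leave no FSDP wrapper anywhere in the module tree.
    assert not any(isinstance(m, FSDP) for m in unwrapped.modules())
    dist.destroy_process_group()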
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks! Left a suggestion for `unwrap_model`.
@SunMarc I migrated to
Let's merge it if you're both ok with it @SunMarc @muellerzr
Please rebase this PR on main in order to pass the CI @eljandoubi!
Force-pushed from 693ba36 to 0df20d6
What does this PR do?
Fixes #34113
Who can review?
Library: