Fix error of PEFT with disable adapters and FSDP #3001
Isalia20 wants to merge 8 commits into huggingface:main
Conversation
@githubnemo Tagging for review

I tried this and got this error

I think that is a different error, not related to this one. I had a similar error in my training, which no longer occurs after this change.
The merged PR #2856 includes deepspeed/fsdp for multi GPU, maybe that helps already? |
@githubnemo I think that should have fixed the bugs; can you run the workflow/review when you get a chance?

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hmm, I think the error in the check_code_quality workflow is not related to this PR
Thanks for the updates.
I ran the test on a multi-GPU machine and got the following error:
Testing disable_adapters()...
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 256, in <module>
[rank1]: main(test_name=args.test, model_id=args.model_id, quant=args.quant)
[rank1]: File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 237, in main
[rank1]: test_disable_adapters(model_id, quant)
[rank1]: File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 115, in test_disable_adapters
[rank1]: model.disable_adapters()
[rank1]: File "/home/ubuntu/envs/peft-githubnemo/lib/python3.12/site-packages/transformers/integrations/peft.py", line 724, in disable_adapters
[rank1]: raise ValueError("No adapter loaded. Please load an adapter first.")
[rank1]: ValueError: No adapter loaded. Please load an adapter first.
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 256, in <module>
[rank0]: main(test_name=args.test, model_id=args.model_id, quant=args.quant)
[rank0]: File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 237, in main
[rank0]: test_disable_adapters(model_id, quant)
[rank0]: File "/home/ubuntu/code/peft/githubnemo/tests/training/test_fsdp_adapters.py", line 115, in test_disable_adapters
[rank0]: model.disable_adapters()
[rank0]: File "/home/ubuntu/envs/peft-githubnemo/lib/python3.12/site-packages/transformers/integrations/peft.py", line 724, in disable_adapters
[rank0]: raise ValueError("No adapter loaded. Please load an adapter first.")
[rank0]: ValueError: No adapter loaded. Please load an adapter first.
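For context, the ValueError in the traceback comes from a guard in transformers' PEFT integration that checks whether a PEFT config was registered on the model before toggling adapters. A minimal sketch of that pattern (the class here is a stand-in, not the real integration code; the flag name and exact behavior are an approximation):

```python
# Sketch of the guard pattern behind the traceback above. In transformers'
# PEFT integration, disable_adapters() checks a flag recording whether a PEFT
# config was loaded; under FSDP wrapping that state can end up unset on the
# wrapped module, triggering the error even though adapters exist.
class PeftIntegrationSketch:
    def __init__(self, peft_config_loaded: bool):
        self._hf_peft_config_loaded = peft_config_loaded

    def disable_adapters(self):
        if not self._hf_peft_config_loaded:
            raise ValueError("No adapter loaded. Please load an adapter first.")
        # ... otherwise walk the modules and disable the adapter layers ...

m = PeftIntegrationSketch(peft_config_loaded=False)
try:
    m.disable_adapters()
except ValueError as e:
    print(e)  # same message as in the traceback
```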
I also left some comments.
Hmm, I think the error in the check_code_quality workflow is not related to this PR
Yep, you can ignore that for now.
tests/training/test_fsdp_adapters.py

# Test set_adapter - should not raise
print_if_process_zero("Testing set_adapter('adapter2')...")
model.set_adapter("adapter2")
print_if_process_zero("set_adapter('adapter2') succeeded!")

# Test switching back
print_if_process_zero("Testing set_adapter('adapter1')...")
model.set_adapter("adapter1")
print_if_process_zero("set_adapter('adapter1') succeeded!")
I wonder if the test should mimic a more realistic training scenario, i.e. switching to adapter 1, training a bit, switching to adapter 2, training that.
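One way to structure that more realistic flow (helper names like `train_steps` are hypothetical; the real test would run forward/backward/optimizer steps on the FSDP-wrapped model instead of the dummy bookkeeping used here):

```python
# Hedged sketch of the suggested test flow: activate adapter 1, train a few
# steps, switch to adapter 2, train that. DummyPeftModel stands in for the
# FSDP-wrapped PEFT model; `trained` just counts steps per adapter.
def train_steps(model, adapter_name, num_steps):
    model.set_adapter(adapter_name)
    for _ in range(num_steps):
        model.trained[adapter_name] += 1  # stand-in for forward/backward/step

class DummyPeftModel:
    def __init__(self):
        self.active = None
        self.trained = {"adapter1": 0, "adapter2": 0}

    def set_adapter(self, name):
        self.active = name

model = DummyPeftModel()
train_steps(model, "adapter1", 3)
train_steps(model, "adapter2", 3)
```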
Thanks for running this. I'll take a look at the comments in the upcoming days and fix them, and probably get a multi-GPU machine to test this out properly.

Updated the tests, resolved the comments, and tested on a multi-GPU device. Should be ready now.

Hmm, I think the test errors are not related to this PR 🤔 @githubnemo wdyt?

@Isalia20 yep, they're not related. If you merge with main this issue will disappear.
BenjaminBossan left a comment
Hi @Isalia20, thanks for tackling this issue and providing a solution and a test. I ran the test on my machine and can confirm that it works. Besides what @githubnemo mentioned, could you please check my comments?
tests/training/adapters.py

super().__init__()
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,

Suggested change:
- torch_dtype=torch.bfloat16,
+ dtype=torch.bfloat16,
Let's fix the random seed, just to be sure.
I set the seed with:
torch.manual_seed(42)
on line 118 already; are there any other ways I should set it?
I must have missed that, thanks.
final_base_weights = get_base_model_weights(model.peft_model)
final_second_adapter_weights = get_adapter_weights(model.peft_model, "second_adapter")

verify_weights_unchanged(initial_base_weights, final_base_weights, "Base model")
Could you explain why we check if the base weights changed? Is this something we would ever expect?
We wouldn't expect the base weights to change. The test is to make sure that with this FSDP setup the base weights remain unchanged. It was initially requested by @githubnemo, and I also think it's a good thing to check in this test.
Ah I see. Let's add a comment that this is a sanity check.
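For reference, a hedged sketch of what such a sanity-check helper might look like, shown with plain float lists instead of tensors (the test's actual `verify_weights_unchanged` likely compares tensors with something like `torch.allclose`):

```python
# Sketch of a weight sanity check: base model weights captured before and
# after training must match, since only adapter weights should be updated.
# Uses plain float lists for illustration; the real helper would use tensors.
def verify_weights_unchanged(before, after, name, tol=1e-6):
    for key, vals in before.items():
        if any(abs(a - b) > tol for a, b in zip(vals, after[key])):
            raise AssertionError(f"{name} weight '{key}' changed during training")

initial = {"layer.weight": [1.0, 2.0, 3.0]}
final = {"layer.weight": [1.0, 2.0, 3.0]}
verify_weights_unchanged(initial, final, "Base model")  # passes silently
```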
BenjaminBossan left a comment
Thanks for the updates @Isalia20. Could you please run make style?
Oh my bad, I thought pre-commit did that. Let me run it and update.

Updated
BenjaminBossan left a comment
Thanks for the latest update. The failing CI is unrelated and can thus be ignored.
I re-reviewed the PR and I wonder if the two changes (in enable_adapters and set_adapter) could be factored out into a single helper function, since the code is pretty similar. Moreover, I think the same logic needs to be applied here:
Lines 490 to 510 in 8f73327
The helper function could be called there as well. WDYT?
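A hedged sketch of what such a shared helper could look like (`TunerLayer` is a stand-in for PEFT's actual tuner layer type, and the traversal over a flat list stands in for walking `model.modules()`):

```python
# Sketch of factoring the shared module traversal out of enable_adapters()
# and set_adapter() into one helper, so the same logic can also be reused at
# the third call site mentioned above. All names here are illustrative.
class TunerLayer:
    def __init__(self):
        self.enabled = True
        self.active_adapter = "default"

def _apply_to_tuner_layers(modules, fn):
    """Shared traversal: apply fn to every tuner layer, skipping everything else."""
    for module in modules:
        if isinstance(module, TunerLayer):
            fn(module)

modules = [TunerLayer(), "not a tuner layer", TunerLayer()]
# enable_adapters(False) and set_adapter("adapter2") become thin wrappers:
_apply_to_tuner_layers(modules, lambda l: setattr(l, "enabled", False))
_apply_to_tuner_layers(modules, lambda l: setattr(l, "active_adapter", "adapter2"))
```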
Fixes #2977
For tests, I'm not sure if the PEFT repo has multi-GPU nodes in CI for testing this. Searching through the tests, I couldn't find distributed tests. I'd be glad to add one if you point me to where it is / an example test.