Just to see the diff #3

Open · wants to merge 14 commits into main

Conversation

Muennighoff (Contributor)

No description provided.

@loubnabnl (Contributor) left a comment

Looks great! I just left a comment on the order of the losses and the batch size we can fit.

```python
    batch = {k: v[:args.train_batch_size_select] for k, v in batch.items()}
elif args.selection_method == "rholoss":
    with torch.no_grad():
        out = model(batch, labels=batch, use_cache=False).loss
```
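For context, RHO-LOSS keeps the sequences whose current training loss most exceeds their precomputed irreducible loss. A minimal sketch of that selection step, assuming per-sequence loss tensors; the names `select_rholoss`, `train_losses`, and `irreducible_losses` are illustrative and not the PR's actual code:

```python
import torch

def select_rholoss(train_losses: torch.Tensor, irreducible_losses: torch.Tensor, k: int) -> torch.Tensor:
    """Return the indices of the k sequences with the highest reducible loss.

    Both inputs are 1-D tensors of per-sequence losses in the same order,
    which is why the comments in this thread stress checking the loss order.
    """
    reducible = train_losses - irreducible_losses
    return torch.topk(reducible, k).indices
```

With the 10% ratio discussed below, `k` would be 32 for a scoring batch of 320 (or 16 for 160).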
Contributor

Comment about the batch size: we're assuming that we can fit a batch size of 320 across our workers, but I think we can only fit 12 sequences on an A100 40GB (so with 16 workers, a batch of 16*12=192).

So we should probably either incorporate gradient accumulation and store the losses over 2 iterations (2 * 10 (small bz) * 16 gpus = 320), or change the batch sizes from 320/32 to something that suits us while keeping the 10% ratio, like 160/16. In the paper they only talk about the 10% ratio, but I'm not sure whether using large batches is also important?

Contributor (Author)

There are no gradients here, which means that a) we can likely fit a bigger batch size than 12, and b) instead of grad acc. we can just run multiple forward passes back to back and store the losses if the batch doesn't fit.
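A minimal sketch of that idea, scoring a large batch in several smaller no-grad forward passes; the helper name, `micro_batch_size`, and the assumption that the model returns one loss per sequence are illustrative, not the PR's code:

```python
import torch

def score_in_chunks(model, batch, micro_batch_size=12):
    """Compute per-sequence losses for a large selection batch by running
    several smaller forward passes back to back (no gradients are needed)."""
    total = next(iter(batch.values())).size(0)
    losses = []
    with torch.no_grad():
        for start in range(0, total, micro_batch_size):
            micro = {k: v[start:start + micro_batch_size] for k, v in batch.items()}
            out = model(micro, labels=micro, use_cache=False)
            losses.append(out.loss)  # assumed shape: (micro_batch_size,)
    return torch.cat(losses)
```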

Contributor

Yes, right! By grad acc. I also meant doing similar iterations over the losses.

@loubnabnl (Contributor) commented Dec 12, 2022

Summary of the changes I added:

  • fixed device & size mismatches (e.g. len(dataloader) doesn't work on an iterable dataset + take into account the number of processes)
  • in transformers the forward pass of GPT2 returns the average loss over the entire batch, and the loss.repeat(batch_size) before calling accelerate.gather was just repeating that average, so we would've ended up selecting the same value. I changed this line in the gpt2_modeling of transformers to add reduction="none" (see requirements). This returns the loss of each token, which we then average to get the loss per sequence (see the sketch after this list).
  • added sanity checks on the order of irreducible losses (this requires using the same batch size during their computation and when loading them)
  • after selecting the small batch (32), the same batch is on all workers, so there is no need to gather values; one thing we still need to add is splitting the batch (32) into 2 or 3 equal chunks and doing grad acc, because 32 won't fit on one worker.
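As referenced in the second bullet, a minimal sketch of turning per-token losses (from reduction="none") into one loss per sequence; the helper name and the optional padding mask are assumptions, not the exact change made in gpt2_modeling:

```python
import torch
import torch.nn.functional as F

def per_sequence_loss(logits, labels, pad_token_id=None):
    """Average the per-token cross-entropy into one loss per sequence.

    Sketch only: assumes `logits` has shape (batch, seq_len, vocab) and
    `labels` has shape (batch, seq_len); the usual causal shift is applied.
    """
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    # reduction="none" keeps one loss value per token instead of a scalar.
    token_loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction="none",
    ).view(shift_labels.size())
    if pad_token_id is not None:
        mask = (shift_labels != pad_token_id).float()
        return (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return token_loss.mean(dim=1)
```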

@Muennighoff (Contributor, Author)

> after selecting the small batch (32), the same batch is on all workers, so there is no need to gather values; one thing we still need to add is splitting the batch (32) into 2 or 3 equal chunks and doing grad acc, because 32 won't fit on one worker.

Amazing work - do you want me to add the last point you mentioned?

@loubnabnl (Contributor)

You can add it if you have time, otherwise I will add it later 🤗

@Muennighoff (Contributor, Author)

> You can add it if you have time, otherwise I will add it later 🤗

Done, but not tested. May have a bug 👻
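For reference, a minimal sketch of what that untested addition might look like (splitting the selected batch into equal chunks and accumulating gradients); the function name, `num_chunks`, `optimizer`, and `accelerator` are assumptions about the surrounding training script, not the code actually pushed:

```python
def chunked_train_step(model, batch, optimizer, accelerator, num_chunks=2):
    """One training step over the selected small batch (e.g. 32 sequences),
    split into equal chunks so each forward/backward pass fits on one worker."""
    total = next(iter(batch.values())).size(0)
    chunk_size = total // num_chunks  # the discussion above suggests 2 or 3 chunks
    optimizer.zero_grad()
    for i in range(num_chunks):
        chunk = {k: v[i * chunk_size:(i + 1) * chunk_size] for k, v in batch.items()}
        out = model(chunk, labels=chunk, use_cache=False)
        # .mean() reduces to a scalar in case the model returns per-token losses
        loss = out.loss.mean() / num_chunks
        accelerator.backward(loss)  # assumes the script uses HF Accelerate
    optimizer.step()
```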

jlamypoirier pushed a commit that referenced this pull request Apr 13, 2023
* Typos/fixes to link syntax

* Trying section headers

* Add header formatting for Rule #3
jlamypoirier pushed a commit that referenced this pull request Jan 10, 2024
* added flash attention for opt

* added to list

* fix use cache (#3)

* style fix

* fix text

* test fix2

* reverted until 689f599

* torch fx tests are working now!

* small fix

* added TODO docstring

* changes

* comments and .md file modification

---------

Co-authored-by: Younes Belkada <[email protected]>
jlamypoirier pushed a commit that referenced this pull request Jan 10, 2024
* Add a convenience method for building in your own name scope

* Second attempt at auto layer building

* Revert "Second attempt at auto layer building"

This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be.

* Attempt #3

* Revert "Attempt #3"

This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47.

* Add missing attributes that we're going to need later

* Add some attributes we're going to need later

* A fourth attempt! Feel the power flow through you!

* Revert "A fourth attempt! Feel the power flow through you!"

This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7.

* Add more values we'll need later

* TF refactor that we'll need later

* Revert "TF refactor that we'll need later"

This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9.

* Revert "Revert "TF refactor that we'll need later""

This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13.

* make fixup

* Attempt five!

* Revert "Attempt five!"

This reverts commit 3302207958dfd0374b0447a51c06eea51a506044.

* Attempt six - this time don't add empty methods

* Revert "Attempt six - this time don't add empty methods"

This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb.

* Attempt seven - better base model class detection!

* Revert "Attempt seven - better base model class detection!"

This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9.

* Another attribute we'll need later

* Try again with the missing attribute!

* Revert "Try again with the missing attribute!"

This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7.

* This is the attempt that will pierce the heavens!

* Revert "This is the attempt that will pierce the heavens!"

This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6.

* Attempt seven - snag list is steadily decreasing

* Revert "Attempt seven - snag list is steadily decreasing"

This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316.

* Attempt eight - will an empty snag list do it?

* Revert "Attempt eight - will an empty snag list do it?"

This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b.

* Fixes to Hubert issues that cause problems later

* Trying again with Conv1D/SeparableConv fixes

* Revert "Trying again with Conv1D/SeparableConv fixes"

This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e.

* Apply the build shape fixes to Wav2Vec2 as well

* One more attempt!

* Revert "One more attempt!"

This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c.

* Another attempt!

* Revert "Another attempt!"

This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50.

* Let's see how many failures we get without the internal build method

* Fix OpenAI

* Fix MobileBERT

* (Mostly) fix GroupVIT

* Fix BLIP

* One more BLIP fix

* One more BLIP fix!

* Fix Regnet

* Finally fully fix GroupViT

* Fix Data2Vec and add the new AdaptivePool

* Fix Segformer

* Fix Albert

* Fix Deberta/DebertaV2

* Fix XLM

* Actually fix XLM

* Fix Flaubert

* Fix lxmert

* Fix Resnet

* Fix ConvBERT

* Fix ESM

* Fix Convnext / ConvnextV2

* Fix SAM

* Fix Efficientformer

* Fix LayoutLMv3

* Fix speech_to_text

* Fix mpnet and mobilevit

* Fix Swin

* Fix CTRL

* Fix CVT

* Fix DPR

* Fix Wav2Vec2

* Fix T5

* Fix Hubert

* Fix GPT2

* Fix Whisper

* Fix DeiT

* Fix the encoder-decoder / dual-encoder classes

* make fix-copies

* build in name scope

* Fix summarization test

* Fix tied weight names for BART + Blenderbot

* Fix tied weight name building

* Fix to TFESM weight building

* Update TF SAM

* Expand all the shapes out into Big Boy Shapes