forked from deepspeedai/DeepSpeedExamples
Pull change from upstream #1
Open

chuanli11 wants to merge 153 commits into LambdaLabsML:master from deepspeedai:master
Conversation
* Adding fine-tuning support for Llama on the client side
* fix format
* resolve conflict

Co-authored-by: Minjia Zhang <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
This PR adds a step 2 sweeping script in DS Chat and cleans up the existing step 1 and 3 scripts.
Co-authored-by: Lev Kurilenko <[email protected]>
This PR updates the Stable Diffusion example with the following changes:
* Ability to switch between the HF and local pipeline via the use_local_pipe arg
* Added an optional name arg to specify the model name from the command line (default is prompthero/midjourney-v4-diffusion)
* Fixed the local_rank issue by adding the argument to the script
* Only enable CUDA graph when the local pipeline isn't used (not working with the local pipeline at the moment)
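A rough sketch of how these options could be wired up; the argument names come from the list above, while the parser code and CUDA-graph gating shown here are illustrative rather than the example's actual implementation:

```python
import argparse

parser = argparse.ArgumentParser()
# Switch between the Hugging Face pipeline and the local DeepSpeed pipeline.
parser.add_argument("--use_local_pipe", action="store_true")
# Model name is now configurable from the command line.
parser.add_argument("--name", type=str, default="prompthero/midjourney-v4-diffusion")
# Added so the script works when launched by a distributed launcher.
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

# CUDA graph is only enabled when the local pipeline is not used.
enable_cuda_graph = not args.use_local_pipe
```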
This PR updates the transformers requirement in DS Chat to transformers>=4.31.0.
This PR fixes the OUTPUT arg in the step 3 llama script to use the proper arg position.
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: jagane-infinstor <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]>
This PR adds two test mode arguments to DS Chat step 3 training:
1. enable_test_mode - Enables a testing mode that terminates training based on args.test_stop_step
2. test_stop_step - Training step at which to terminate training during testing
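A minimal sketch of how these two arguments might hook into the training loop (only the argument names come from the description above; the loop itself is a placeholder):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--enable_test_mode", action="store_true",
                    help="Terminate training early based on --test_stop_step.")
parser.add_argument("--test_stop_step", type=int, default=0,
                    help="Training step at which to stop when test mode is enabled.")
args = parser.parse_args()

for step in range(1000):  # stand-in for the real step 3 training loop
    # ... run one training step here ...
    if args.enable_test_mode and step >= args.test_stop_step:
        break  # early exit used when testing
```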
This PR adds an explicit LoRA learning rate argument for DS Chat steps 1 through 3:
* Step 1: lora_learning_rate
* Step 2: lora_learning_rate
* Step 3: actor_lora_learning_rate and critic_lora_learning_rate
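One common way such an argument is consumed is through a separate optimizer parameter group for the LoRA weights. The sketch below assumes LoRA parameters can be identified by name and uses illustrative default values; it is not the actual DS-Chat code:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=1e-5)       # base LR (default assumed)
parser.add_argument("--lora_learning_rate", type=float, default=5e-4)  # LoRA LR (default assumed)
args = parser.parse_args()

def get_param_groups(model, base_lr, lora_lr):
    # Give LoRA parameters their own learning rate; everything else keeps the base LR.
    lora = [p for n, p in model.named_parameters() if "lora" in n.lower()]
    base = [p for n, p in model.named_parameters() if "lora" not in n.lower()]
    return [{"params": base, "lr": base_lr},
            {"params": lora, "lr": lora_lr}]
```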
(#651)
* support bf16 and CPU accelerator
* support both bfloat16 and fp16 data types
* change default data type to bf16 to help run this demo on both CPU and GPU
* enable HelloDeepSpeed for non-CUDA devices
* revert changes for sh output
* allow selecting bf16/fp16 data type
* revert unnecessary changes
* separate bf16 and fp16 configs

Co-authored-by: Olatunji Ruwase <[email protected]>
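For reference, selecting between the two data types usually comes down to which of DeepSpeed's standard bf16/fp16 config sections is enabled; the helper and the values below are illustrative, not the example's actual config:

```python
def build_ds_config(dtype: str) -> dict:
    # Enable exactly one of DeepSpeed's mixed-precision modes based on the chosen dtype.
    return {
        "train_batch_size": 16,                 # illustrative value
        "bf16": {"enabled": dtype == "bf16"},   # bf16 also runs on CPU accelerators
        "fp16": {"enabled": dtype == "fp16"},
    }

ds_config = build_ds_config("bf16")  # bf16 is the new default in this example
```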
This PR adds a Step 3 DS Chat Unit Test.
This PR removes skipping of the zero_stage == "3" and hybrid_engine == "true" and offload == "true" and lora == "true" case since the training instability was determined to be transient. This case is now supported by DS Chat step 3 training and is tested in the DS Chat PyTest.
This PR adds the DS-Chat CI badge and documentation to the main and DS-Chat READMEs.
This PR renames some DS-Chat step 3 arguments for clarification. Args renamed:
1. --per_device_train_batch_size --> --per_device_generation_batch_size
2. --per_device_mini_train_batch_size --> --per_device_training_batch_size
3. --generation_batch_numbers --> --generation_batches
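The renamed flags, expressed as an argparse sketch (only the argument names come from the list above; the types and defaults are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# Formerly --per_device_train_batch_size.
parser.add_argument("--per_device_generation_batch_size", type=int, default=4)
# Formerly --per_device_mini_train_batch_size.
parser.add_argument("--per_device_training_batch_size", type=int, default=4)
# Formerly --generation_batch_numbers.
parser.add_argument("--generation_batches", type=int, default=1)
args = parser.parse_args()
```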
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]> Co-authored-by: Molly Smith <[email protected]> Co-authored-by: Lev Kurilenko <[email protected]> Co-authored-by: Zhewei Yao <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
venv/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(

Signed-off-by: Songlin Jiang <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
* enable reward model offloading option
* fixed code formatting
* more formatting fixes
* Pre-commit formatting fix

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Not all pretrained LLMs use `<|endoftext|>` as the `eot_token`, therefore it's inappropriate to hard-code it.

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
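A hedged sketch of the idea: fall back to the tokenizer's own end-of-text token instead of assuming `<|endoftext|>` (the helper and its arguments are hypothetical, not the actual DS-Chat code):

```python
from transformers import AutoTokenizer

def get_eot_token(model_name_or_path: str, eot_token: str | None = None) -> str:
    # Prefer an explicitly supplied token; otherwise use the tokenizer's own EOS
    # token rather than hard-coding "<|endoftext|>".
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    return eot_token if eot_token is not None else tokenizer.eos_token
```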
* add domino
* use transformer from deepspeed
* clean args
* mega opt
* add opt & timer
* add opt
* fix loss
* folder name
* Change argument in pretrain script
* Add readme for domino
* Update readme for domino
* Fixing usage issues
* update dataset
* megatron dependencies
* path
* Update README.md
* remove imports
* update import
* Update README.md
* Minor example script changes
* train bash
* require
* Update README.md

Co-authored-by: chengming-zhang <[email protected]>
Co-authored-by: Zheyu SHEN <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
* add benchmarking for offloading states
* fix api names
* Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat.
* Add training scripts for step2 DPO in DeepSpeed-Chat.
* Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.
* Update training scripts of step2 DPO in DeepSpeed-Chat.
* Follow upstream fixes.
* Update README.md for step2 DPO finetuning.
* Add OPT-350M training log demo for step2 DPO finetuning in DeepSpeed-Chat.
* Address the formatting issue in step2 DPO finetuning in DeepSpeed-Chat.

Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
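For context, label smoothing is usually added to the DPO objective in the conservative-DPO style shown below; this is a generic sketch of that formulation, not necessarily the exact code added in this commit:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta: float = 0.1, label_smoothing: float = 0.0):
    # Difference of policy and reference log-ratios for chosen vs. rejected responses.
    logits = (policy_chosen_logps - policy_rejected_logps) - (
        ref_chosen_logps - ref_rejected_logps
    )
    # label_smoothing = 0 recovers the standard DPO loss.
    loss = -(1 - label_smoothing) * F.logsigmoid(beta * logits) - (
        label_smoothing * F.logsigmoid(-beta * logits)
    )
    return loss.mean()
```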
Signed-off-by: Logan Adams <[email protected]>
* Update weights_only due to change in default in torch>=2.6
* formatting

Signed-off-by: Logan Adams <[email protected]>
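For context: torch 2.6 changed the default of `torch.load` to `weights_only=True`, so loading checkpoints that contain arbitrary pickled objects now requires passing the flag explicitly (the path below is illustrative):

```python
import torch

# torch >= 2.6 defaults to weights_only=True; pass weights_only=False explicitly
# when the checkpoint stores more than plain tensors (e.g. optimizer state or args).
state = torch.load("checkpoint.pt", map_location="cpu", weights_only=False)
```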
* moved example from DeepSpeed PR #7104 to this repo
* Update training/data_efficiency/variable_batch_size_and_lr/README.md
* replaced T by S for sequence length
* more detailed explanation
* --pipeline-num-stages is now a command line argument
* cleaner syntax

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: Hongwei Chen <[email protected]> Co-authored-by: Hongwei Chen <hongweichen@ftqtmec25000002.taxzvufipdhelhupulxcbvr15f.ux.internal.cloudapp.net> Co-authored-by: Logan Adams <[email protected]>
* import files for deepcompile benchmark
* add figures
* update document
* fix links to images
* add images
* specify deepspeed version

Signed-off-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
* update description of versions for deepcompile
* Update to match specific tag name

Signed-off-by: Logan Adams <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
* update description of versions for deepcompile
* fix deepcompile benchmark script
* fix benchmark for z1
* add options for deepcompile bench

Signed-off-by: Masahiro Tanaka <[email protected]>
* update tp example
* update
* add length bench file

Signed-off-by: inkcherry <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
…#941 (#974) Signed-off-by: Vensenmu <[email protected]>
* Fast model checkpointing
* Support both legacy and serialized formats
* Add io_buffer_mb option
* Bug fix
* Force flush
* More model options; refactor common codes
* --gpu option
* --half and more flexible options
* Add deepspeed.save_checkpoint()
* Free ds memory
* Improve repro
* Double I/O buffer (#56)
* Double I/O buffer (#60)
* Add checkpoint comparison (#62); corrected a typo
* save_checkpoint perf monitoring
* Disable checkpoint save on exit
* Perf statistics for save_checkpoint (#64)
* add logs for a100-80
* add torch* error log with half flag but without fused flag
* log for error
* local rank arg
* Handle local_rank arg (#78)
* Single writer option (#79)
* Allow missing folder
* DP writer refactor
* Update for DS; add GDS
* Integrate GDS into deepspeed_model_save
* Rebase fast persist (#184) (squash of the commits listed above)
* Move folder
* Remove folder
* More cleanup
* torch changes
* sglang+zero_inference
* Remove file
* Add offload configs
* Add pin_memory
* Cleanup scripts
* SGLang README
* Remove file

Signed-off-by: Olatunji Ruwase <[email protected]>
Co-authored-by: jerryyangli <[email protected]>
Co-authored-by: Yang Li <[email protected]>
Co-authored-by: GuanhuaWang <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
Co-authored-by: Zhipeng Wang <[email protected]>
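Since this work routes saves through the DeepSpeed engine's checkpoint API, a minimal usage sketch looks roughly like the following (the model, config values, directory, and tag are all placeholders; the fast-persist/GDS writers plug in underneath this call):

```python
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # placeholder model
ds_config = {
    "train_batch_size": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
engine, _, _, _ = deepspeed.initialize(model=model,
                                       model_parameters=model.parameters(),
                                       config=ds_config)

# Engine-level checkpoint save; directory and tag are illustrative.
engine.save_checkpoint("checkpoints/", tag="step_0")
```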
* remove files
* Update domino example
* apply review suggestions

Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: raviguptaamd <[email protected]>
* Add file extension (#980)
* fix init weights issue for critic/reward model
* Update submodule link to reflect https style (#981)
* fix formatting issue

Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: jouw <[email protected]>
Signed-off-by: raviguptaamd <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
Co-authored-by: raviguptaamd <[email protected]>