Pull change from upstream #1


Open
wants to merge 153 commits into master

Conversation

chuanli11
Collaborator

No description provided.

minjiaz and others added 30 commits August 1, 2023 11:26
* Adding fine-tuning support for Llama on the client side

* fix format

* resolve conflict

---------

Co-authored-by: Minjia Zhang <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
This PR adds a step 2 sweeping script in DS Chat and cleans up the existing step 1 and 3 scripts.
This PR updates the Stable Diffusion example with the following changes:

* Ability to switch between the HF and local pipeline via the use_local_pipe arg
* Added an optional name arg to specify the model name from the command line (default is prompthero/midjourney-v4-diffusion)
* Fixed the local_rank issue by adding the argument to the script
* Only enable CUDA graph when the local pipeline isn't used (it does not work with the local pipeline at the moment)
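
The argument handling described above could look roughly like the sketch below; `use_local_pipe`, `name`, and `local_rank` are taken from the list, while the defaults, help text, and the `enable_cuda_graph` variable are illustrative.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--use_local_pipe", action="store_true",
                    help="use a locally built pipeline instead of the HF one")
parser.add_argument("--name", type=str,
                    default="prompthero/midjourney-v4-diffusion",
                    help="model name to load")
parser.add_argument("--local_rank", type=int, default=-1,
                    help="passed in by the distributed launcher")
args = parser.parse_args()

# CUDA graph is only enabled for the HF pipeline; the local pipeline does not
# support it yet, per the note above.
enable_cuda_graph = not args.use_local_pipe
```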
This PR updates the transformers requirement in DS Chat to transformers>=4.31.0.
This PR fixes the OUTPUT arg in the step 3 llama script to use the proper arg position.
Co-authored-by: jagane-infinstor <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
This PR adds two test mode arguments to DS Chat step 3 training:

1. enable_test_mode - Enable a testing mode that terminates training based on args.test_stop_step
2. test_stop_step - Training step at which to terminate training during testing
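
A minimal sketch of how these two flags can gate the training loop, assuming a simple global-step counter; the loop body is a stand-in for the real step-3 training step.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--enable_test_mode", action="store_true",
                    help="terminate training early based on --test_stop_step")
parser.add_argument("--test_stop_step", type=int, default=0,
                    help="global training step at which to stop in test mode")
args = parser.parse_args()

for step in range(10_000):  # stand-in for the real training loop
    # ... forward/backward/optimizer step would run here ...
    if args.enable_test_mode and step >= args.test_stop_step:
        break
```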
This PR adds an explicit LoRA learning rate argument for DS Chat steps 1 through 3.

Step 1:
- lora_learning_rate

Step 2:
- lora_learning_rate

Step 3:
- actor_lora_learning_rate
- critic_lora_learning_rate
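
One common way to wire such a flag in is through optimizer parameter groups. The sketch below assumes LoRA modules can be identified by a `lora_` prefix in their parameter names; the argument names above are from the PR, the grouping logic here is illustrative.

```python
import torch

def create_param_groups(model, lr, lora_lr):
    # split parameters so LoRA adapters get their own learning rate
    lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
    base_params = [p for n, p in model.named_parameters() if "lora_" not in n]
    return [
        {"params": base_params, "lr": lr},
        {"params": lora_params, "lr": lora_lr},
    ]

# stand-in for the actual actor/critic model, with one LoRA-named parameter
model = torch.nn.ParameterDict({
    "base_weight": torch.nn.Parameter(torch.randn(8, 8)),
    "lora_A": torch.nn.Parameter(torch.randn(8, 2)),
})
optimizer = torch.optim.AdamW(create_param_groups(model, lr=1e-5, lora_lr=5e-4))
```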
#651)

* support bf16 and CPU accelerator

* support both bfloat16 and fp16 data type

* change default data type to bf16 to help run this demo on both CPU and GPU

* enable HelloDeepSpeed for non-CUDA device

* revert changes for sh output

* allow selecting bf16/fp16 data type

* revert unnecessary changes

* separate bf16 and fp16 config

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
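A minimal sketch of how the bf16/fp16 choice can be expressed in a DeepSpeed config; the `bf16`/`fp16` keys are standard DeepSpeed config options, while the `dtype` switch itself is illustrative.

```python
def make_ds_config(dtype: str) -> dict:
    config = {"train_micro_batch_size_per_gpu": 8}
    if dtype == "bf16":
        config["bf16"] = {"enabled": True}   # usable on CPU and recent GPUs
    elif dtype == "fp16":
        config["fp16"] = {"enabled": True}   # CUDA mixed precision
    return config

print(make_ds_config("bf16"))
```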
This PR adds a Step 3 DS Chat Unit Test.
This PR removes skipping of the zero_stage == "3" and hybrid_engine == "true" and offload == "true" and lora == "true" case since the training instability was determined to be transient. This case is now supported by DS Chat step 3 training and is tested in the DS Chat PyTest.
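A hypothetical sketch of how that combination shows up in a parametrized test matrix; the real DS Chat PyTest may structure this differently.

```python
import pytest

@pytest.mark.parametrize("zero_stage", ["2", "3"])
@pytest.mark.parametrize("hybrid_engine", ["true", "false"])
@pytest.mark.parametrize("offload", ["true", "false"])
@pytest.mark.parametrize("lora", ["true", "false"])
def test_ds_chat_step3(zero_stage, hybrid_engine, offload, lora):
    # the ("3", "true", "true", "true") combination is no longer skipped here
    ...
```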
This PR adds the DS-Chat CI badge and documentation to the main and DS-Chat READMEs.
This PR renames some DS-Chat step 3 arguments for clarification.

Args renamed:
1. --per_device_train_batch_size --> rename to --per_device_generation_batch_size
2. --per_device_mini_train_batch_size --> rename to --per_device_training_batch_size
3. --generation_batch_numbers --> rename to --generation_batches
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Molly Smith <[email protected]>
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: Zhewei Yao <[email protected]>
SCheekati and others added 30 commits October 29, 2024 15:46
Co-authored-by: Olatunji Ruwase <[email protected]>
venv/lib/python3.10/site-packages/transformers/deepspeed.py:23:
FutureWarning: transformers.deepspeed module is deprecated and
will be removed in a future version. Please import deepspeed
modules directly from transformers.integrations
  warnings.warn(

Signed-off-by: Songlin Jiang <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
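The warning above comes from the old import path; the fix is to import from `transformers.integrations` instead. `HfDeepSpeedConfig` is used here only as an example symbol and is an assumption about what the affected code imports.

```python
# Old path (emits the FutureWarning quoted above):
#   from transformers.deepspeed import HfDeepSpeedConfig
# New path, per the warning:
from transformers.integrations import HfDeepSpeedConfig
```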
* enable reward model offloading option

* fixed code formatting

* more formatting fixes

* Pre-commit formatting fix

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Not all pretrained LLMs use `<|endoftext|>` as the `eot_token`, so it should not be hard-coded.

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
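A minimal sketch of the alternative: read the end-of-text token from the tokenizer instead of hard-coding `<|endoftext|>`. The model name here is only an example.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
# e.g. "</s>" for OPT/Llama-style tokenizers, "<|endoftext|>" for GPT-2-style ones
end_of_conversation_token = tokenizer.eos_token
```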
* add domino

* use transformer from deepspeed

* clean args

* mega opt

* add opt & timer

* add opt

* fix loss

* folder name

* Change argument in pretrain script

* Add readme for domino

* Update readme for domino

* Fixing usage issues

* update dataset

* megatron dependencies

* path

* Update README.md

* remove imports

* update import

* Update README.md

* Minor example script changes

* train bash

* require

* Update README.md

---------

Co-authored-by: chengming-zhang <[email protected]>
Co-authored-by: Zheyu SHEN <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
* add benchmarking for offloading states

* fix api names
* Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat.

* Add training scripts for step2 DPO in DeepSpeed-Chat.

* Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.

* Update training scripts of step2 DPO in DeepSpeed-Chat.

* Follow upstream fixes.

* Update README.md for Step2 DPO finetuning.

* Add opt 350M training log demo for step 2 dpo finetuning in DeepSpeed-Chat.

* Address the formatting issue in step2 dpo finetuning in DeepSpeed-Chat.

---------

Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
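For reference, the standard form of a DPO loss with label smoothing (the "conservative DPO" variant) looks like the sketch below; this is not necessarily the exact code in the step-2 DPO script, and `beta`/`label_smoothing` are illustrative names.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, label_smoothing=0.0):
    # implicit reward margin between chosen and rejected completions
    logits = (policy_chosen_logps - policy_rejected_logps) \
             - (ref_chosen_logps - ref_rejected_logps)
    # label_smoothing=0 recovers plain DPO; >0 assumes a fraction of the
    # preference labels are flipped and softens the objective accordingly
    return (-(1 - label_smoothing) * F.logsigmoid(beta * logits)
            - label_smoothing * F.logsigmoid(-beta * logits)).mean()

loss = dpo_loss(torch.tensor([-1.0]), torch.tensor([-2.0]),
                torch.tensor([-1.1]), torch.tensor([-1.9]),
                label_smoothing=0.1)
```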
* Update weights_only due to change in default in torch>=2.6

Signed-off-by: Logan Adams <[email protected]>

* formatting

Signed-off-by: Logan Adams <[email protected]>

---------

Signed-off-by: Logan Adams <[email protected]>
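torch 2.6 flips `torch.load`'s `weights_only` default to `True`, so checkpoints that pickle non-tensor objects (optimizer state, argparse namespaces, etc.) need the flag set explicitly; the path below is illustrative.

```python
import torch

# weights_only=False restores the pre-2.6 behaviour; only do this for
# checkpoints from a trusted source, since it allows arbitrary unpickling.
state = torch.load("checkpoint.pt", map_location="cpu", weights_only=False)
```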
* moved example from DeepSpeed PR #7104 to this repo

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

Co-authored-by: Olatunji Ruwase <[email protected]>

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

Co-authored-by: Olatunji Ruwase <[email protected]>

* replaced T by S for sequence length

* replaced T by S for sequence length

* replaced T by S for sequence length

* more detailed explanation

* --pipeline-num-stages is now a command-line argument

* cleaner syntax

* Update training/data_efficiency/variable_batch_size_and_lr/README.md

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Hongwei Chen <[email protected]>
Co-authored-by: Hongwei Chen <hongweichen@ftqtmec25000002.taxzvufipdhelhupulxcbvr15f.ux.internal.cloudapp.net>
Co-authored-by: Logan Adams <[email protected]>
* import files for deepcompile benchmark

Signed-off-by: Masahiro Tanaka <[email protected]>

* add figures

Signed-off-by: Masahiro Tanaka <[email protected]>

* add figures

Signed-off-by: Masahiro Tanaka <[email protected]>

* update document

Signed-off-by: Masahiro Tanaka <[email protected]>

* fix links to images

Signed-off-by: Masahiro Tanaka <[email protected]>

* add images

Signed-off-by: Masahiro Tanaka <[email protected]>

* specify deepspeed version

Signed-off-by: Masahiro Tanaka <[email protected]>

---------

Signed-off-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
* update description of versions for deepcompile

* Update to match specific tag name

Signed-off-by: Logan Adams <[email protected]>

---------

Signed-off-by: Logan Adams <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
* update description of versions for deepcompile

* fix deepcompile benchmark script

Signed-off-by: Masahiro Tanaka <[email protected]>

* fix benchmark for z1

Signed-off-by: Masahiro Tanaka <[email protected]>

* add options for deepcompile bench

Signed-off-by: Masahiro Tanaka <[email protected]>

---------

Signed-off-by: Masahiro Tanaka <[email protected]>
* update tp example

Signed-off-by: inkcherry <[email protected]>

* update

Signed-off-by: inkcherry <[email protected]>

* add length bench file

Signed-off-by: inkcherry <[email protected]>

---------

Signed-off-by: inkcherry <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
* Fast model checkpointing

* Support both legacy and serialized formats

* Add io_buffer_mb option

* Bug fix

* Force flush

* More model options; Refactor common codes

* --gpu option

* --half and more flexible options

* Add deepspeed.save_checkpoint()

* Free ds memory

* Improve repro

* Double I/O buffer (#56)

* Double I/O buffer (#60)

* Add checkpoint comparison (#62)

* Add checkpoint comparison

* Corrected a typo

Co-authored-by: Yang Li <[email protected]>

* save_checkpoint perf monitoring

* Disable checkpoint save on exit

* Perf statistics for save_checkpoint (#64)

* save_checkpoint perf monitoring

* Disable checkpoint save on exit

* add logs for a100-80

* add torch* error log with half flag but without fused flag

* log for error

* local rank arg

* Handle local_rank arg (#78)

* save_checkpoint perf monitoring

* Disable checkpoint save on exit

* local rank arg

* Single writer option

* Single writer option (#79)

* save_checkpoint perf monitoring

* Disable checkpoint save on exit

* local rank arg

* Single writer option

* Allow missing folder

* DP writer refactor

* Update for DS; Add GDS

Signed-off-by: Olatunji Ruwase <[email protected]>

* Integrate GDS into deepspeed_model_save

* Rebase fast persist (#184)

---------

Signed-off-by: Olatunji Ruwase <[email protected]>
Co-authored-by: jerryyangli <[email protected]>
Co-authored-by: Yang Li <[email protected]>
Co-authored-by: GuanhuaWang <[email protected]>

* Move folder

Signed-off-by: Olatunji Ruwase <[email protected]>

* Remove folder

Signed-off-by: Olatunji Ruwase <[email protected]>

* More cleanup

Signed-off-by: Olatunji Ruwase <[email protected]>

* torch changes

Signed-off-by: Olatunji Ruwase <[email protected]>

* sglang+zero_inference

* Remove file

* Add offload configs

* Add pin_memory

* Cleanup scripts

* SGLang README

* Remove file

---------

Signed-off-by: Olatunji Ruwase <[email protected]>
Co-authored-by: jerryyangli <[email protected]>
Co-authored-by: Yang Li <[email protected]>
Co-authored-by: GuanhuaWang <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
Co-authored-by: Zhipeng Wang <[email protected]>
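A sketch of what an offload config with pinned memory can look like for ZeRO-Inference; the keys follow DeepSpeed's standard `zero_optimization` schema, but the exact values used by the SGLang example may differ.

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",     # or "nvme" together with an nvme_path
            "pin_memory": True,  # pinned host buffers speed up host-to-device copies
        },
    },
}
```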
* remove files

Signed-off-by: Hongwei Chen <[email protected]>

* Update domino example

Signed-off-by: Hongwei Chen <[email protected]>

* apply review suggestions

Signed-off-by: Hongwei Chen <[email protected]>

---------

Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Hongwei Chen <[email protected]>
* Add file extension (#980)

Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: jouw <[email protected]>

* fix init weights issue for critic/reward model

Signed-off-by: jouw <[email protected]>

* Update submodule link to reflect https style (#981)

Signed-off-by: raviguptaamd <[email protected]>
Signed-off-by: jouw <[email protected]>

* fix formatting issue

Signed-off-by: jouw <[email protected]>

---------

Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: jouw <[email protected]>
Signed-off-by: raviguptaamd <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
Co-authored-by: raviguptaamd <[email protected]>