forked from deepspeedai/DeepSpeedExamples
Pull change from upstream #1
Open

chuanli11 wants to merge 153 commits into LambdaLabsML:master from deepspeedai:master
Conversation
* Adding fine-tuning support for Llama on the client side
* fix format
* resolve conflict

Co-authored-by: Minjia Zhang <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
This PR adds a step 2 sweeping script in DS Chat and cleans up the existing step 1 and 3 scripts.
Co-authored-by: Lev Kurilenko <[email protected]>
This PR updates the Stable Diffusion example with the following changes:
* Ability to switch between the HF and local pipeline via the use_local_pipe arg
* Added an optional name arg to specify the model name from the command line (default is prompthero/midjourney-v4-diffusion)
* Fixed the local_rank issue by adding the argument to the script
* Only enable CUDA graph when the local pipeline isn't used (not working with the local pipeline at the moment)
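A rough sketch of how these options could be wired up; the argument names come from the list above, while the parser code and CUDA-graph gating shown here are illustrative rather than the example's actual implementation:

```python
import argparse

parser = argparse.ArgumentParser()
# Switch between the Hugging Face pipeline and the local DeepSpeed pipeline.
parser.add_argument("--use_local_pipe", action="store_true")
# Model name is now configurable from the command line.
parser.add_argument("--name", type=str, default="prompthero/midjourney-v4-diffusion")
# Added so the script works when launched by a distributed launcher.
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

# CUDA graph is only enabled when the local pipeline is not used.
enable_cuda_graph = not args.use_local_pipe
```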
This PR updates the transformers requirement in DS Chat to transformers>=4.31.0.
This PR fixes the OUTPUT arg in the step 3 llama script to use the proper arg position.
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: jagane-infinstor <[email protected]> Co-authored-by: Ammar Ahmad Awan <[email protected]>
This PR adds two test mode arguments to DS Chat step 3 training:
1. enable_test_mode - Enables a testing mode that terminates training based on args.test_stop_step
2. test_stop_step - Training step at which to terminate training during testing
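A minimal sketch of how these two arguments might hook into the training loop (only the argument names come from the description above; the loop itself is a placeholder):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--enable_test_mode", action="store_true",
                    help="Terminate training early based on --test_stop_step.")
parser.add_argument("--test_stop_step", type=int, default=0,
                    help="Training step at which to stop when test mode is enabled.")
args = parser.parse_args()

for step in range(1000):  # stand-in for the real step 3 training loop
    # ... run one training step here ...
    if args.enable_test_mode and step >= args.test_stop_step:
        break  # early exit used when testing
```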
This PR adds an explicit LoRA learning rate argument for DS Chat steps 1 through 3:
* Step 1: lora_learning_rate
* Step 2: lora_learning_rate
* Step 3: actor_lora_learning_rate and critic_lora_learning_rate
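One common way such an argument is consumed is through a separate optimizer parameter group for the LoRA weights. The sketch below assumes LoRA parameters can be identified by name and uses illustrative default values; it is not the actual DS-Chat code:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=1e-5)       # base LR (default assumed)
parser.add_argument("--lora_learning_rate", type=float, default=5e-4)  # LoRA LR (default assumed)
args = parser.parse_args()

def get_param_groups(model, base_lr, lora_lr):
    # Give LoRA parameters their own learning rate; everything else keeps the base LR.
    lora = [p for n, p in model.named_parameters() if "lora" in n.lower()]
    base = [p for n, p in model.named_parameters() if "lora" not in n.lower()]
    return [{"params": base, "lr": base_lr},
            {"params": lora, "lr": lora_lr}]
```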
(#651)
* support bf16 and CPU accelerator
* support both bfloat16 and fp16 data types
* change default data type to bf16 to help run this demo on both CPU and GPU
* enable HelloDeepSpeed for non-CUDA devices
* revert changes for sh output
* allow selecting bf16/fp16 data type
* revert unnecessary changes
* separate bf16 and fp16 configs

Co-authored-by: Olatunji Ruwase <[email protected]>
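For reference, selecting between the two data types usually comes down to which of DeepSpeed's standard bf16/fp16 config sections is enabled; the helper and the values below are illustrative, not the example's actual config:

```python
def build_ds_config(dtype: str) -> dict:
    # Enable exactly one of DeepSpeed's mixed-precision modes based on the chosen dtype.
    return {
        "train_batch_size": 16,                 # illustrative value
        "bf16": {"enabled": dtype == "bf16"},   # bf16 also runs on CPU accelerators
        "fp16": {"enabled": dtype == "fp16"},
    }

ds_config = build_ds_config("bf16")  # bf16 is the new default in this example
```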
This PR adds a Step 3 DS Chat Unit Test.
This PR removes skipping of the zero_stage == "3" and hybrid_engine == "true" and offload == "true" and lora == "true" case since the training instability was determined to be transient. This case is now supported by DS Chat step 3 training and is tested in the DS Chat PyTest.
This PR adds the DS-Chat CI badge and documentation to the main and DS-Chat READMEs.
This PR renames some DS-Chat step 3 arguments for clarification. Args renamed:
1. --per_device_train_batch_size --> --per_device_generation_batch_size
2. --per_device_mini_train_batch_size --> --per_device_training_batch_size
3. --generation_batch_numbers --> --generation_batches
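The renamed flags, expressed as an argparse sketch (only the argument names come from the list above; the types and defaults are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
# Formerly --per_device_train_batch_size.
parser.add_argument("--per_device_generation_batch_size", type=int, default=4)
# Formerly --per_device_mini_train_batch_size.
parser.add_argument("--per_device_training_batch_size", type=int, default=4)
# Formerly --generation_batch_numbers.
parser.add_argument("--generation_batches", type=int, default=1)
args = parser.parse_args()
```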
Co-authored-by: Lev Kurilenko <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]> Co-authored-by: Molly Smith <[email protected]> Co-authored-by: Lev Kurilenko <[email protected]> Co-authored-by: Zhewei Yao <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
venv/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(

Signed-off-by: Songlin Jiang <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
* enable reward model offloading option
* fixed code formatting
* more formatting fixes
* Pre-commit formatting fix

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Not all pretrained LLMs use `<|endoftext|>` as the `eot_token`, therefore it's inappropriate to hard-code it.

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
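A hedged sketch of the idea: fall back to the tokenizer's own end-of-text token instead of assuming `<|endoftext|>` (the helper and its arguments are hypothetical, not the actual DS-Chat code):

```python
from transformers import AutoTokenizer

def get_eot_token(model_name_or_path: str, eot_token: str | None = None) -> str:
    # Prefer an explicitly supplied token; otherwise use the tokenizer's own EOS
    # token rather than hard-coding "<|endoftext|>".
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    return eot_token if eot_token is not None else tokenizer.eos_token
```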
* add domino
* use transformer from deepspeed
* clean args
* mega opt
* add opt & timer
* add opt
* fix loss
* folder name
* Change argument in pretrain script
* Add readme for domino
* Update readme for domino
* Fixing usage issues
* update dataset
* megatron dependencies
* path
* Update README.md
* remove imports
* update import
* Update README.md
* Minor example script changes
* train bash
* require
* Update README.md

Co-authored-by: chengming-zhang <[email protected]>
Co-authored-by: Zheyu SHEN <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
* add benchmarking for offloading states
* fix api names
* Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat.
* Add training scripts for step2 DPO in DeepSpeed-Chat.
* Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.
* Update training scripts of step2 DPO in DeepSpeed-Chat.
* Follow upstream fixes.
* Update README.md for step2 DPO finetuning.
* Add OPT-350M training log demo for step2 DPO finetuning in DeepSpeed-Chat.
* Address the formatting issue in step2 DPO finetuning in DeepSpeed-Chat.

Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
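For context, label smoothing is usually added to the DPO objective in the conservative-DPO style shown below; this is a generic sketch of that formulation, not necessarily the exact code added in this commit:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta: float = 0.1, label_smoothing: float = 0.0):
    # Difference of policy and reference log-ratios for chosen vs. rejected responses.
    logits = (policy_chosen_logps - policy_rejected_logps) - (
        ref_chosen_logps - ref_rejected_logps
    )
    # label_smoothing = 0 recovers the standard DPO loss.
    loss = -(1 - label_smoothing) * F.logsigmoid(beta * logits) - (
        label_smoothing * F.logsigmoid(-beta * logits)
    )
    return loss.mean()
```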
Signed-off-by: Logan Adams <[email protected]>
* Update weights_only due to change in default in torch>=2.6
* formatting

Signed-off-by: Logan Adams <[email protected]>
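For context: torch 2.6 changed the default of `torch.load` to `weights_only=True`, so loading checkpoints that contain arbitrary pickled objects now requires passing the flag explicitly (the path below is illustrative):

```python
import torch

# torch >= 2.6 defaults to weights_only=True; pass weights_only=False explicitly
# when the checkpoint stores more than plain tensors (e.g. optimizer state or args).
state = torch.load("checkpoint.pt", map_location="cpu", weights_only=False)
```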
* moved example from DeepSpeed PR #7104 to this repo
* Update training/data_efficiency/variable_batch_size_and_lr/README.md
* replaced T by S for sequence length
* more detailed explanation
* --pipeline-num-stages is now a command line argument
* cleaner syntax

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: Hongwei Chen <[email protected]> Co-authored-by: Hongwei Chen <hongweichen@ftqtmec25000002.taxzvufipdhelhupulxcbvr15f.ux.internal.cloudapp.net> Co-authored-by: Logan Adams <[email protected]>
* import files for deepcompile benchmark
* add figures
* update document
* fix links to images
* add images
* specify deepspeed version

Signed-off-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
* update description of versions for deepcompile
* Update to match specific tag name

Signed-off-by: Logan Adams <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
* update description of versions for deepcompile
* fix deepcompile benchmark script
* fix benchmark for z1
* add options for deepcompile bench

Signed-off-by: Masahiro Tanaka <[email protected]>
* update tp example
* update
* add length bench file

Signed-off-by: inkcherry <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
…#941 (#974) Signed-off-by: Vensenmu <[email protected]>
* Fast model checkpointing
* Support both legacy and serialized formats
* Add io_buffer_mb option
* Bug fix
* Force flush
* More model options; refactor common codes
* --gpu option
* --half and more flexible options
* Add deepspeed.save_checkpoint()
* Free ds memory
* Improve repro
* Double I/O buffer (#56)
* Double I/O buffer (#60)
* Add checkpoint comparison (#62); corrected a typo
* save_checkpoint perf monitoring
* Disable checkpoint save on exit
* Perf statistics for save_checkpoint (#64)
* add logs for a100-80
* add torch* error log with half flag but without fused flag
* log for error
* local rank arg
* Handle local_rank arg (#78)
* Single writer option (#79)
* Allow missing folder
* DP writer refactor
* Update for DS; add GDS
* Integrate GDS into deepspeed_model_save
* Rebase fast persist (#184) (squash of the commits listed above)
* Move folder
* Remove folder
* More cleanup
* torch changes
* sglang+zero_inference
* Remove file
* Add offload configs
* Add pin_memory
* Cleanup scripts
* SGLang README
* Remove file

Signed-off-by: Olatunji Ruwase <[email protected]>
Co-authored-by: jerryyangli <[email protected]>
Co-authored-by: Yang Li <[email protected]>
Co-authored-by: GuanhuaWang <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
Co-authored-by: Zhipeng Wang <[email protected]>
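Since this work routes saves through the DeepSpeed engine's checkpoint API, a minimal usage sketch looks roughly like the following (the model, config values, directory, and tag are all placeholders; the fast-persist/GDS writers plug in underneath this call):

```python
import torch
import deepspeed

model = torch.nn.Linear(8, 8)  # placeholder model
ds_config = {
    "train_batch_size": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
engine, _, _, _ = deepspeed.initialize(model=model,
                                       model_parameters=model.parameters(),
                                       config=ds_config)

# Engine-level checkpoint save; directory and tag are illustrative.
engine.save_checkpoint("checkpoints/", tag="step_0")
```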
* remove files
* Update domino example
* apply review suggestions

Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: raviguptaamd <[email protected]>
* Add file extension (#980)
* fix init weights issue for critic/reward model
* Update submodule link to reflect https style (#981)
* fix formatting issue

Signed-off-by: Hongwei Chen <[email protected]>
Signed-off-by: jouw <[email protected]>
Signed-off-by: raviguptaamd <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
Co-authored-by: raviguptaamd <[email protected]>