feat: Added training scripts Independent of Hugging Face's Trainer #27

Open · wants to merge 50 commits into base: main
Commits (50)
c5b6618
Add [train_ds]: initial commit for train_ds.py
chizuchizu Oct 19, 2023
9974578
WIP update [train_ds]: cp from VisualChat
chizuchizu Oct 19, 2023
9f32e64
rm comment and argparser, set fire
chizuchizu Oct 19, 2023
5c82169
update [train_ds.py, config]
chizuchizu Oct 19, 2023
62c4ed5
fix [train_ds.py]: set to half the input batch and eval model input
chizuchizu Oct 19, 2023
62cb28d
add [utils.py]: for train_dspy
chizuchizu Oct 23, 2023
2ac2834
update [train_ds]: fix input, rm config element
chizuchizu Oct 23, 2023
ffdf56a
fix [train_ds.py]: saving model
chizuchizu Oct 25, 2023
dae2e3b
add wandb [train_ds.py]
chizuchizu Oct 25, 2023
8043bbf
fix avg loss metric
chizuchizu Oct 25, 2023
0050f85
update config for train_ds.py[exp_002.yml]
chizuchizu Oct 25, 2023
555a1c7
update [train_ds.py]
chizuchizu Oct 28, 2023
103737c
update train_ds.py
chizuchizu Oct 30, 2023
97edccb
fix print -> print_rank_0[train_ds.py]
chizuchizu Oct 31, 2023
ceda701
add simple coco (captioning) dataset
chizuchizu Oct 31, 2023
102fbca
add progressbar, fix model save point(rm merging LoRA while training)
chizuchizu Oct 31, 2023
3ac13c5
add beta to config
chizuchizu Oct 31, 2023
1c0a7ea
restore utils.py
chizuchizu Oct 31, 2023
51695a2
rm comment
chizuchizu Oct 31, 2023
0cb821e
Jp -> En [comment]
chizuchizu Nov 5, 2023
2aade14
Log lr to wandb
chizuchizu Nov 5, 2023
02289e1
support full parameter tuning
chizuchizu Nov 5, 2023
9f02ae7
add license
chizuchizu Nov 5, 2023
281d1e0
change license
chizuchizu Nov 5, 2023
debe6be
add DeepSpeedExamples to acknowledge
chizuchizu Nov 5, 2023
4464c10
change: coco.yaml -> m3it_coco.yaml
chizuchizu Nov 6, 2023
b675e82
add notice, copyright
chizuchizu Nov 6, 2023
45ef591
add todo (merge LoRA)
chizuchizu Nov 6, 2023
d9d4635
add uses [README]
chizuchizu Nov 6, 2023
f1acdba
fix path
chizuchizu Nov 6, 2023
73a02b2
fix calc loss stepwise
chizuchizu Nov 7, 2023
45563b3
fix [exp002_ds.yml]
chizuchizu Nov 7, 2023
50110a8
chore typo
chizuchizu Nov 7, 2023
d06d5e4
rm redundent saving model
chizuchizu Nov 7, 2023
1c31aee
adapt the saving model structure to original
chizuchizu Nov 7, 2023
d261c8c
Fix: all reduce logic
chizuchizu Nov 13, 2023
1354b33
add notice (applied format)
chizuchizu Nov 13, 2023
77a7fe0
rm lora
chizuchizu Nov 17, 2023
b4b59ea
add ZeRO-3 instruction
chizuchizu Nov 20, 2023
9342e04
add JP docs
chizuchizu Nov 20, 2023
3ec6037
add chinese docs
chizuchizu Nov 20, 2023
92a422e
fix diff
chizuchizu Nov 20, 2023
d7b0f68
Revert "fix diff"
chizuchizu Nov 20, 2023
ad4a19b
comment lora config
chizuchizu Nov 20, 2023
b693714
update README
chizuchizu Nov 20, 2023
d92c798
save model trained by ZeRO-3
chizuchizu Nov 24, 2023
74d9696
rm conversion [README]
chizuchizu Nov 24, 2023
f8aecb3
add HfDeepSpeedConfig on ZeRO-3 training
chizuchizu Nov 29, 2023
c741879
fix zero stage
chizuchizu Dec 1, 2023
2ebd456
add initialization for mpirun
chizuchizu Dec 1, 2023
87 changes: 87 additions & 0 deletions README.md
@@ -143,8 +143,74 @@ To start learning, execute the following command.

A GPU is required for training; we have tested on Ubuntu 20.04 with CUDA 11.7.

# Training (w/o Trainer)

We provide `train_ds.py`, a training script that does not depend on Hugging Face's `Trainer` class, for more flexible training configurations.
For example, [projects/opt/exp002_ds.yml](projects/opt/exp002_ds.yml) contains the following:

```yaml
training_config:
per_device_train_batch_size: 2
per_device_eval_batch_size: 2
gradient_accumulation_steps: 4
num_train_epochs: 5
dataloader_num_workers: 16
learning_rate: 5.0e-5
output_dir: ./output/
report_to: "wandb"
zero_stage: 2
precision: "fp16"
enable_tensorboard: False
seed: 0
weight_decay: 0.
learning_rate_pretraining_components: 0.
num_warmup_steps: 0.
optim_betas:
- 0.9
- 0.95
lr_scheduler_type: "cosine"
gradient_checkpointing: False
cpu_offload: False


model_config:
pretrained_path: # None or path to model weight
model_type: git_llm
language_model_name: facebook/opt-125m
vision_model_name: openai/clip-vit-base-patch16
num_image_with_embedding: 1 # if 1, no img_temporal_embedding
max_length: 512
keys_to_finetune:
- visual_projection
- num_image_with_embedding
keys_to_freeze: []

# TODO: support LoRA
# use_lora: false
# lora:
# r: 8
# lora_alpha: 32
# target_modules:
# - q_proj
# - k_proj
# - v_proj
# lora_dropout: 0.01
# bias: none
# task_type: CAUSAL_LM

dataset_config_path:
- ./configs/datasets/m3it_coco.yaml # only coco dataset
```

To start training, run the following command.

```bash
./scripts/run_ds.sh
```
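
For orientation, the overall flow of a `Trainer`-free DeepSpeed script such as `train_ds.py` is roughly the sketch below. This is an illustrative sketch only: the variable names, the subset of DeepSpeed config keys, and the bare-bones loop are assumptions, not the script's actual code.

```python
# Illustrative sketch of a Trainer-free DeepSpeed training flow (not the real train_ds.py).
import deepspeed
import yaml

from heron.models.utils import load_model

# 1. Read the experiment YAML shown above.
with open("./projects/opt/exp002_ds.yml") as f:
    config = yaml.safe_load(f)
train_cfg = config["training_config"]

# 2. Build the GIT-LLM model from model_config.
model = load_model(config["model_config"])

# 3. Map a few training_config fields onto a DeepSpeed config dict.
ds_config = {
    "train_micro_batch_size_per_gpu": train_cfg["per_device_train_batch_size"],
    "gradient_accumulation_steps": train_cfg["gradient_accumulation_steps"],
    "zero_optimization": {"stage": train_cfg["zero_stage"]},
    "fp16": {"enabled": train_cfg["precision"] == "fp16"},
}

# 4. Wrap the model in a DeepSpeed engine and run an explicit training loop.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)
train_loader = []  # placeholder for the DataLoader built from dataset_config_path
for batch in train_loader:
    loss = engine(**batch).loss  # assumes the model returns an output object with .loss
    engine.backward(loss)
    engine.step()
```

Because the loop is explicit, details such as gradient accumulation, wandb logging, and checkpointing can be adjusted directly rather than through `Trainer` callbacks.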

# Evaluation

If you have a model trained with ZeRO-3, see the loading changes described further below.
You can get the pretrained weights from Hugging Face Hub: [turing-motors/heron-chat-git-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v0)<br>
See also [notebooks](./notebooks).

@@ -197,6 +263,26 @@ with torch.no_grad():
print(processor.tokenizer.batch_decode(out)[0])
```

If your model was trained with ZeRO-3, the inference code must be modified as follows:

```diff
- # prepare a pretrained model
- model = GitLlamaForCausalLM.from_pretrained(
- 'turing-motors/heron-chat-git-Llama-2-7b-v0', torch_dtype=torch.float16
- )
+ from heron.models.utils import load_model, load_pretrained_weight
+ import yaml
+
+ config_file = "./projects/opt/exp002_ds.yml"
+
+ # get config
+ with open(config_file, "r") as i_:
+ config = yaml.safe_load(i_)
+
+ model = load_model(config["model_config"])
+ model.load_state_dict(torch.load('./output/opt/exp002_ds/epoch-1/pytorch_model.bin'), strict=True)
```
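
After the state dict has been loaded, the restored model still needs to be placed on the GPU before running the generation example above. A minimal follow-up, assuming a CUDA device and fp16 inference, is:

```python
# Move the restored model to the GPU in half precision and switch to eval mode.
model = model.half().to("cuda")
model.eval()
```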

### Pretrained Models

|model|LLM module|adapter|size|
@@ -225,3 +311,4 @@ Released under the [Apache License 2.0](./LICENSE).
- [GenerativeImage2Text](https://github.com/microsoft/GenerativeImage2Text): The main idea of the model is based on the original GIT.
- [Llava](https://github.com/haotian-liu/LLaVA): This project learned a lot from the great LLaVA project.
- [GIT-LLM](https://github.com/Ino-Ichan/GIT-LLM)
- [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples)
3 changes: 3 additions & 0 deletions configs/datasets/m3it_coco.yaml
@@ -0,0 +1,3 @@
dataset_type: m3it
dataset_names:
- coco
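
For context, the `m3it` dataset type presumably resolves to the M3IT collection on the Hugging Face Hub, with each entry under `dataset_names` selecting one subset. A hypothetical sketch of what this three-line config ends up loading (heron's actual dataset-building code may differ, and newer `datasets` releases may additionally require `trust_remote_code=True`):

```python
# Hypothetical illustration of what m3it_coco.yaml selects; heron's actual
# dataset-building code may differ.
from datasets import load_dataset

coco_train = load_dataset("MMInstruction/M3IT", "coco", split="train")
print(len(coco_train))  # number of COCO captioning examples in M3IT
```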
85 changes: 85 additions & 0 deletions docs/README_CN.md
@@ -143,6 +143,70 @@ "training_config" is the training settings, "model_config" is the model settings, "dataset_conf

A GPU is required for training; we have tested the system on Ubuntu 20.04 with CUDA 11.7.

# Training (w/o Trainer)
We provide `train_ds.py`, a training script that does not depend on Hugging Face's `Trainer` class, for more flexible training configurations. For example, the content of [projects/opt/exp002_ds.yml](../projects/opt/exp002_ds.yml) is as follows:

```yaml
training_config:
per_device_train_batch_size: 2
per_device_eval_batch_size: 2
gradient_accumulation_steps: 4
num_train_epochs: 5
dataloader_num_workers: 16
learning_rate: 5.0e-5
output_dir: ./output/
report_to: "wandb"
zero_stage: 2
precision: "fp16"
enable_tensorboard: False
seed: 0
weight_decay: 0.
learning_rate_pretraining_components: 0.
num_warmup_steps: 0.
optim_betas:
- 0.9
- 0.95
lr_scheduler_type: "cosine"
gradient_checkpointing: False
cpu_offload: False


model_config:
pretrained_path: # None or path to model weight
model_type: git_llm
language_model_name: facebook/opt-125m
vision_model_name: openai/clip-vit-base-patch16
num_image_with_embedding: 1 # if 1, no img_temporal_embedding
max_length: 512
keys_to_finetune:
- visual_projection
- num_image_with_embedding
keys_to_freeze: []

# TODO: support LoRA
# use_lora: false
# lora:
# r: 8
# lora_alpha: 32
# target_modules:
# - q_proj
# - k_proj
# - v_proj
# lora_dropout: 0.01
# bias: none
# task_type: CAUSAL_LM

dataset_config_path:
- ./configs/datasets/m3it_coco.yaml # only coco dataset
```

To start training, run the following command.


```bash
./scripts/run_ds.sh
```

# Usage

You can download the trained model from Hugging Face Hub: [turing-motors/heron-chat-git-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v0)<br>
@@ -195,6 +259,26 @@ with torch.no_grad():
print(processor.tokenizer.batch_decode(out))
```

If the model was trained with ZeRO-3, make the following changes.

```diff
- # prepare a pretrained model
- model = GitLlamaForCausalLM.from_pretrained(
- 'turing-motors/heron-chat-git-Llama-2-7b-v0', torch_dtype=torch.float16
- )
+ from heron.models.utils import load_model, load_pretrained_weight
+ import yaml
+
+ config_file = "./projects/opt/exp002_ds.yml"
+
+ # get config
+ with open(config_file, "r") as i_:
+ config = yaml.safe_load(i_)
+
+ model = load_model(config["model_config"])
+ model.load_state_dict(torch.load('./output/opt/exp002_ds/epoch-1/pytorch_model.bin'), strict=True)
```

### Pretrained Models

|model|LLM module|adapter|size|
@@ -222,3 +306,4 @@ print(processor.tokenizer.batch_decode(out))
- [GenerativeImage2Text](https://github.com/microsoft/GenerativeImage2Text)
- [Llava](https://github.com/haotian-liu/LLaVA)
- [GIT-LLM](https://github.com/Ino-Ichan/GIT-LLM)
- [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples)
85 changes: 85 additions & 0 deletions docs/README_JP.md
@@ -142,6 +142,70 @@ dataset_config_path:

A GPU is required for training; operation has been verified on Ubuntu 20.04 with CUDA 11.7.

# Training (w/o Trainer)
We provide `train_ds.py`, a training script that does not depend on Hugging Face's `Trainer` class.<br>
For example, the content of [projects/opt/exp002_ds.yml](../projects/opt/exp002_ds.yml) is as follows:

```yaml
training_config:
per_device_train_batch_size: 2
per_device_eval_batch_size: 2
gradient_accumulation_steps: 4
num_train_epochs: 5
dataloader_num_workers: 16
learning_rate: 5.0e-5
output_dir: ./output/
report_to: "wandb"
zero_stage: 2
precision: "fp16"
enable_tensorboard: False
seed: 0
weight_decay: 0.
learning_rate_pretraining_components: 0.
num_warmup_steps: 0.
optim_betas:
- 0.9
- 0.95
lr_scheduler_type: "cosine"
gradient_checkpointing: False
cpu_offload: False


model_config:
pretrained_path: # None or path to model weight
model_type: git_llm
language_model_name: facebook/opt-125m
vision_model_name: openai/clip-vit-base-patch16
num_image_with_embedding: 1 # if 1, no img_temporal_embedding
max_length: 512
keys_to_finetune:
- visual_projection
- num_image_with_embedding
keys_to_freeze: []

# TODO: support LoRA
# use_lora: false
# lora:
# r: 8
# lora_alpha: 32
# target_modules:
# - q_proj
# - k_proj
# - v_proj
# lora_dropout: 0.01
# bias: none
# task_type: CAUSAL_LM

dataset_config_path:
- ./configs/datasets/m3it_coco.yaml # only coco dataset
```

To start training, run the following command.

```bash
./scripts/run_ds.sh
```

# Usage

You can download the trained model from Hugging Face Hub: [turing-motors/heron-chat-git-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v0)<br>
@@ -194,6 +258,26 @@ with torch.no_grad():
print(processor.tokenizer.batch_decode(out))
```

If the model was trained with ZeRO-3, make the following changes to the inference code.

```diff
- # prepare a pretrained model
- model = GitLlamaForCausalLM.from_pretrained(
- 'turing-motors/heron-chat-git-Llama-2-7b-v0', torch_dtype=torch.float16
- )
+ from heron.models.utils import load_model, load_pretrained_weight
+ import yaml
+
+ config_file = "./projects/opt/exp002_ds.yml"
+
+ # get config
+ with open(config_file, "r") as i_:
+ config = yaml.safe_load(i_)
+
+ model = load_model(config["model_config"])
+ model.load_state_dict(torch.load('./output/opt/exp002_ds/epoch-1/pytorch_model.bin'), strict=True)
```

### Pretrained Models

|model|LLM module|adapter|size|
@@ -221,3 +305,4 @@ print(processor.tokenizer.batch_decode(out))
- [GenerativeImage2Text](https://github.com/microsoft/GenerativeImage2Text): The model architecture is inspired by the original GIT.
- [Llava](https://github.com/haotian-liu/LLaVA): This library draws heavily on the LLaVA project.
- [GIT-LLM](https://github.com/Ino-Ichan/GIT-LLM)
- [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples)
5 changes: 4 additions & 1 deletion heron/models/utils.py
@@ -191,6 +191,9 @@ def set_trainable_params(
untrainable_list.append(name)

else:
raise ValueError("either keys_to_freeze or keys_to_finetune should be specified")
# Full parameter Tuning
for name, p in model.named_parameters():
p.requires_grad = True
trainable_list.append(name)

return trainable_list, untrainable_list
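
In effect, the `else` branch that previously raised a `ValueError` now falls back to full-parameter tuning when neither `keys_to_finetune` nor `keys_to_freeze` selects anything. A minimal sketch of the new behaviour follows; the call signature of `set_trainable_params` is assumed from the diff context above and may differ from the real one.

```python
# Hypothetical illustration of the full-parameter-tuning fallback added above;
# the real signature of set_trainable_params may differ.
import yaml

from heron.models.utils import load_model, set_trainable_params

with open("./projects/opt/exp002_ds.yml") as f:
    config = yaml.safe_load(f)

model = load_model(config["model_config"])

# With both key lists empty, every parameter is marked requires_grad=True
# (full-parameter tuning) instead of raising a ValueError.
trainable, untrainable = set_trainable_params(model, [], [])
assert untrainable == []
```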