Enable Multi Layer Perceptron (MLP) selection for projector #25

Open · wants to merge 2 commits into main
Conversation

@Onely7 (Contributor) commented on Nov 2, 2023

First of all, thank you for creating such an amazing project!
This repository has been very useful to me.

Changes

This PR modifies the code so that the projector can be a multi-layer perceptron (MLP) when model_type: git_llm is selected.
Previously, when using model_type: git_llm, a single Linear layer was applied as the projector connecting the vision model and the LLM. Inspired by LLaVA v1.5 [Liu+ '23, Improved Baselines with Visual Instruction Tuning], I have added code that makes it possible to vary the number of these Linear layers simply by adding an option (mlp_adapter) under model_config in projects/OOO/OO.yml.
The main logic for switching the projector to an MLP lives in heron/models/mlp_adapter.py; a sketch of the idea is shown below.
It follows the GitHub implementation of LLaVA v1.5 (https://github.com/haotian-liu/LLaVA/blob/785f766fcddc86ffeaa62cd51cf7834a11c04e6d/llava/model/multimodal_projector/builder.py#L33).
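
For reference, here is a minimal sketch of what such a builder can look like, closely following the LLaVA v1.5 code linked above. The function and argument names here are illustrative and may differ from the actual heron/models/mlp_adapter.py:

```python
import re

import torch.nn as nn


def build_projector(mlp_adapter, vision_hidden_size, llm_hidden_size):
    """Illustrative sketch modeled on LLaVA v1.5's projector builder.

    Names are assumptions for illustration; the real code in
    heron/models/mlp_adapter.py may differ.
    """
    if mlp_adapter is None:
        # Backward-compatible default: the single Linear layer used before this PR.
        return nn.Linear(vision_hidden_size, llm_hidden_size)

    match = re.match(r"^mlp(\d+)x_gelu$", mlp_adapter)
    if match is None:
        raise ValueError(f"Unknown mlp_adapter: {mlp_adapter!r}")

    depth = int(match.group(1))
    # The first Linear maps vision features into the LLM embedding space;
    # each additional layer adds a GELU followed by a square Linear.
    layers = [nn.Linear(vision_hidden_size, llm_hidden_size)]
    for _ in range(1, depth):
        layers.append(nn.GELU())
        layers.append(nn.Linear(llm_hidden_size, llm_hidden_size))
    return nn.Sequential(*layers)
```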

Also, to maintain backward compatibility, existing projects/OOO/OO.yml configs that do not set mlp_adapter work exactly as before.

For example, if you use projects/llama/exp001.yml as-is:

```yaml
training_config:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 4
  num_train_epochs: 1
  dataloader_num_workers: 16
  fp16: true
  optim: "adamw_torch"
  learning_rate: 5.0e-5
  logging_steps: 100
  evaluation_strategy: "steps"
  save_strategy: "steps"
  eval_steps: 4000
  save_steps: 4000
  save_total_limit: 1
  deepspeed: ./configs/deepspeed/ds_config_zero1.json
  output_dir: ./output/
  report_to: "wandb"

model_config:
  fp16: true
  pretrained_path: # None or path to model weight
  model_type: git_llm
  language_model_name: meta-llama/Llama-2-7b-chat-hf
  vision_model_name: openai/clip-vit-base-patch16
  num_image_with_embedding: 1 # if 1, no img_temporal_embedding
  max_length: 512
  keys_to_finetune:
    - visual_projection
    - num_image_with_embedding
  keys_to_freeze: []

  use_lora: true
  lora:
    r: 8
    lora_alpha: 32
    target_modules:
      - q_proj
      - k_proj
      - v_proj
    lora_dropout: 0.01
    bias: none
    task_type: CAUSAL_LM

dataset_config_path:
  - ./configs/datasets/m3it.yaml
```

then, as before, a single Linear layer will be applied as the projector.

If you want to change the projector to an MLP, add the mlp_adapter key to model_config in projects/llama/exp001.yml and set it to mlp2x_gelu:

The training_config and dataset_config_path sections are unchanged from the example above; only model_config gains the new key:

```yaml
model_config:
  fp16: true
  pretrained_path: # None or path to model weight
  model_type: git_llm
  mlp_adapter: mlp2x_gelu # projector will be a 2-layer MLP.
  language_model_name: meta-llama/Llama-2-7b-chat-hf
  vision_model_name: openai/clip-vit-base-patch16
  num_image_with_embedding: 1 # if 1, no img_temporal_embedding
  max_length: 512
  keys_to_finetune:
    - visual_projection
    - num_image_with_embedding
  keys_to_freeze: []

  use_lora: true
  lora:
    r: 8
    lora_alpha: 32
    target_modules:
      - q_proj
      - k_proj
      - v_proj
    lora_dropout: 0.01
    bias: none
    task_type: CAUSAL_LM
```
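
To make the data flow concrete, hypothetical glue code could read the option out of the YAML and hand it to the builder sketched earlier. heron's actual model construction differs; this only illustrates how the option reaches the projector:

```python
import yaml  # pip install pyyaml

with open("projects/llama/exp001.yml") as f:
    config = yaml.safe_load(f)

model_config = config["model_config"]

projector = build_projector(
    model_config.get("mlp_adapter"),  # missing key -> None -> single Linear, as before
    vision_hidden_size=768,           # hidden size of openai/clip-vit-base-patch16
    llm_hidden_size=4096,             # hidden size of meta-llama/Llama-2-7b-chat-hf
)
```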

In the example above, adding mlp_adapter: mlp2x_gelu under model_config turns the projector into a 2-layer MLP. If you want 3 layers, simply change it to mlp_adapter: mlp3x_gelu.
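
Under the builder sketch above, mlp3x_gelu would expand to roughly the following module (hidden sizes shown for clip-vit-base-patch16 and Llama-2-7b):

```python
# build_projector("mlp3x_gelu", vision_hidden_size=768, llm_hidden_size=4096)
# returns (roughly):
#
# Sequential(
#   (0): Linear(in_features=768, out_features=4096, bias=True)
#   (1): GELU()
#   (2): Linear(in_features=4096, out_features=4096, bias=True)
#   (3): GELU()
#   (4): Linear(in_features=4096, out_features=4096, bias=True)
# )
```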
