🚀ShowUI Training Instructions

🔧Install Environment

conda create -n showui python=3.10
conda activate showui
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118 --user
pip install -r requirements.txt --user

📦Setup Datasets

Download the grounding training dataset (ShowUI-desktop-8K) and the grounding evaluation dataset (ScreenSpot).

You can use huggingface-cli to download these datasets easily.

cd $_DATA_DIR
huggingface-cli download showlab/ShowUI-desktop-8K --repo-type dataset --local-dir .
huggingface-cli download KevinQHLin/ScreenSpot --repo-type dataset --local-dir .

Then, the datasets should be organized as follows:

$_DATA_DIR
    - ScreenSpot
        - images
        - metadata
    - ShowUI-desktop
        - images
        - metadata

⚙️Define Dataloader

You can simply reuse the existing implementation in dset_shared_grounding.py for UI grounding, or dset_shared_navigation.py for UI navigation.

For grounding, you only need to define the dataset_mapping entry for path identification, such as "showui": "hf_train.json", as sketched below.
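
For reference, here is a minimal sketch of such a mapping. The key and file name follow the example above; the variable name dataset_mapping is illustrative and may differ from the repository's exact code.

# Illustrative sketch: map a dataset key to its metadata file under $_DATA_DIR.
dataset_mapping = {
    "showui": "hf_train.json",   # ShowUI-desktop grounding metadata
}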

Please organize the UI grounding metadata as follows:

"""
sample = {
        "img_url": "c12b572ebccfae5052fe62826615c58d.png",
        "img_size": [
            1920,
            1080
        ],
        "element": [
            {
                "instruction": "Galerie",
                "bbox": [
                    0.6125,
                    0.35648148148148145,
                    0.6817708333333333,
                    0.375
                ],
                "data_type": "text",
                "point": [
                    0.65,
                    0.37
                ]
            },
            {
                "instruction": "Coiffure",
                "bbox": [
                    0.30416666666666664,
                    0.35648148148148145,
                    0.3770833333333333,
                    0.375
                ],
                "data_type": "text",
                "point": [
                    0.34,
                    0.37
                ]
            }],
        "element_size": 2
}
"""

For navigation, you need to define the dataset_mapping as above. In addition, you need to define the action space in template/shared_navigation.py for your customized scenario.
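
As a purely illustrative sketch (the action names and format below are assumptions; the authoritative format is whatever template/shared_navigation.py already contains), a custom action space could be described as a simple mapping from action names to their meanings:

# Illustrative only: consult template/shared_navigation.py for the real format.
# Each entry describes one action the navigation model may emit.
MY_ACTION_SPACE = {
    "CLICK": "Click at a position given as a normalized [x, y] point.",
    "INPUT": "Type the given text into the focused element.",
    "SCROLL": "Scroll the page in the given direction.",
}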

〽️Start Grounding Training

Below are instructions for training on grounding and then evaluating on ScreenSpot grounding.

Please keep the batch size (bsz) at 1; if you want to enlarge the effective batch size, increase grad_accumulation_steps instead.
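
For reference, the effective batch size equals batch_size × grad_accumulation_steps × number of GPUs; with the command below (batch_size=1, grad_accumulation_steps=2, one GPU via --include localhost:1), that is 1 × 2 × 1 = 2.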

deepspeed --include localhost:1 --master_port 1234 train.py \
  --model_id='Qwen/Qwen2-VL-2B-Instruct' \
  --version='Qwen/Qwen2-VL-2B-Instruct' \
  --dataset_dir="$_DATA_DIR" \
  --log_base_dir="$_SAVE_DIR" \
  --epochs=50 \
  --steps_per_epoch=100 \
  --batch_size=1 \
  --grad_accumulation_steps=2 \
  --model_max_length=4096 \
  --exp_id="showui_desktop" \
  --sample_rates="1"  \
  --dataset="showui"  \
  --val_dataset="screenspot"  \
  --precision="bf16" \
  --attn_imple="sdpa" \
  --workers=4 \
  --lora_r=32 \
  --lora_alpha=64  \
  --min_visual_tokens=256  \
  --max_visual_tokens=1344  \
  --num_history=4 \
  --num_turn=100 \
  --crop_min=0.5 \
  --crop_max=1.5 \
  --random_sample \
  --record_sample \
  --lr=0.0001 \
  --uniform_prompt  \
  --ds_zero="zero2" \
  --gradient_checkpointing

Then, the model checkpoints will be saved under $_SAVE_DIR/$exp_id

We have provided an evaluation script for ScreenSpot in main/eval_screenspot.py. If you want to evaluate on your own setting, you need to define the evaluation function and place it under main/eval_X.py.
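
If you write your own main/eval_X.py, the core of a grounding evaluation is typically a point-in-box check: a prediction counts as correct when the predicted click point falls inside the ground-truth bounding box. Here is a minimal sketch of that metric, assuming normalized coordinates as in the metadata format above; the function name and signature are illustrative, not the repository's API.

# Illustrative metric: fraction of predicted click points that land inside
# their ground-truth bbox ([x_min, y_min, x_max, y_max], normalized).
def grounding_accuracy(pred_points, gt_bboxes):
    correct = 0
    for (px, py), (x1, y1, x2, y2) in zip(pred_points, gt_bboxes):
        if x1 <= px <= x2 and y1 <= py <= y2:
            correct += 1
    return correct / max(len(pred_points), 1)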

You should be able to monitor the training information in the wandb panel.

〽️Start Navigation Training

TBD

⬇️Save Model Checkpoints

Once you have finished training, you can use the following commands to save the model checkpoint.

exp_dir="$_SAVE_DIR/$exp_id/2024-11-28_17-30-32/"

ckpt_dir="${exp_dir}/ckpt_model/"
cd "$ckpt_dir" || { echo "Failed to cd to $ckpt_dir"; exit 1; }
python zero_to_fp32.py . pytorch_model.bin
mkdir -p merged_model
CUDA_VISIBLE_DEVICES="0" python merge_weight.py --weight="$ckpt_dir/pytorch_model.bin" --lora_r=32 --lora_alpha=64