
training gpu requirements #14

Open
DuanWei1234 opened this issue Sep 29, 2024 · 9 comments

Comments

@DuanWei1234

Hello, thanks for your code. I would like to know how much GPU memory is needed for training.

@theFoxofSky (Collaborator)

About 30-60G, depending on the batch size and resolution.

@SzhangS commented Oct 23, 2024

Hello, I followed the default config settings, but training uses more than 80G of GPU memory. Could you give me some advice?

@SzhangS commented Oct 23, 2024

@theFoxofSky

@theFoxofSky (Collaborator)

We use A100s to train this model, so about 80G is enough for us. Please enable gradient checkpointing if the GPU memory footprint is too large.

Moreover, I recall that training does not use more than 80G; please check the resolution and batch size of your data.
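If you want to check where a run actually sits relative to that 80G figure, here is a minimal sketch for logging peak GPU memory with plain PyTorch (`report_peak_memory` is just an illustrative helper, not something from this repo):

```python
import torch

def report_peak_memory(tag: str = "train step") -> None:
    """Print the peak GPU memory allocated so far on the current device, then reset the counter."""
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"[{tag}] peak allocated: {peak_gib:.1f} GiB")
    torch.cuda.reset_peak_memory_stats()

# Usage: call once right after optimizer.step() in the training loop to see
# the footprint of a full forward + backward pass.
```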

@SzhangS commented Oct 23, 2024

image_finetune: False

output_dir: "outputs"
pretrained_model_path: "pretrained_models/RV/"
pretrained_clip_path: "pretrained_models/DINO/dinov2/"
pretrained_mm_path: "pretrained_models/MM/mm_sd_v15_v2.ckpt"

unet_additional_kwargs:
  use_motion_module: True
  motion_module_resolutions: [ 1, 2, 4, 8 ]
  unet_use_cross_frame_attention: False
  unet_use_temporal_attention: False

  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types: [ "Temporal_Self", "Temporal_Self" ]
    temporal_position_encoding: True
    temporal_position_encoding_max_len: 32
    temporal_attention_dim_div: 1
    zero_initialize: True

pose_guider_kwargs:
  pose_guider_type: "side_guider"
  args:
    out_channels: [ 320, 320, 640, 1280, 1280 ]

clip_projector_kwargs:
  projector_type: "ff"
  in_features: 1024
  out_features: 768

zero_snr: True
v_pred: True
train_cfg: False
snr_gamma: 5.0
fix_ref_t: True
pose_shuffle_ratio: 0.05

vae_slicing: True
fps: 8 # 30

validation_kwargs:
  guidance_scale: 2

train_data:
  - dataset:
      dataset_class: VideoDataset
      args:
        root_dir: "./video_dance_data"
        split: "train"
        sample_size: [ 768, 576 ]
        clip_size: [ 320, 240 ]
        image_finetune: False
        ref_mode: "random"
        sample_n_frames: 12 # 16

validation_data:
  dataset_class: VideoDataset
  args:
    root_dir: "./video_dance_data"
    split: "val"
    sample_size: [ 768, 576 ]
    clip_size: [ 320, 240 ]
    image_finetune: False
    ref_mode: "first"
    sample_n_frames: 12
    start_pixel: 0
    fix_gap: True

trainable_modules:
  - "motion_modules."

unet_checkpoint_path: "outputs/stage1_hamer/checkpoints/checkpoint-final.ckpt"

unet_checkpoint_path: "pretrained_models/checkpoint/stage_2_hamer_release.ckpt"

lr_scheduler: "constant_with_warmup"
learning_rate: 1e-5
lr_warmup_steps: 5000
train_batch_size: 1
validation_batch_size: 1

max_train_epoch: -1
max_train_steps: 1000
checkpointing_epochs: -1
checkpointing_steps: 500
checkpointing_steps_tuple: [ 2, 500 ]

global_seed: 42
mixed_precision: "fp16"

is_debug: False
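For reference, the fields above that drive the memory footprint are sample_size, sample_n_frames, and train_batch_size. A rough illustration of how they map onto the video latent fed to the UNet, assuming a Stable-Diffusion-style VAE with 8x spatial downsampling and 4 latent channels (an assumption for illustration, not this repo's exact layout):

```python
import torch

# sample_size, sample_n_frames, and train_batch_size taken from the config above.
batch, frames = 1, 12
height, width = 768, 576
latent_channels, vae_downsample = 4, 8   # assumed SD-style VAE

latents = torch.zeros(
    batch, latent_channels, frames,
    height // vae_downsample, width // vae_downsample,
    dtype=torch.float16,
)
print(latents.shape)  # torch.Size([1, 4, 12, 96, 72])

# The latent itself is tiny; what dominates GPU memory are the UNet/attention
# activations stored for backprop, which scale with frames * (H/8) * (W/8),
# so lowering sample_n_frames or sample_size shrinks the footprint.
```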

@SzhangS commented Oct 23, 2024

This is my stage2_hamer.yaml. It runs out of GPU memory on an A100, so I had to change sample_n_frames from 16 to 12. Is that feasible?

@theFoxofSky (Collaborator)

If so, please use gradient checkpointing.

Call this function before training.

[screenshot showing the gradient checkpointing call]
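Since the screenshot does not survive in text form, here is a minimal sketch of what such a call typically looks like for a diffusers-style UNet (the model class and path below are assumptions based on the config above, not confirmed from this repo's code):

```python
from diffusers import UNet2DConditionModel

# Assumed: the base UNet is loaded from the pretrained_model_path in the config.
unet = UNet2DConditionModel.from_pretrained("pretrained_models/RV/", subfolder="unet")

# Call this once before the training loop: activations are recomputed during
# the backward pass instead of being stored, trading compute for lower peak memory.
unet.enable_gradient_checkpointing()
```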

@SzhangS commented Oct 23, 2024

Thanks
