
training gpu requirements #14

Open
DuanWei1234 opened this issue Sep 29, 2024 · 9 comments

Comments

@DuanWei1234

Hello, thanks for your code. I would like to know how much GPU memory is needed for training.

@theFoxofSky (Collaborator)

About 30-60G, depending on the batch size and resolution.

@SzhangS commented Oct 23, 2024

Hello, I followed the default config settings, but training uses more than 80G of GPU memory. Could you give me some advice?

@SzhangS commented Oct 23, 2024

@theFoxofSky

@theFoxofSky (Collaborator)

We use A100s to train this model, so about 80G is enough for us. Please enable gradient checkpointing if the GPU memory footprint is too large.

Moreover, I recall that training does not use more than 80G; please check the resolution and batch size of your data.
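If you want to check where a run actually sits relative to that 80G figure, here is a minimal sketch for logging peak GPU memory with plain PyTorch (`report_peak_memory` is just an illustrative helper, not something from this repo):

```python
import torch

def report_peak_memory(tag: str = "train step") -> None:
    """Print the peak GPU memory allocated so far on the current device, then reset the counter."""
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"[{tag}] peak allocated: {peak_gib:.1f} GiB")
    torch.cuda.reset_peak_memory_stats()

# Usage: call once right after optimizer.step() in the training loop to see
# the footprint of a full forward + backward pass.
```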

@SzhangS commented Oct 23, 2024

image_finetune: False

output_dir: "outputs"
pretrained_model_path: "pretrained_models/RV/"
pretrained_clip_path: "pretrained_models/DINO/dinov2/"
pretrained_mm_path: "pretrained_models/MM/mm_sd_v15_v2.ckpt"

unet_additional_kwargs:
  use_motion_module: True
  motion_module_resolutions: [ 1, 2, 4, 8 ]
  unet_use_cross_frame_attention: False
  unet_use_temporal_attention: False

  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types: [ "Temporal_Self", "Temporal_Self" ]
    temporal_position_encoding: True
    temporal_position_encoding_max_len: 32
    temporal_attention_dim_div: 1
    zero_initialize: True

pose_guider_kwargs:
  pose_guider_type: "side_guider"
  args:
    out_channels: [ 320, 320, 640, 1280, 1280 ]

clip_projector_kwargs:
  projector_type: "ff"
  in_features: 1024
  out_features: 768

zero_snr: True
v_pred: True
train_cfg: False
snr_gamma: 5.0
fix_ref_t: True
pose_shuffle_ratio: 0.05

vae_slicing: True
fps: 8 # 30

validation_kwargs:
  guidance_scale: 2

train_data:
  - dataset:
      dataset_class: VideoDataset
      args:
        root_dir: "./video_dance_data"
        split: "train"
        sample_size: [ 768, 576 ]
        clip_size: [ 320, 240 ]
        image_finetune: False
        ref_mode: "random"
        sample_n_frames: 12 # 16

validation_data:
  dataset_class: VideoDataset
  args:
    root_dir: "./video_dance_data"
    split: "val"
    sample_size: [ 768, 576 ]
    clip_size: [ 320, 240 ]
    image_finetune: False
    ref_mode: "first"
    sample_n_frames: 12
    start_pixel: 0
    fix_gap: True

trainable_modules:
  - "motion_modules."

unet_checkpoint_path: "outputs/stage1_hamer/checkpoints/checkpoint-final.ckpt"

unet_checkpoint_path: "pretrained_models/checkpoint/stage_2_hamer_release.ckpt"

lr_scheduler: "constant_with_warmup"
learning_rate: 1e-5
lr_warmup_steps: 5000
train_batch_size: 1
validation_batch_size: 1

max_train_epoch: -1
max_train_steps: 1000
checkpointing_epochs: -1
checkpointing_steps: 500
checkpointing_steps_tuple: [ 2, 500 ]

global_seed: 42
mixed_precision: "fp16"

is_debug: False
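For reference, the fields above that drive the memory footprint are sample_size, sample_n_frames, and train_batch_size. A rough illustration of how they map onto the video latent fed to the UNet, assuming a Stable-Diffusion-style VAE with 8x spatial downsampling and 4 latent channels (an assumption for illustration, not this repo's exact layout):

```python
import torch

# sample_size, sample_n_frames, and train_batch_size taken from the config above.
batch, frames = 1, 12
height, width = 768, 576
latent_channels, vae_downsample = 4, 8   # assumed SD-style VAE

latents = torch.zeros(
    batch, latent_channels, frames,
    height // vae_downsample, width // vae_downsample,
    dtype=torch.float16,
)
print(latents.shape)  # torch.Size([1, 4, 12, 96, 72])

# The latent itself is tiny; what dominates GPU memory are the UNet/attention
# activations stored for backprop, which scale with frames * (H/8) * (W/8),
# so lowering sample_n_frames or sample_size shrinks the footprint.
```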

@SzhangS commented Oct 23, 2024

This is my stage2_hamer.yaml. It runs out of GPU memory on an A100, so I had to change sample_n_frames from 16 to 12. Is that feasible?

@theFoxofSky (Collaborator)

If so, please use gradient checkpointing.

Call this function before training.

[screenshot showing the gradient checkpointing call]
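Since the screenshot does not survive in text form, here is a minimal sketch of what such a call typically looks like for a diffusers-style UNet (the model class and path below are assumptions based on the config above, not confirmed from this repo's code):

```python
from diffusers import UNet2DConditionModel

# Assumed: the base UNet is loaded from the pretrained_model_path in the config.
unet = UNet2DConditionModel.from_pretrained("pretrained_models/RV/", subfolder="unet")

# Call this once before the training loop: activations are recomputed during
# the backward pass instead of being stored, trading compute for lower peak memory.
unet.enable_gradient_checkpointing()
```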

@SzhangS commented Oct 23, 2024

Thanks
