
Optimize the Startup Configuration Process for the Hunyuan DiT Model #200

Open
derolol wants to merge 1 commit into base: main

derolol commented Sep 12, 2024

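This PR provides a solution for open task 5: optimize the startup configuration process for the Hunyuan DiT model (intermediate difficulty).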

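Configuration directory structure

Following the configuration style and code of MMEngine/config, this PR reworks the startup configuration process for the Hunyuan DiT model. Configuration parameters are split into three groups (data, model, and startup schedule), and model parameters are set in .py files. A new configuration file can reference the default configuration instead of repeating it.

A new hydit/configs directory stores the startup configuration files. Its structure is: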

- configs
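    - base # default configuration
        - dataset   # dataset configuration
        - model     # model configuration
        - schedule  # startup (schedule) configuration
    - train # training configuration files built on the default configuration files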
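
To illustrate how a training config can reference the defaults and override only what differs, here is a minimal sketch of the recursive merge that MMEngine-style configs rely on. It is not the PR's code: `merge_config` and every key and value below (`image_size`, `use_lora`, `lr`, and so on) are hypothetical placeholders chosen for the example.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict: `base` recursively updated with `override`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults standing in for configs/base/{dataset,model,schedule}.
base_dataset = {"dataset": {"image_size": 1024, "batch_size": 1}}
base_model = {"model": {"name": "DiT-g/2", "use_lora": False}}
base_schedule = {"schedule": {"lr": 1e-4, "epochs": 10}}

# Hypothetical file under configs/train: only the differences are listed.
train_overrides = {"model": {"use_lora": True}, "schedule": {"lr": 5e-5}}

config = {}
for part in (base_dataset, base_model, base_schedule, train_overrides):
    config = merge_config(config, part)
```

With this layout, a new training variant only needs to state its overrides; the untouched defaults (for example `epochs` above) carry over from the base files.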

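Startup process

To preserve the existing code structure when loading configuration, a new configuration loader, hydit/config_engine.py, is added. In train_deepspeed.py, the only change is the module from which get_args is imported:

# Before: from hydit.config import get_args
from hydit.config_engine import get_args

Because full-parameter training and LoRA-only training both use DeepSpeed, a new train_deepspeed.sh script is added to launch training. The launch command is: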

PYTHONPATH=./ sh hydit/train_deepspeed.sh --config hydit/configs/train/train_lora_dit_g2_1024p_single.py 

Here, the config argument is the relative path to the training configuration file.
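
For readers who want a feel for what a drop-in `get_args` could look like, the sketch below parses `--config` and loads the referenced .py file into a namespace. It is an assumption about the general approach, not the contents of hydit/config_engine.py: it uses only the standard library (`argparse`, `runpy`) rather than MMEngine, and it does not resolve references to base configs.

```python
import argparse
import runpy
import types


def get_args(argv=None):
    """Parse --config and expose the config file's variables as attributes."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True,
                        help="relative path to a training config .py file")
    cli = parser.parse_args(argv)

    # run_path executes the file and returns its module-level globals.
    values = runpy.run_path(cli.config)

    # Keep public names only, and drop any modules the config imported.
    settings = {
        name: value
        for name, value in values.items()
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    }
    settings["config"] = cli.config  # remember where the settings came from
    return argparse.Namespace(**settings)
```

Returning a namespace keeps attribute-style access (`args.lr`, say, if the config file defines `lr`), which is why swapping only the import line can leave the rest of a training script unchanged.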

tencent-adm (Member) commented Sep 12, 2024

CLA assistant check
All committers have signed the CLA.
