Error traceback
If applicable, paste the error traceback here.
Traceback (most recent call last):
  File "hydit/train_deepspeed.py", line 529, in <module>
    main(get_args())
  File "hydit/train_deepspeed.py", line 461, in main
    loss_dict = diffusion.training_losses(model=model, x_start=latents, model_kwargs=model_kwargs)
  File "/workspace/HunyuanDiT/hydit/diffusion/respace.py", line 97, in training_losses
    return super().training_losses(self._wrap_model(model), *args, **kwargs)
  File "/workspace/HunyuanDiT/hydit/diffusion/gaussian_diffusion.py", line 551, in training_losses
    out_dict = model(x_t, t, **model_kwargs)
  File "/workspace/HunyuanDiT/hydit/diffusion/respace.py", line 144, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/root/anaconda/ls/envs/HunyuanDiT/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda/ls/envs/HunyuanDiT/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    return func(*args, **kwargs)
  File "/root/anaconda/ls/envs/HunyuanDiT/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1568, in forward
    loss = self.module(*inputs, **kwargs)
  File "/root/anaconda/ls/envs/HunyuanDiT/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/HunyuanDiT/hydit/modules/models.py", line 341, in forward
    text_states_t5 = self.mlp_t5(text_states_t5.view(-1, c_t5)).float()
  File "/root/anaconda/ls/envs/HunyuanDiT/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda/ls/envs/HunyuanDiT/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/root/anaconda/ls/envs/HunyuanDiT/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda/ls/envs/HunyuanDiT/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype
Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
Thanks for your error report and we appreciate it a lot.
Checklist
Describe the bug
Training on a V100 with Flash Attention disabled and fp32 precision produces the error above.
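For anyone trying to narrow this down, here is a minimal sketch of the kind of mismatch the traceback points at. It is not taken from the HunyuanDiT code: the small `mlp` below is a hypothetical stand-in for `mlp_t5`, the sizes are made up, and the assumption that the text-encoder output arrives in fp16 while the weights are fp32 is only a guess (the traceback does not show which side has which dtype). `cast_to_module_dtype` is a hypothetical helper, not a library function.

```python
import torch
import torch.nn as nn


def cast_to_module_dtype(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Return x cast to the dtype of the module's first parameter."""
    param_dtype = next(module.parameters()).dtype
    return x if x.dtype == param_dtype else x.to(param_dtype)


# Hypothetical stand-in for mlp_t5: a small MLP whose weights are fp32 (the default).
c_t5 = 8
mlp = nn.Sequential(nn.Linear(c_t5, 16), nn.SiLU(), nn.Linear(16, 4))

# Assumption: the text-encoder output arrives in half precision.
text_states_t5 = torch.randn(2, 3, c_t5).to(torch.float16)

# Passing fp16 activations to fp32 weights fails inside F.linear.
try:
    mlp(text_states_t5.view(-1, c_t5))
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

# Casting the input to the weights' dtype lets the call go through.
out = mlp(cast_to_module_dtype(mlp, text_states_t5.view(-1, c_t5))).float()
print(out.dtype, tuple(out.shape))
```

If the real cause is similar, printing `text_states_t5.dtype` and `next(self.mlp_t5.parameters()).dtype` just before line 341 of `hydit/modules/models.py` should show which side is not fp32. The exact wording of the error message may differ between PyTorch versions.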
Reproduction
Environment