It is not possible to use cfg parallelism with batch sizes greater than 1 because the tensors are no longer contiguous.
This can be solved by adding input_ = input_.contiguous() inside the all_gather function in the group_coordinator file (sketched below).
I think this fix is necessary: the remaining parallelization modes allow batch sizes greater than 1, and cfg, which is the fastest mode, should be able to handle them as well.
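A minimal sketch of where the fix would go, assuming an all_gather helper shaped roughly like the one in the group_coordinator file (the surrounding function body here is an illustration, not the actual xDiT code; only the added .contiguous() call is the proposed change):

```python
from typing import Optional

import torch
import torch.distributed as dist

def all_gather(input_: torch.Tensor, dim: int = 0,
               group: Optional[dist.ProcessGroup] = None) -> torch.Tensor:
    """Gather `input_` from all ranks and concatenate along `dim`."""
    world_size = dist.get_world_size(group)
    if world_size == 1:
        return input_
    # Proposed fix: with cfg parallelism and batch size > 1 this function
    # receives a non-contiguous view, and NCCL collectives require
    # contiguous buffers.
    input_ = input_.contiguous()
    output = torch.empty((world_size,) + tuple(input_.shape),
                         dtype=input_.dtype, device=input_.device)
    dist.all_gather_into_tensor(output, input_, group=group)
    return torch.cat(output.unbind(0), dim=dim)
```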
Exactly. For example, when using the "Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers" model on a 2-GPU system with cfg=True as the parallelization mode, this error is raised whenever the input batch size is greater than 1.
What I found is that after adding input_.contiguous() the error was no longer raised, so this could be included in the original code to avoid the problem. A minimal reproduction of the underlying constraint is sketched below.
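I have not traced the exact slicing xDiT performs, but here is a hypothetical single-process reproduction of the contiguity constraint that matches the batch-size behaviour (a size-1 batch dimension keeps the view contiguous, a larger one does not):

```python
import torch

# Mimic the kind of view cfg parallelism takes: each rank keeps half of a
# non-batch dimension (e.g. the conditional / unconditional split).
x = torch.randn(2, 8, 16)                  # batch size 2
half = x[:, :4, :]                         # view over a non-batch dim
print(half.is_contiguous())                # False -> NCCL all_gather fails

x1 = torch.randn(1, 8, 16)                 # batch size 1
print(x1[:, :4, :].is_contiguous())        # True -> why batch size 1 works

print(half.contiguous().is_contiguous())   # True -> the proposed fix
```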
Thank you for your response!