
ValueError: Tensors must be contiguous while using cfg=True and batch_size > 1 #450

Open
MigueXl opened this issue Feb 17, 2025 · 2 comments

MigueXl commented Feb 17, 2025

It is not possible to use cfg parallelism with batch sizes greater than 1 because the tensors are no longer contiguous.
This can be fixed by calling input_ = input_.contiguous() inside the all_gather function, which is part of the group_coordinator file.
I think this fix is necessary: the remaining parallelization modes allow batch sizes greater than 1, and cfg, which is the fastest, should support them as well.
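The suggested change can be sketched as follows. This is a minimal, hypothetical version of an all_gather wrapper like the one in group_coordinator (the real signature and gather logic in the project may differ); the key line is the contiguity guard before the collective call, since NCCL/Gloo collectives require contiguous tensors.

```python
import torch
import torch.distributed as dist

def all_gather(input_: torch.Tensor, group=None, world_size: int = 1) -> torch.Tensor:
    """Hypothetical sketch of the group_coordinator all_gather helper.

    With cfg parallelism and batch_size > 1 the input can arrive as a
    non-contiguous view, which makes dist.all_gather raise
    "ValueError: Tensors must be contiguous".
    """
    # The fix proposed in this issue: force a contiguous layout first.
    if not input_.is_contiguous():
        input_ = input_.contiguous()

    # Single-process fallback so the sketch also runs without a process group.
    if world_size == 1 or not (dist.is_available() and dist.is_initialized()):
        return input_

    output = [torch.empty_like(input_) for _ in range(world_size)]
    dist.all_gather(output, input_, group=group)
    return torch.cat(output, dim=0)
```

Because .contiguous() is a no-op on tensors that are already contiguous, the guard adds a copy only in the failing case.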

[Screenshot of the traceback: ValueError: Tensors must be contiguous]

@feifeibear
Collaborator

Thanks! Do you mean that contiguous() needs to be called before a certain all_gather call?

@MigueXl
Author

MigueXl commented Feb 25, 2025

Exactly. For example, when running the "Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers" model on a 2-GPU system with cfg=True as the parallelization mode, this error is raised whenever the input batch size is greater than 1.
What I found is that after adding input_ = input_.contiguous(), the error was no longer raised, so this change could be included in the original code to avoid the problem.
Thank you for your response!
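To see why a larger batch can surface this error at all, here is a small self-contained illustration (not taken from the project's code) of how splitting a batched tensor produces non-contiguous views, which torch.distributed collectives reject:

```python
import torch

# A (2*B, C) latent batch, as with cfg: conditional + unconditional halves.
latents = torch.randn(4, 8)

# Chunking along the batch dim keeps the views contiguous...
cond, uncond = latents.chunk(2, dim=0)
assert cond.is_contiguous()

# ...but chunking (or slicing) along any other dim does not:
# the view keeps the parent's strides, so its memory is not dense.
a, b = latents.chunk(2, dim=1)
assert not a.is_contiguous()

# Passing such a view to dist.all_gather raises
# "ValueError: Tensors must be contiguous".
# .contiguous() copies it into a dense layout and resolves the error.
assert a.contiguous().is_contiguous()
```

This is only an illustration of the general mechanism; exactly where the non-contiguous view arises inside the cfg code path is not shown in this issue.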
