
When height and width change, inference speed slows down significantly. #423

Open
serend1p1ty opened this issue Jan 6, 2025 · 9 comments


@serend1p1ty

Whenever the height or width changes, pipe.prepare_run needs to be re-executed, which is very time-consuming. Is there a better approach?

@feifeibear
Collaborator

Which script are you using?

@serend1p1ty
Author

serend1p1ty commented Jan 6, 2025

It seems that xDiT requires the resolution used in prepare_run to match the resolution of the actual call. If prepare_run uses 1152x1152 but the actual call uses 1056x1056, inference is very slow.

@serend1p1ty
Author

serend1p1ty commented Jan 6, 2025

@feifeibear
example/run.sh

...
TASK_ARGS="--height 1152 --width 1152 --no_use_resolution_binning"

N_GPUS=6
PARALLEL_ARGS="--ulysses_degree 6"

COMPILE_FLAG="--use_torch_compile"
...

flux_example.py

  output = pipe(
      height=1152,
      width=1152,
      prompt=input_config.prompt,
      num_inference_steps=input_config.num_inference_steps,
      output_type=input_config.output_type,
      max_sequence_length=256,
      guidance_scale=0.0,
      generator=torch.Generator(device="cuda").manual_seed(input_config.seed),
  )

It takes 5s. If we modify flux_example.py as follows:

  output = pipe(
      height=1056,
      width=1056,
      prompt=input_config.prompt,
      num_inference_steps=input_config.num_inference_steps,
      output_type=input_config.output_type,
      max_sequence_length=256,
      guidance_scale=0.0,
      generator=torch.Generator(device="cuda").manual_seed(input_config.seed),
  )

It takes 20s.

@feifeibear
Collaborator

I see, you did not run prepare_run with the resolution that is actually used for inference. You can run inference multiple times and see whether it is still slow after the first run.
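
A minimal sketch of that warm-up pattern, assuming the prepare_run signature and the input_config field names used in the example scripts (they may differ across xDiT versions):

import torch

# Warm up at the resolution that will actually be served, so the slow
# first run is paid once up front instead of during the timed call.
# NOTE: the input_config field names and the prepare_run signature are
# assumed from the example scripts; check them against your xDiT version.
target_h, target_w = 1056, 1056

input_config.height = target_h      # assumed field name
input_config.width = target_w       # assumed field name
pipe.prepare_run(input_config)      # re-prepare at the serving resolution

output = pipe(
    height=target_h,
    width=target_w,
    prompt=input_config.prompt,
    num_inference_steps=input_config.num_inference_steps,
    output_type=input_config.output_type,
    max_sequence_length=256,
    guidance_scale=0.0,
    generator=torch.Generator(device="cuda").manual_seed(input_config.seed),
)
# After this first (slow) pass, repeated 1056x1056 calls should be fast.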

@serend1p1ty
Author

@feifeibear Thanks for your response.
Running inference multiple times does solve this problem, but it raises a new one: if users submit tasks at a different resolution every time, how should we handle that?
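
For example, one workaround I am considering is to pre-warm a small fixed set of resolutions at startup and snap each incoming request to the nearest warmed bucket. A rough sketch; the warm_up helper and the assumption that warmed resolutions stay cached are hypothetical:

# Hypothetical sketch: pre-warm a fixed set of resolution buckets at startup,
# then route each request to the closest warmed resolution so user-facing
# calls never pay the prepare_run / recompilation cost.
SUPPORTED_RESOLUTIONS = [(1152, 1152), (1056, 1056), (768, 1344)]

def warm_up(pipe, input_config, height, width):
    # Assumed pattern from the example scripts: prepare_run at the target
    # resolution (field names and signature may differ by xDiT version).
    input_config.height, input_config.width = height, width
    pipe.prepare_run(input_config)

def nearest_bucket(height, width):
    # Snap an arbitrary requested size to the closest warmed resolution.
    return min(
        SUPPORTED_RESOLUTIONS,
        key=lambda hw: abs(hw[0] - height) + abs(hw[1] - width),
    )

# At startup:
#   for h, w in SUPPORTED_RESOLUTIONS:
#       warm_up(pipe, input_config, h, w)
# Per request:
#   h, w = nearest_bucket(req_h, req_w)
#   output = pipe(height=h, width=w, ...)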

@feifeibear
Collaborator

Have you used the torch compile option?

@serend1p1ty
Author

> Have you used the torch compile option?

Yes.

@serend1p1ty
Author

@feifeibear How should we deal with frequent resolution changes? Should we just not use the torch compile option?

@feifeibear
Collaborator

Dynamic shapes with torch.compile are a well-known challenge. We will investigate the problem and see whether we can find a good solution. Let us know if you come across any good ideas.
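
For reference, torch.compile does expose controls for dynamic shapes; here is a minimal standalone sketch (not xDiT-specific), with a toy module standing in for the transformer, of the two standard knobs:

import torch

class TinyBlock(torch.nn.Module):
    """Toy stand-in for a transformer block whose token count varies with resolution."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)

model = TinyBlock().eval()

# dynamic=True asks the compiler for shape-generic kernels instead of
# specializing on the exact sequence length, trading some per-step speed
# for no recompilation when the resolution (and thus token count) changes.
compiled = torch.compile(model, dynamic=True)

with torch.no_grad():
    out_a = compiled(torch.randn(1, 4096, 64))   # one resolution's token count
    out_b = compiled(torch.randn(1, 4356, 64))   # a different count: no full recompile

# Alternatively, torch._dynamo.mark_dynamic(tensor, dim) marks only the
# sequence dimension as dynamic while keeping other dimensions specialized.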
