[Request] Optimize HunyuanVideo Inference Speed with ParaAttention #10383
Comments
Thanks for the kind words @chengzeyi! Great work building this, and it would be really cool to mention ParaAttention (similar to how we have a dedicated doc page for xDiT, DeepCache, and others). Apart from it being an extremely fast inference solution, it is a very valuable educational resource due to the simplicity of the implementations (I've personally learnt a lot from the codebase at least, so tysm).
As @a-r-r-o-w mentioned, feel free to create a dedicated doc page for ParaAttention and you can ping me on it! 🤗
@chengzeyi thanks for opening the issue! Would you be willing to open a PR for the doc page? It is best if it's coming from the author. If not, let us know, we can help as well.
Thank you! I am currently writing a doc about how to optimize video model inference. This could be a good start. |
I wrote a tutorial about how to optimize HunyuanVideo inference with ParaAttention: https://github.com/chengzeyi/ParaAttention/blob/main/doc/fastest_hunyuan_video.md
Hi guys,

First and foremost, I would like to commend you for the incredible work on the `diffusers` library. It has been an invaluable resource for my projects.

I am writing to suggest an enhancement to the inference speed of the `HunyuanVideo` model. We have found that using ParaAttention can significantly speed up the inference of HunyuanVideo. ParaAttention provides context parallel attention that works with `torch.compile`, supporting Ulysses-style and Ring-style parallelism. I hope we could add a doc or introduction on how to make `HunyuanVideo` in `diffusers` run faster with `ParaAttention`. Besides `HunyuanVideo`, `FLUX`, `Mochi` and `CogVideoX` are also supported.

Steps to Optimize HunyuanVideo Inference with `ParaAttention`:

Install ParaAttention:
pip3 install para-attn # Or visit https://github.com/chengzeyi/ParaAttention.git to see detailed instructions
Example Script:
Here is an example script to run HunyuanVideo with ParaAttention:
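The original script did not survive in this copy of the issue, so below is a sketch of what such a script looks like, reconstructed along the lines of the linked ParaAttention tutorial. The adapter entry points (`init_context_parallel_mesh`, `parallelize_pipe`) and the generation parameters are assumptions here and should be checked against the ParaAttention repository; it requires at least 2 GPUs and a local download of the HunyuanVideo weights.

```python
import torch
import torch.distributed as dist
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# One process per GPU, launched via torchrun.
dist.init_process_group()
torch.cuda.set_device(dist.get_rank())

model_id = "tencent/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()  # reduce VAE memory pressure for video decoding

# Parallelize the pipeline's attention across the GPUs with ParaAttention
# (assumed API, per the ParaAttention docs).
from para_attn.context_parallel import init_context_parallel_mesh
from para_attn.context_parallel.diffusers_adapters import parallelize_pipe

mesh = init_context_parallel_mesh(pipe.device.type)
parallelize_pipe(pipe, mesh=mesh)

pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]

# Only rank 0 writes the resulting video to disk.
if dist.get_rank() == 0:
    export_to_video(output, "hunyuan_video.mp4", fps=15)

dist.destroy_process_group()
```

With 2 GPUs, this would be launched as something like `torchrun --nproc_per_node=2 run_hunyuan_video.py`.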
Save the above code to `run_hunyuan_video.py` and run it with torchrun.

The generated video on 2xH100:
hunyuan_video.mp4
By following these steps, users can leverage `ParaAttention` to achieve faster inference times with `HunyuanVideo` on multiple GPUs.

Thank you for considering this suggestion. I believe it could greatly benefit the community and enhance the performance of `HunyuanVideo`. Please let me know if there are any questions or further clarifications needed.