Re-enable MIOpen (cudnn) for amd cards, default MIOPEN_FIND_MODE=FAST, PYTORCH_MIOPEN_SUGGEST_NHWC=0 #11381
Conversation
Default MIOPEN_FIND_MODE=FAST Default PYTORCH_MIOPEN_SUGGEST_NHWC=0
cc @comfyanonymous can you re-check if this works as well as disabling cudnn for your test scenarios? The additional
FWIW, I tested these changes for side-effects on SDXL on my Ryzen AI APU. Results:

I look forward to the day MIOpen itself doesn't cripple VAE, haha.
I'm using tiled VAE in most places, but IIRC I tested with MIOpen on/off and tiling was still often beneficial either way. For me, disabling MIOpen has a significant negative effect on upscaling perf, but otherwise is fairly similar. With the work upstream to improve MIOpen, I think it makes sense to have it enabled by default. We could tune the defaults more per arch, since my testing is for RDNA3. I could make this PR just for RDNA3 if that helps.
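Per-arch tuning like that would need a way to tell RDNA3 apart from other AMD architectures. A minimal sketch of such a gate, assuming the ROCm-style `gfxNNNN` arch strings (RDNA3 parts report `gfx11xx`, e.g. `gfx1100` for Navi 31; CDNA parts report e.g. `gfx90a`) — the helper name and the exact family list are illustrative, not from this PR:

```python
def is_rdna3(arch_name: str) -> bool:
    """Return True for RDNA3-family (gfx11xx) architecture strings.

    ROCm arch strings may carry feature suffixes such as
    "gfx1100:sramecc+:xnack-", so strip everything after the first ":".
    """
    base = arch_name.split(":")[0]
    # Assumption: gfx11xx covers the RDNA3 family this PR was tested on.
    return base.startswith("gfx11")
```

On a ROCm build of PyTorch, the live arch string would typically come from `torch.cuda.get_device_properties(0).gcnArchName`.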
Yeah, tiled VAE suffers much less from MIOpen somehow. I like getting away with untiled since it shaves ~10s off versus tiled when MIOpen is disabled, but I admit always tiling would have stability benefits in exchange for the tiny speed hit.
Upstream has disabled NHWC (pytorch/pytorch#170780). I'll test again later to see if this removes the need to specify it here.
ComfyUI 0.6 added an env var for enabling MIOpen for testing. And I see AMD is inviting logs to help fix the underlying issue, thank goodness. |
Upstream PyTorch has addressed the NHWC issue (pytorch/pytorch#170764), so I guess we can close this awaiting a future PyTorch release. I still think it would be nice for the default settings to provide a better experience, though, and hence it would be nice to merge some of these AMD improvements once in a while. I also still think
Remove `cudnn.enabled = False` for AMD cards so MIOpen is enabled again.

Default env vars if not specified (so these are easy to override by users if they care):
- `MIOPEN_FIND_MODE=FAST` solves initial slowdown issues, particularly for VAE (MIOpen searching also seems to have little actual perf benefit if you let it run, at least in my experience on RDNA3 for SDXL & Wan), so this seems a better default.
- `PYTORCH_MIOPEN_SUGGEST_NHWC=0` resolves the significant regression in `ImageUpscaleWithModel` perf with MIOpen enabled on ROCm > 7.

In particular, this improves `ImageUpscaleWithModel` perf on ROCm 7.1: 7.9s -> 2.4s (using a simple single-image example workflow).
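The "default if not specified" behavior described above can be sketched with `os.environ.setdefault`, which only sets a variable the user has not already exported, so explicit overrides always win. The variable names are from this PR; where exactly the snippet lives in ComfyUI's startup is an assumption, but it must run before `torch` is imported, since MIOpen reads the environment at initialization:

```python
import os

# Soft defaults: setdefault leaves any user-exported value untouched.
# Must run before importing torch, because MIOpen reads these at init.
os.environ.setdefault("MIOPEN_FIND_MODE", "FAST")
os.environ.setdefault("PYTORCH_MIOPEN_SUGGEST_NHWC", "0")
```

A user who wants MIOpen's exhaustive kernel search back can still opt in per run, e.g. `MIOPEN_FIND_MODE=NORMAL python main.py` (`NORMAL` is MIOpen's own default find mode).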
Tested on my 7900 GRE (rdna3) on Linux with rocm 7.1 & 6.4.
Resolves #10447
Relates to #10302, #10448, pytorch/pytorch#170764, ROCm/TheRock#2485