@alexheretic (Contributor) commented Dec 17, 2025

Remove cudnn.enabled = False for AMD cards so MIOpen is enabled again.

Set default env vars when not already specified (so users who care can easily override them):

  • MIOPEN_FIND_MODE=FAST avoids the initial slowdown, particularly for VAE. Letting MIOpen's full search run also seems to have little actual perf benefit (at least in my experience on RDNA3 for SDXL & Wan), so this seems a better default.
  • PYTORCH_MIOPEN_SUGGEST_NHWC=0 resolves the significant regression in ImageUpscaleWithModel perf with MIOpen enabled on ROCm 7 and later.

In particular this improves ImageUpscaleWithModel perf on ROCm 7.1: 7.9s -> 2.4s (using a simple single-image example workflow).
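A minimal sketch of how such defaults could be applied at startup (an illustration of the approach, not the PR's actual diff):

```python
import os

# Sketch only: apply defaults without clobbering anything the user has
# already exported, so they stay easy to override.
os.environ.setdefault("MIOPEN_FIND_MODE", "FAST")          # skip MIOpen's slow kernel search
os.environ.setdefault("PYTORCH_MIOPEN_SUGGEST_NHWC", "0")  # keep NCHW layout; avoids the upscale regression

# Note: these variables are only honored if set before torch initializes
# its ROCm/MIOpen backend, i.e. before `import torch` runs.
```

`os.environ.setdefault` is what makes the defaults non-invasive: a user-exported value always wins.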

Tested on my 7900 GRE (rdna3) on Linux with rocm 7.1 & 6.4.

Resolves #10447
Relates to #10302, #10448, pytorch/pytorch#170764, ROCm/TheRock#2485

Default MIOPEN_FIND_MODE=FAST
Default PYTORCH_MIOPEN_SUGGEST_NHWC=0
@alexheretic (Contributor, Author)

cc @comfyanonymous can you re-check if this works as well as disabling cudnn for your test scenarios? The additional PYTORCH_MIOPEN_SUGGEST_NHWC=0 switch resolves perf issues with rocm7.1 upscaling for me.

@lostdisc

FWIW, I tested these changes for side-effects on SDXL on my Ryzen AI APU. Results:

  • With cudnn enabled, MIOPEN_FIND_MODE=FAST does avoid the extreme VAE slowness on the first run at a given resolution. I had recently tried setting it in my conda environment but surprisingly got no effect, whereas this code is effective. The effect also seems per-session, since reverting the code makes the issue return.
  • However, VAE decode still costs me an extra ~1.5 minutes of high GPU usage every time (at 1280x1600), roughly doubling my gen time compared to having cudnn disabled. Avoiding this side effect out-of-the-box would be preferable. Adding code to selectively disable cudnn for VAE on AMD, like sfinktah's wrapper, may work if combined with MIOPEN_FIND_MODE=FAST; without the latter set, the wrapper still has some first-run slowness.
  • At the start of KSampler, I quickly get 15 lines of warnings like this:
    MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver <GemmFwdRest>, workspace required: 14745600, provided ptr: 0000000000000000 size: 0
    Where the "workspace required" number varies. The run is able to keep going and finish, though. (Note that if cudnn is enabled and MIOPEN_FIND_MODE is not set to FAST, there is a first-run-for-this-resolution delay of a few minutes at the start of KSampler, but no warning messages.)
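The selective-disable idea mentioned above could be sketched as a small context manager (an illustration of the approach, not sfinktah's actual wrapper; `backend` stands in for `torch.backends.cudnn`):

```python
from contextlib import contextmanager

@contextmanager
def cudnn_disabled(backend):
    """Temporarily set backend.enabled = False for the enclosed region,
    restoring the previous value afterwards (even on error)."""
    previous = backend.enabled
    backend.enabled = False
    try:
        yield
    finally:
        backend.enabled = previous

# Hypothetical usage around VAE decode on AMD:
#   with cudnn_disabled(torch.backends.cudnn):
#       image = vae.decode(latent)
```

PyTorch also ships its own `torch.backends.cudnn.flags(enabled=False)` context manager, which could serve the same purpose.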

I look forward to the day MIOpen itself doesn't cripple VAE, haha.

@alexheretic (Contributor, Author)

I'm using tiled VAE in most places. But IIRC I tested with MIOpen on/off and tiling was still often beneficial either way. For me, disabling MIOpen has a significant negative effect on upscaling perf, but otherwise perf is fairly similar.

With the work upstream to improve MIOpen, I think it makes sense to have it enabled by default. We could tune the defaults further per arch, since my testing is on RDNA3. I could limit this PR to RDNA3 if that helps.

@lostdisc

Yeah, tiled VAE suffers much less from MIOpen somehow. I like getting away with untiled since it shaves ~10s off versus tiled when MIOpen is disabled, but I admit always tiling would have stability benefits in exchange for the tiny speed hit.

@alexheretic (Contributor, Author)

Upstream has disabled NHWC: pytorch/pytorch#170780

I'll test again later to see if this fixes the need to specify it here.

@lostdisc

ComfyUI 0.6 added an env var for enabling MIOpen for testing. And I see AMD is inviting logs to help fix the underlying issue, thank goodness.

@alexheretic closed this Jan 9, 2026
@alexheretic (Contributor, Author) commented Jan 9, 2026

Upstream pytorch has addressed the NHWC issue (pytorch/pytorch#170764), so I guess we can close this pending a future pytorch release. I still think it would be nice for the default settings to provide a better experience out of the box, so it would be good to merge some of these AMD improvements once in a while.

I also still think MIOPEN_FIND_MODE=FAST is the correct default based on my experience with miopen.



Successfully merging this pull request may close these issues.

Disabling cudnn regresses ImageUpscaleWithModel performance on ROCM 6.4
