Investigate sdxl example #182
An improved version of the sdxl example is at https://github.com/ROCm/AMDMIGraphX/commits/sdxl_perf_torch_buffers/. The main idea was to move the buffers to GPU memory, which requires corresponding support in MIGraphX (a sketch of the idea follows the logs below). The original and rewritten perf logs:
Original:
Elapsed time for decode: 440.0491 ms
Elapsed time clip: 37.4158 ms
Elapsed time unet: 8252.2065 ms
Elapsed time vae: 440.0772 ms
Elapsed time for run: 8752.8331 ms
Rewritten:
Elapsed time for decode: 434.3943 ms
Elapsed time clip: 24.1256 ms
Elapsed time unet: 7470.2498 ms
Elapsed time vae: 434.4229 ms
Elapsed time for run: 7951.7439 ms

There are differences in the output images, probably due to precision.
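A minimal sketch of the buffer idea, assuming the MIGraphX Python bindings expose `argument_from_pointer()` and a compile with `offload_copy=False`; the model path, dtype, and parameter handling below are placeholders, not the code from the sdxl_perf_torch_buffers branch:

```python
import migraphx
import torch

# Compile without offload_copy so the program expects device buffers
# instead of copying host memory in and out on every run.
prog = migraphx.parse_onnx("unet.onnx")  # placeholder model path
prog.compile(migraphx.get_target("gpu"), offload_copy=False)

# Allocate the parameter buffers once as torch CUDA tensors; they stay in
# GPU memory across denoising iterations. The dtype must match the
# parameter type reported by MIGraphX (fp16 assumed here).
param_shapes = prog.get_parameter_shapes()
buffers = {name: torch.zeros(*shape.lens(), dtype=torch.float16, device="cuda")
           for name, shape in param_shapes.items()}

def run_step():
    # Wrap the existing device pointers as MIGraphX arguments -- no host
    # round-trip per step (argument_from_pointer is the assumed binding).
    params = {name: migraphx.argument_from_pointer(param_shapes[name],
                                                   buf.data_ptr())
              for name, buf in buffers.items()}
    return prog.run(params)
```

The torch tensors also act as the interface to the rest of the torch-based pipeline (scheduler, latents update), which is what removes the per-step host copies visible in the original log.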
The packages used in the TRT demo:
To get the clip.opt and clip2.opt models working, we need to use graph surgeon. Update: actually, that tensor is already in the model; the problem is that it is not "exposed" as an output. We need to re-export the model and make sure it is an output.
The commit that enabled it: ROCm@0d9e4b9. The "hidden_states" tensor was only renamed; it was not added to the ONNX outputs. With clip_modifier.py, we create a "mod" (modified) version.
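As an illustration of that re-export step (a hedged sketch, not the actual clip_modifier.py; the tensor and file names are assumptions), exposing an existing tensor such as "hidden_states" as a graph output can be done with ONNX GraphSurgeon:

```python
import onnx
import onnx_graphsurgeon as gs

def expose_hidden_states(in_path="clip.opt.onnx", out_path="clip.opt.mod.onnx",
                         tensor_name="hidden_states"):
    """Re-export the model with `tensor_name` added to the graph outputs."""
    graph = gs.import_onnx(onnx.load(in_path))
    tensors = graph.tensors()
    if tensor_name not in tensors:
        raise KeyError(f"{tensor_name} not found in the graph")
    # The tensor already exists in the model; it just is not "exposed".
    if all(out.name != tensor_name for out in graph.outputs):
        graph.outputs.append(tensors[tensor_name])
    graph.cleanup().toposort()
    onnx.save(gs.export_onnx(graph), out_path)
```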
There is a change in the outputs as well. Also, the "third" arm of the np version is now fixed.
Both SD21 and SDXL were updated to use torch. Still debugging why the refiner gives strange results for certain models.
Extended it with streams and events: ROCm#3051
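For reference, the stream/event pattern looks roughly like the torch sketch below; the actual changes in ROCm#3051 are on the MIGraphX side, so this is only an illustration of the idea:

```python
import torch

stream = torch.cuda.Stream()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

with torch.cuda.stream(stream):
    start.record(stream)
    # ... enqueue the model's GPU work on this stream ...
    end.record(stream)

# Block only on this stream's events, then read the device-side timing.
end.synchronize()
print(f"Elapsed: {start.elapsed_time(end):.4f} ms")
```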