[GPU][Codegen] SDXL value mismatch without spec file on MI300 #19984

Open
jerryyin opened this issue Feb 13, 2025 · 0 comments
Labels
bug 🐞 Something isn't working

@jerryyin
Member

jerryyin commented Feb 13, 2025

What happened?

test_unet.py hits a value-mismatch regression when run locally on the int8_fp16 mixed-precision tests, unless care is taken to point it at the attention/matmul spec file. The problem is two-fold:

  • The compiler team needs to root-cause and fix the regression that appears when no spec file is used.
  • Usability of the script: a user who just wants to reproduce the GitHub Actions result should not need to set an environment variable, so the script should use the default spec location instead. I can assign you to this subtask. CC @geomin12
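A minimal sketch of the second bullet, assuming the spec directory is currently read from an environment variable. The variable name IREE_TEST_PATH_EXTENSION and the helper resolve_spec_dir are hypothetical; the default directory build_tools/pkgci/external_test_suite is taken from the path used in the workaround later in this issue.

```python
import os
from pathlib import Path

# Hypothetical environment variable name; the real script may use a different one.
SPEC_DIR_ENV_VAR = "IREE_TEST_PATH_EXTENSION"

# Default spec location relative to the IREE repo root
# (taken from the path used in the workaround in this issue).
DEFAULT_SPEC_SUBDIR = Path("build_tools/pkgci/external_test_suite")


def resolve_spec_dir(repo_root: Path, environ=None) -> Path:
    """Return the env var override if set and non-empty, else the in-repo default."""
    env = os.environ if environ is None else environ
    override = env.get(SPEC_DIR_ENV_VAR)
    if override:
        return Path(override)
    return repo_root / DEFAULT_SPEC_SUBDIR


if __name__ == "__main__":
    # With no override, a plain local pytest run would pick up the in-repo specs.
    print(resolve_spec_dir(Path("/root/iree"), environ={}))
```

Keeping the environment variable as an optional override preserves whatever CI currently does, while a local run with no extra setup falls back to the checked-in spec files.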

Steps to reproduce your issue

To reproduce the failure

  1. Use an MI300 machine, build IREE, and make sure iree-compile and iree-run-module are on your PATH.
  2. Run pytest standalone, without fully mimicking the CI steps:
cd iree
pytest ./experimental/regression_suite/shark-test-suite-models/sdxl/test_unet.py -k int8_fp16_rocm -rpfE --log-cli-level=info --timeout=600 --durations=0
  3. Observe the failure and the corresponding log:

input = None, capture_output = True, timeout = None, check = True
popenargs = (['iree-run-module', '--device=hip', '--module=/root/iree/sdxl_punet_int8_fp16_vmfbs/punet_fp16.rocm_gfx942.vmfb', '--...fp16/punet_weights.irpa', '--input=1x4x128x128xf16=@/root/iree/artifacts/sdxl_punet_int8/inference_input.0.bin', ...],) kwargs = {'cwd': PosixPath('/root/iree/sdxl_punet_int8_fp16_vmfbs'), 'stderr': -1, 'stdout': -1}, process = <Popen: returncode: 1 args: ['iree-run-module', '--device=hip', '--module=/r...>
stdout = b'EXEC @main\n[FAILED] result[0]: element at index 0 (0.0032959) does not match the expected (0.0914307); expected tha......][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...][...]]]\n' stderr = b'', retcode = 1
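When triaging, it can help to pull the first reported mismatch out of the captured stdout. A minimal sketch, assuming the message format shown in the log above stays stable; parse_first_mismatch and Mismatch are hypothetical helpers, not part of the test suite.

```python
import re
from typing import NamedTuple, Optional


class Mismatch(NamedTuple):
    index: int
    actual: float
    expected: float


# Matches messages such as:
# "element at index 0 (0.0032959) does not match the expected (0.0914307)"
_MISMATCH_RE = re.compile(
    r"element at index (\d+) \(([^)]*)\) does not match the expected \(([^)]*)\)"
)


def parse_first_mismatch(stdout: bytes) -> Optional[Mismatch]:
    """Return the first index/actual/expected triple reported in stdout, or None."""
    text = stdout.decode("utf-8", errors="replace")
    match = _MISMATCH_RE.search(text)
    if match is None:
        return None
    return Mismatch(int(match.group(1)), float(match.group(2)), float(match.group(3)))


if __name__ == "__main__":
    log = b"EXEC @main\n[FAILED] result[0]: element at index 0 (0.0032959) does not match the expected (0.0914307); ..."
    print(parse_first_mismatch(log))
```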

To work around it

Comment out the following lines:

if os.path.isfile(
    f"{iree_test_path_extension}/attention_and_matmul_spec_punet_{sku}.mlir"
):
    INT8_PUNET_FLAGS.append(
        f"--iree-codegen-transform-dialect-library={iree_test_path_extension}/attention_and_matmul_spec_punet_{sku}.mlir"
    )
else:
    # TODO: Investigate numerics failure without using the MI300 punet attention spec
    INT8_PUNET_FLAGS.append(
        f"--iree-codegen-transform-dialect-library={iree_test_path_extension}/attention_and_matmul_spec_punet_mi300.mlir"
    )
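The else branch above appends the MI300 spec path without checking that the file exists, so a misconfigured iree_test_path_extension can go unnoticed. A minimal sketch of a stricter selection, assuming the same file naming; select_punet_spec and punet_spec_flag are hypothetical helpers, and the flag name is copied from the snippet above.

```python
import tempfile
from pathlib import Path


def select_punet_spec(spec_dir: Path, sku: str) -> Path:
    """Prefer the SKU-specific spec, fall back to the MI300 spec, else raise."""
    candidates = [
        spec_dir / f"attention_and_matmul_spec_punet_{sku}.mlir",
        spec_dir / "attention_and_matmul_spec_punet_mi300.mlir",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Fail at test setup instead of passing a nonexistent path to the compiler.
    raise FileNotFoundError(
        "No punet attention/matmul spec found; looked for: "
        + ", ".join(str(c) for c in candidates)
    )


def punet_spec_flag(spec_dir: Path, sku: str) -> str:
    """Build the transform dialect library flag for the selected spec file."""
    return f"--iree-codegen-transform-dialect-library={select_punet_spec(spec_dir, sku)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        spec_dir = Path(d)
        (spec_dir / "attention_and_matmul_spec_punet_mi300.mlir").write_text("")
        # "example_sku" is a placeholder SKU name with no spec of its own.
        print(punet_spec_flag(spec_dir, "example_sku"))
```

In the test this would be used roughly as INT8_PUNET_FLAGS.append(punet_spec_flag(Path(iree_test_path_extension), sku)), so a wrong spec directory surfaces as an explicit error rather than as a value mismatch later.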

Replace them with the correct spec location; in my Docker container, that is:

INT8_PUNET_FLAGS.append(f"--iree-codegen-transform-dialect-library=/root/iree/build_tools/pkgci/external_test_suite/attention_and_matmul_spec_punet_mi300.mlir")
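Rather than hardcoding /root/iree, the same path could be derived from the test file's own location. A minimal sketch, assuming test_unet.py stays at experimental/regression_suite/shark-test-suite-models/sdxl/ inside the IREE checkout; default_spec_path is a hypothetical helper.

```python
from pathlib import Path

SPEC_NAME = "attention_and_matmul_spec_punet_mi300.mlir"


def default_spec_path(test_file: Path) -> Path:
    """Locate the MI300 punet spec relative to the IREE checkout containing test_file.

    Assumes test_file's directory is four levels below the repo root:
    <repo>/experimental/regression_suite/shark-test-suite-models/sdxl/test_unet.py
    """
    repo_root = test_file.parents[4]
    return repo_root / "build_tools" / "pkgci" / "external_test_suite" / SPEC_NAME


if __name__ == "__main__":
    example = Path("/root/iree/experimental/regression_suite/shark-test-suite-models/sdxl/test_unet.py")
    print(default_spec_path(example))
```

Inside the test this would be called as default_spec_path(Path(__file__).resolve()), which matches the path above in the Docker container but also works for other checkout locations.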

Then run pytest again and observe that the tests pass with matching values.

What component(s) does this issue relate to?

No response

Version information

latest: 5767be3

Additional context

No response
