Infra Fixes to DLM for Onnxrt Wheel + Int8 #2468
Currently tracking down issues I've found in 6.0 builds through DLM.
Seeing a lot of failures come up with EinSum with MIOpen calls. Currently working on 1 and doing a raw build without the wheel to determine if there's something on 6.0 that isn't built in correctly. This seems to affect the int8 quant set of changes as well when running tests through DLM.
Seeing this fun tidbit
Looks like the result is the following after letting it fail:
This is still open. Other items need to be completed and are still in review.
Still need to sort out the issue with int8 failing and what the UTs added into DLM for onnxrt are picking up with ROCm. I've rolled back builds and the wheel as far as -05 and am seeing the same behavior with the failing call and tests.
GELU tests are failing consistently for fp16, always at the end of the run. Not sure if this is related to the other issues we're seeing on the int8 quant side.
2023-11-25 01:41:12.673824527 [V:onnxruntime:, sequential_executor.cc:534 ExecuteThePlan] Number of streams: 1
https://github.com/ROCmSoftwarePlatform/DeepLearningModels/pull/1101 fixes issue with DLM conv_to_onnx as well.
ROCm/onnxruntime#25 fixes the issues seen with our GELU test failing. The failure comes from how we invoke fast_math on fp16: we seem to lose enough accuracy on our Navi-based cards to cause it. Defaulting this to false and adding the proper env vars to toggle it as part of our runs.
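For anyone wiring this kind of toggle into their own test runs, here is a minimal sketch of a default-off environment switch. The variable name `EXAMPLE_FP16_FAST_MATH` and the helper `fast_math_enabled` are hypothetical, made up for illustration; they are not the names used in ROCm/onnxruntime#25, which this thread does not spell out.

```python
import os

# Hypothetical variable name for illustration only; the thread does not
# name the real environment variable used by the fix.
FAST_MATH_ENV_VAR = "EXAMPLE_FP16_FAST_MATH"

# Values treated as "on"; anything else (including unset) means "off".
_TRUTHY = {"1", "true", "yes", "on"}


def fast_math_enabled(environ=None):
    """Return True only when the toggle is explicitly switched on.

    Defaults to False so the accurate path is used unless a run opts in.
    """
    env = os.environ if environ is None else environ
    raw = env.get(FAST_MATH_ENV_VAR, "")
    return raw.strip().lower() in _TRUTHY


if __name__ == "__main__":
    print("fp16 fast math enabled:", fast_math_enabled())
```

Passing a plain dict as `environ` keeps the helper easy to check in a unit test without touching the real process environment.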
Blocked from closing this out until we get the RC5 fixes to hip to test on.
Several pieces are being reviewed/added to DLM to fix the issues we're seeing in CI and in runs between QA and Dev.