-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pass to convert Uint8 to int8 across operators #2826
Conversation
We parse in this instruction as a combination of ops. The issue here is we require int8 for MLIR as uint8 is unsupported to create kernels. In this case we convert from unsigned to signed here and use uint16 as the accumulator to avoid overflow before converting to int8.
Need to adjust parse and verify tests for the operated if it contains the added int8 conversion on output.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #2826 +/- ##
===========================================
+ Coverage 91.82% 91.86% +0.04%
===========================================
Files 486 480 -6
Lines 18993 18240 -753
===========================================
- Hits 17440 16757 -683
+ Misses 1553 1483 -70 ☔ View full report in Codecov by Sentry. |
Can you explain this in more detail? |
When speaking with MLIR they said they don't support uint8 x int8 quant_dots along with just uint8 in general for their kernels. I was seeing failures when using models optimized by Onnxruntime and using int8. This is related to the drop in performance is we would fail to compile the model in onnxruntime and then default to ROCm EP and or CPU |
But why are we getting mixed types like this in the first place? I am wondering if we should have a pass to fix this up instead of doing this in the frontend parser. |
Because the implimentation of dynamicquantizelinear breaks it up into separate instructions which hardcode uint8 as part of the output type for the zero point which is defined by the onnx standard.
Doing more tests on this seems to break backend onnx tests since there is an expectation that all zero point output will be uint8. So it appears we should be doing a pass then if one of the inputs to a quantize linear is then a uint8_t type to add in the convert |
Moving to draft. Will craft better matcher after tomorrow morning's coffee. @lakhinderwalia I see your point from Monday's meeting. Looks like we can just target and replace q_min/q_max as well as the convert at the end to adjust the data range without requiring the extra shift add and converts. |
….hpp Makes this easier to test Added additional test cases and passes to existing dynamicquantizelinear verify/parse tests
Updated this to be a separate pass. Seeing some odd behavior on the added tests when this gets converted to kernels on the gpu. Need to sort out that last bit but the rest should be good for review. |
Need this to verify validity of whether our conversion is indeed correct to int8
Split out changes for parse_dynamcquantizelinear for this one. Retargeted branch to that PR until that changeset gets in. Will revisit one that's merged |
@TedThemistokleous to review and resolve conflicts |
DQL fix + Pass changes
|
Not required but can still be used as perf improvement now 2903 has been added. Will need to determine speedup after rebase. Fixes were pulled out of here for some other changes but I'm still curious on perf if we can just auto convert uint8/ handle matmul correctly now |
@@ -869,6 +875,12 @@ auto skip_broadcasts_converts(Ms... ms) | |||
return skip(name("broadcast", "multibroadcast", "contiguous", "convert"))(ms...); | |||
} | |||
|
|||
template <class... Ms> | |||
auto skip_broadcast_squeeze(Ms... ms) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just put this matcher in the .cpp file. as its specific to the pass.
@@ -56,7 +56,7 @@ struct quantizelinear | |||
|
|||
shape compute_shape(std::vector<shape> inputs) const | |||
{ | |||
check_shapes{inputs, *this}.same_dims().has(2, 3); | |||
check_shapes{inputs, *this}.has(2, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed? They should have the same dimensions.
Check results before merge 🔆 |
🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output |
@TedThemistokleous Is this PR still needed ? |
~Related to #1904 ~
Originally from when we were parsing in dynamicquantizelinear but we've found other operators weren't handling inputs of uint8 correctly and thus we need to add a pass to properly handle conversion.
This is dependent on the more recent #2888 so that we can then handle MatmulInteger,ConvInteger and DynamicQuantizeLinear and uint8 during compile time rather than in the parser