[WIP] Adds _weight_int8pack_mm pass for woq-int8 #3061
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3061
Note: links to docs will display an error until the docs builds have completed.
❌ 6 new failures as of commit 39d2971 with merge base 5e90c47.
```diff
  # per channel int8 weight-only quantized mm
- w_vals_int8_t = weight_tensor.tensor_impl.int_data.t()
+ w_vals_int8 = weight_tensor.tensor_impl.int_data
```
this is a code path for int8 CUDA as well, I think, so changing it has a risk of perf regressions
also, this is the older stack; I'd suggest migrating first. WIP here for int8 + plain layout: #3038
Summary
This PR adds an `aten._weight_int8pack_mm` pass to replace the `mm` + `mul` pattern in woq-int8 models.
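For context, here is a minimal sketch of the rewrite the pass performs. It is illustrative only, not the PR's code: the shapes and variable names are assumptions, and it runs the fused op via the existing CPU kernel for demonstration (XPU support is what this PR targets). The unfused woq-int8 pattern does an `mm` on the cast weight followed by a `mul` with the per-channel scales; `aten._weight_int8pack_mm` takes the activation, the untransposed `[N, K]` int8 weight, and the scales directly, which is why the `.t()` is dropped in the diff above.

```python
import torch

# Illustrative shapes (assumptions, not from the PR).
M, K, N = 4, 64, 32
x = torch.randn(M, K)                                        # fp32 activation [M, K]
w_int8 = torch.randint(-128, 128, (N, K), dtype=torch.int8)  # per-channel int8 weight [N, K]
scales = torch.rand(N)                                       # per-output-channel scales [N]

# Unfused woq-int8 pattern: mm on the cast, transposed weight, then mul by scales.
unfused = torch.mm(x, w_int8.t().to(x.dtype)) * scales

# Fused op the pass rewrites to; the int8 weight is passed untransposed.
fused = torch.ops.aten._weight_int8pack_mm(x, w_int8, scales)

torch.testing.assert_close(fused, unfused)
```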
Motivation
Improve performance for woq-int8 inference.
Result:
We get correct results on Intel GPU.