[WIP] Adds _weight_int8pack_mm pass for woq-int8 #3061
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3061
Note: links to docs will display an error until the docs builds have completed.
❌ 6 new failures as of commit 39d2971 with merge base 5e90c47.
```diff
  # per channel int8 weight-only quantized mm
- w_vals_int8_t = weight_tensor.tensor_impl.int_data.t()
+ w_vals_int8 = weight_tensor.tensor_impl.int_data
```
this is a code path for int8 CUDA as well, I think, so changing it has a risk of perf regressions
also, this is the older stack; I'd suggest migrating first. WIP here for int8 + plain layout: #3038
Summary
This PR adds an `aten._weight_int8pack_mm` pass to replace the `mm` + `mul` pattern in woq-int8 models.
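For context, here is a minimal sketch of the rewrite the pass performs. It is illustrative only, not the PR's code: the shapes and variable names are assumptions, and it runs the fused op via the existing CPU kernel for demonstration (XPU support is what this PR targets). The unfused woq-int8 pattern does an `mm` on the cast weight followed by a `mul` with the per-channel scales; `aten._weight_int8pack_mm` takes the activation, the untransposed `[N, K]` int8 weight, and the scales directly, which is why the `.t()` is dropped in the diff above.

```python
import torch

# Illustrative shapes (assumptions, not from the PR).
M, K, N = 4, 64, 32
x = torch.randn(M, K)                                        # fp32 activation [M, K]
w_int8 = torch.randint(-128, 128, (N, K), dtype=torch.int8)  # per-channel int8 weight [N, K]
scales = torch.rand(N)                                       # per-output-channel scales [N]

# Unfused woq-int8 pattern: mm on the cast, transposed weight, then mul by scales.
unfused = torch.mm(x, w_int8.t().to(x.dtype)) * scales

# Fused op the pass rewrites to; the int8 weight is passed untransposed.
fused = torch.ops.aten._weight_int8pack_mm(x, w_int8, scales)

torch.testing.assert_close(fused, unfused)
```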
Motivation
Improve performance for woq-int8 inference.
Result:
We get correct results on Intel GPU.