@Xia-Weiwen (Collaborator) commented Sep 28, 2025

Summary
This PR adds static quantization support to SmoothQuant through a new Int8StaticActivationInt8WeightConfig configuration. Static quantization generally achieves better latency and throughput than dynamic quantization because it avoids the overhead of computing quantization parameters (qparams) at runtime.
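To illustrate where that saving comes from, here is a minimal sketch in plain Python (lists stand in for tensors) of symmetric per-tensor int8 activation quantization done dynamically versus statically. The helper names (`per_tensor_scale`, `dynamic_quant`, `static_quant`) and the max-over-batches calibration rule are illustrative assumptions, not torchao APIs: the dynamic path recomputes max |x| on every call, while the static path reuses a scale fixed once during calibration.

```python
# Minimal sketch: dynamic vs. static symmetric per-tensor int8 activation
# quantization. Plain Python lists stand in for tensors; all names here are
# hypothetical and are not torchao APIs.

def per_tensor_scale(values):
    """Symmetric scale that maps max |x| onto the int8 range [-127, 127]."""
    amax = max(abs(v) for v in values)
    return amax / 127 if amax > 0 else 1.0

def quantize(values, scale):
    """Round each value to the nearest int8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dynamic_quant(values):
    # Dynamic: the scale is recomputed from each incoming activation, which
    # costs an extra max |x| reduction on every forward pass.
    scale = per_tensor_scale(values)
    return quantize(values, scale), scale

def static_quant(values, calibrated_scale):
    # Static: the scale was fixed ahead of time from calibration data, so no
    # runtime reduction is needed.
    return quantize(values, calibrated_scale), calibrated_scale

# Calibration (done once): observe representative batches, keep the largest scale.
calibration_batches = [[0.5, -1.0, 2.0], [1.5, -2.54, 0.1]]
calibrated_scale = max(per_tensor_scale(b) for b in calibration_batches)

x = [1.1, -2.0, 0.3]
q_dyn, s_dyn = dynamic_quant(x)                   # scale = 2.0 / 127
q_sta, s_sta = static_quant(x, calibrated_scale)  # scale = 2.54 / 127
print(q_dyn, q_sta)
```

The trade-off of the static path is that activations falling outside the calibrated range are clamped rather than rescaled.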
In the implementation:

  • Activations are quantized per-tensor only, so that activations with dynamic shapes are supported.
  • SmoothQuantObserver returns the activation scale along with the smoothing factor.
  • The activation scale of each layer is passed to Int8StaticActivationInt8WeightConfig when that linear layer is transformed.
    • Note: Int8StaticActivationInt8WeightConfig is not suitable for general static quantization (although it works); users should use PT2E in that case. This is because the activation scale in the config is global rather than per linear layer, which is also the case for Float8StaticActivationFloat8WeightConfig.
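The per-layer flow above can be sketched as follows, assuming the standard SmoothQuant formulation: a per-input-channel smoothing factor s_j = max|X_j|^α / max|W_j|^(1−α) is computed from calibration activations and folded into the weight, and a single per-tensor activation scale is then taken over the smoothed activations X / s. Everything here (the function names `smoothquant_calibrate` and `w8a8_linear`, α = 0.5, per-output-channel weight scales, plain lists instead of tensors) is an illustrative assumption; it is not the SmoothQuantObserver or Int8StaticActivationInt8WeightConfig implementation.

```python
# Hypothetical sketch of the SmoothQuant + static int8 flow for one linear layer.
# Lists stand in for tensors; weight uses the nn.Linear layout [out][in].
# Names, alpha=0.5 and per-output-channel weight scales are assumptions,
# not the torchao SmoothQuantObserver / Int8StaticActivationInt8WeightConfig code.

def smoothquant_calibrate(act_rows, weight, alpha=0.5):
    """Return (per-input-channel smoothing factor, per-tensor activation scale)."""
    n_in = len(weight[0])
    x_amax = [max(abs(row[j]) for row in act_rows) for j in range(n_in)]
    w_amax = [max(abs(w_row[j]) for w_row in weight) for j in range(n_in)]
    # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    smooth = [xa ** alpha / wa ** (1 - alpha) for xa, wa in zip(x_amax, w_amax)]
    # Static per-tensor scale of the smoothed activation X / s over all calibration rows.
    smoothed_amax = max(abs(row[j] / smooth[j]) for row in act_rows for j in range(n_in))
    return smooth, smoothed_amax / 127

def w8a8_linear(x, weight, smooth, act_scale):
    """y = x @ W^T computed with int8 activations and int8 weights."""
    # Fold the smoothing factor into the weight: W'[:, j] = W[:, j] * s_j.
    w_s = [[w * s for w, s in zip(row, smooth)] for row in weight]
    # Symmetric per-output-channel int8 weight quantization.
    w_scales = [max(abs(w) for w in row) / 127 for row in w_s]
    w_q = [[round(w / sc) for w in row] for row, sc in zip(w_s, w_scales)]
    # Static per-tensor activation quantization of x / s with the calibrated scale.
    x_q = [max(-127, min(127, round(v / s / act_scale))) for v, s in zip(x, smooth)]
    # Integer dot products, then dequantize with act_scale * weight scale.
    return [sum(a * b for a, b in zip(x_q, row)) * act_scale * sc
            for row, sc in zip(w_q, w_scales)]

# Channel 1 of the activation is an outlier channel.
calib = [[1.0, 40.0], [-2.0, 10.0]]
weight = [[0.5, 0.1], [-0.25, 0.2]]
smooth, act_scale = smoothquant_calibrate(calib, weight)

x = [1.0, 40.0]
y_ref = [sum(a * w for a, w in zip(x, row)) for row in weight]  # float reference
y_q = w8a8_linear(x, weight, smooth, act_scale)
print(y_ref, y_q)
```

In this sketch `act_scale` is computed and passed per layer, in line with the per-layer transformation described above.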

Test plan
This PR also updates the test cases for SmoothQuant:

  • Support CPU-only environments.
  • Bug fix: the linear module in the unit test was not transformed to SmoothQuantLinear because the linear module was itself the parent module.
  • Add outliers to the example inputs to simulate the case SmoothQuant is designed to handle.
pytest -sv test/prototype/test_smoothquant.py
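Why the outliers matter can be checked with a small experiment: a hypothetical `make_inputs` helper scales one channel of uniform random inputs by an arbitrary factor (100 here), and a hypothetical `per_tensor_int8_error` measures the mean round-trip error on the remaining channels under symmetric per-tensor int8 quantization. This is a sketch for intuition only, not the code in test/prototype/test_smoothquant.py.

```python
import random

def make_inputs(n_rows, n_channels, outlier_channels, outlier_scale, seed=0):
    """Uniform inputs in [-1, 1]; selected channels are multiplied by outlier_scale."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n_channels)]
        for c in outlier_channels:
            row[c] *= outlier_scale
        rows.append(row)
    return rows

def per_tensor_int8_error(rows, channels):
    """Mean |x - dequant(quant(x))| on `channels`, using one scale for the whole tensor."""
    scale = max(abs(v) for row in rows for v in row) / 127
    errors = []
    for row in rows:
        for c in channels:
            q = max(-127, min(127, round(row[c] / scale)))
            errors.append(abs(row[c] - q * scale))
    return sum(errors) / len(errors)

regular = [0, 2, 3, 4, 5, 6, 7]  # every channel except the outlier channel 1
clean = make_inputs(64, 8, outlier_channels=[1], outlier_scale=1.0)
with_outliers = make_inputs(64, 8, outlier_channels=[1], outlier_scale=100.0)

err_clean = per_tensor_int8_error(clean, regular)
err_outliers = per_tensor_int8_error(with_outliers, regular)
print(f"mean error on regular channels: clean={err_clean:.4f}, with outliers={err_outliers:.4f}")
```

With the same seed, the regular channels hold identical values in both runs, so the larger error comes only from the outlier channel inflating the shared per-tensor scale; this is the imbalance the smoothing factor is meant to reduce.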


pytorch-bot (bot) commented Sep 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3089

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit abfce41 with merge base 4013764:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla (bot) added the CLA Signed label Sep 28, 2025
@Xia-Weiwen added the topic: new feature label Sep 28, 2025
@jerryzh168 (Contributor) commented:

@Xia-Weiwen I think it's better to wait until the Int8Tensor migration is done

@Xia-Weiwen (Collaborator, Author) commented:

> @Xia-Weiwen I think it's better to wait until the Int8Tensor migration is done

Thanks for the info
