
[AQT] Failed to move compiled module with AQT to a different device #1309

Open

gau-nernst opened this issue Nov 19, 2024 · 2 comments
Labels: bug

gau-nernst (Collaborator) commented Nov 19, 2024

To reproduce

import torch
from torch import nn

from torchao import quantize_
from torchao.quantization import int8_weight_only

linear = nn.Linear(1024, 1024)
quantize_(linear, int8_weight_only())  # weight becomes an AffineQuantizedTensor subclass
linear.cuda()
linear.compile()
linear(torch.randn(1, 1024, device="cuda"))  # first call triggers compilation
linear.cpu()   # this will error
linear.cuda()  # this will also error

Error

Traceback (most recent call last):
  File "/home/xxx/python3.10/site-packages/torch/nn/modules/module.py", line 945, in _apply
    torch.utils.swap_tensors(param, param_applied)
  File "/home/xxx/python3.10/site-packages/torch/utils/__init__.py", line 51, in swap_tensors
    raise RuntimeError("Cannot swap t1 because it has weakref associated with it")
RuntimeError: Cannot swap t1 because it has weakref associated with it

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xxx/debug.py", line 11, in <module>
    linear.cpu()
  File "/home/xxx/python3.10/site-packages/torch/nn/modules/module.py", line 1118, in cpu
    return self._apply(lambda t: t.cpu())
  File "/home/xxx/python3.10/site-packages/torch/nn/modules/module.py", line 949, in _apply
    raise RuntimeError(
RuntimeError: _apply(): Couldn't swap Linear.weight

This seems to be a problem with tensor subclasses + compile in general, not limited to AQT. Even doing compile(disable=True) still hits this error.
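
A minimal sketch supporting this (my assumptions, not from the thread: torch.compile's guards hold weakrefs to the parameters, and _apply() goes through torch.utils.swap_tensors for tensor-subclass params), which hits the same check without torchao:

import torch
from torch import nn

linear = nn.Linear(4, 4)
linear.compile()
linear(torch.randn(1, 4))  # first call installs dynamo guards, which keep weakrefs to the weight

# _apply() falls back to swap_tensors for subclass params; calling it directly
# on the compiled module's weight trips the same weakref check:
torch.utils.swap_tensors(linear.weight, nn.Parameter(torch.randn(4, 4)))
# RuntimeError: Cannot swap t1 because it has weakref associated with it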

cc: @jerryzh168

torchao: 0.7.0+git26648c2c (installed from source)
pytorch: tested with 2.5.0 and 2.6.0.dev20241102+cu124

gau-nernst added the bug label Nov 19, 2024
gau-nernst (Collaborator, Author) commented:

@jerryzh168 May I know if anyone is looking at this issue? It seems to affect tensor subclass + compile in general, so maybe I can open an issue in core instead?

jerryzh168 (Contributor) commented:

@gau-nernst Not right now, I think; yeah, I feel it makes sense to open it in core.

I do remember that swap_tensors can't swap a normal tensor with a tensor subclass, but I'm not sure about the case where both are tensor subclasses.
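
For context (a sketch of mine, not from the thread): the refusal in the traceback is swap_tensors' weakref check, which any live weakref trips, independent of subclassing:

import weakref
import torch

t1 = torch.randn(2)
t2 = torch.randn(2)
keep = weakref.ref(t1)  # stand-in for the weakref that compile's guards keep alive
torch.utils.swap_tensors(t1, t2)
# RuntimeError: Cannot swap t1 because it has weakref associated with it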
