
[AQT] Failed to move compiled module with AQT to a different device #1309

Open

gau-nernst opened this issue Nov 19, 2024 · 2 comments
Labels: bug

gau-nernst (Collaborator) commented Nov 19, 2024

To reproduce

import torch
from torch import nn

from torchao import quantize_
from torchao.quantization import int8_weight_only

linear = nn.Linear(1024, 1024)
quantize_(linear, int8_weight_only())  # weight becomes an AffineQuantizedTensor subclass
linear.cuda()
linear.compile()
linear(torch.randn(1, 1024, device="cuda"))  # first call triggers compilation
linear.cpu()   # this will error
linear.cuda()  # this will also error

Error

Traceback (most recent call last):
  File "/home/xxx/python3.10/site-packages/torch/nn/modules/module.py", line 945, in _apply
    torch.utils.swap_tensors(param, param_applied)
  File "/home/xxx/python3.10/site-packages/torch/utils/__init__.py", line 51, in swap_tensors
    raise RuntimeError("Cannot swap t1 because it has weakref associated with it")
RuntimeError: Cannot swap t1 because it has weakref associated with it

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xxx/debug.py", line 11, in <module>
    linear.cpu()
  File "/home/xxx/python3.10/site-packages/torch/nn/modules/module.py", line 1118, in cpu
    return self._apply(lambda t: t.cpu())
  File "/home/xxx/python3.10/site-packages/torch/nn/modules/module.py", line 949, in _apply
    raise RuntimeError(
RuntimeError: _apply(): Couldn't swap Linear.weight

This seems to be a problem with tensor subclasses + compile in general, not limited to AQT. Even doing compile(disable=True) still hits this error.
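
A minimal sketch supporting this (my assumptions, not from the thread: torch.compile's guards hold weakrefs to the parameters, and _apply() goes through torch.utils.swap_tensors for tensor-subclass params), which hits the same check without torchao:

import torch
from torch import nn

linear = nn.Linear(4, 4)
linear.compile()
linear(torch.randn(1, 4))  # first call installs dynamo guards, which keep weakrefs to the weight

# _apply() falls back to swap_tensors for subclass params; calling it directly
# on the compiled module's weight trips the same weakref check:
torch.utils.swap_tensors(linear.weight, nn.Parameter(torch.randn(4, 4)))
# RuntimeError: Cannot swap t1 because it has weakref associated with it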

cc: @jerryzh168

torchao: 0.7.0+git26648c2c (installed from source)
pytorch: tested with 2.5.0 and 2.6.0.dev20241102+cu124

gau-nernst added the bug label Nov 19, 2024
gau-nernst (Collaborator, Author) commented:

@jerryzh168 May I know if anyone is looking at this issue? It seems to affect tensor subclass + compile in general, so maybe I can open an issue in core instead?

jerryzh168 (Contributor) commented:

@gau-nernst Not right now, I think; yeah, I feel it makes sense to open it in core.

I do remember that swap_tensors can't swap a normal tensor with a tensor subclass, but I'm not sure about the case where both are tensor subclasses.
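
For context (a sketch of mine, not from the thread): the refusal in the traceback is swap_tensors' weakref check, which any live weakref trips, independent of subclassing:

import weakref
import torch

t1 = torch.randn(2)
t2 = torch.randn(2)
keep = weakref.ref(t1)  # stand-in for the weakref that compile's guards keep alive
torch.utils.swap_tensors(t1, t2)
# RuntimeError: Cannot swap t1 because it has weakref associated with it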
