
Update to PyTorch 2.2 #52

Merged: 19 commits merged into main from awf/pt23 on Apr 22, 2024

Conversation

@awf (Contributor) commented Apr 15, 2024

A number of changes are needed to work with PT2.2+. Primarily, there is now (since commit) no good way to intercept module calls, so we instead replace all nn.Modules with trivial subclasses, making them "user" modules.
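
Below is a minimal sketch of how this could look (the helper names `_as_user_module` and `replace_with_user_modules` are illustrative, not the actual `unit_scaling.transforms` API): each module instance is re-classed into a trivial subclass of its own type, so dynamo no longer recognises it as a torch.nn builtin and its `forward` stays visible for patching.

```python
# Illustrative sketch only (not the unit_scaling implementation).
import torch.nn as nn


def _as_user_module(module: nn.Module) -> nn.Module:
    # Build a trivial subclass of the module's own class. It adds no
    # behaviour; it only changes the instance's type so it is no longer a
    # recognised torch.nn builtin (a "user" module from dynamo's perspective).
    user_cls = type(f"User{type(module).__name__}", (type(module),), {})
    module.__class__ = user_cls  # re-class the instance in place
    return module


def replace_with_user_modules(root: nn.Module) -> nn.Module:
    # Recursively re-class the root module and every submodule.
    for child in root.modules():
        _as_user_module(child)
    return root
```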

Passes tests on 2.1, 2.2 (stable), and 2.4 (nightly), and examples/scale_analysis.py produces identical output (with torch.manual_seed).

This implementation is somewhat faster, as it does less work in the patched forward functions.

One source of changes is that post-2.2 the node naming better reflects the input code.
For example, the input code

def forward(self, idxs: Tensor) -> Tuple[Tensor, Tensor]:  # pragma: no cover
    # idxs has 0 args -> shouldn't be pruned
    x = self.emb(idxs)  # emb has 1 float arg (weights) -> depends on tol
    _x = x.flatten(start_dim=0, end_dim=-1)  # 1 float, same scale -> prune
    x = _x.view(x.shape)  # 1 float arg, same scale -> prune
    y = self.linear(x)  # scale changes -> shouldn't be pruned
    scores = F.softmax(y, dim=-1)  # scale changes -> shouldn't be pruned
    top_idx = torch.argmax(scores, dim=-1)  # not float -> shouldn't be pruned
    top_idx = torch.unsqueeze(top_idx, -1)  # not float -> shouldn't be pruned
    top_score_x = torch.gather(x, -1, top_idx)  # small change -> depends on tol
    top_score_x += randn_like(top_score_x)  # 2 floats, same scale -> no prune
    return top_score_x, top_idx

was captured pre-2.2 as the following graph, where the variable names are derived from the operators:

# Example of a pre-2.2 captured graph
def forward(self, L_idxs_ : torch.Tensor):
    l_idxs_ = L_idxs_
    l__self___emb_weight = foo.L__self___emb_weight
    embedding = torch.nn.functional.embedding(l_idxs_, l__self___emb_weight, None, None, 2.0, False, False);  l_idxs_ = l__self___emb_weight = None
    flatten = embedding.flatten(start_dim = 0, end_dim = -1);  embedding = None
    view = flatten.view((8, 32, 64));  flatten = None
    l__self___linear_weight = foo.L__self___linear_weight
    l__self___linear_bias = foo.L__self___linear_bias
    linear = torch._C._nn.linear(view, l__self___linear_weight, l__self___linear_bias);  l__self___linear_weight = l__self___linear_bias = None
    softmax = torch.nn.functional.softmax(linear, dim = -1);  linear = None
    argmax = torch.argmax(softmax, dim = -1);  softmax = None
    unsqueeze = torch.unsqueeze(argmax, -1);  argmax = None
    gather = torch.gather(view, -1, unsqueeze);  view = None
    randn_like = torch.randn_like(gather)
    gather += randn_like;  iadd = gather;  gather = randn_like = None
    return (iadd, unsqueeze)

and post-2.2 is captured as

# Example of a post-2.2 captured graph
def forward(self, L_idxs_ : torch.Tensor):
    l_idxs_ = L_idxs_
    l__self___emb_weight = foo.L__self___emb_weight
    x = torch.nn.functional.embedding(l_idxs_, l__self___emb_weight, None, None, 2.0, False, False);  l_idxs_ = l__self___emb_weight = None
    _x = x.flatten(start_dim = 0, end_dim = -1);  x = None
    x_1 = _x.view((8, 32, 64));  _x = None
    l__self___linear_weight = foo.L__self___linear_weight
    l__self___linear_bias = foo.L__self___linear_bias
    y = torch._C._nn.linear(x_1, l__self___linear_weight, l__self___linear_bias);  l__self___linear_weight = l__self___linear_bias = None
    scores = torch.nn.functional.softmax(y, dim = -1);  y = None
    top_idx = torch.argmax(scores, dim = -1);  scores = None
    top_idx_1 = torch.unsqueeze(top_idx, -1);  top_idx = None
    top_score_x = torch.gather(x_1, -1, top_idx_1);  x_1 = None
    randn_like = torch.randn_like(top_score_x)
    top_score_x += randn_like;  top_score_x_1 = top_score_x;  top_score_x = randn_like = None
    return (top_score_x_1, top_idx_1)

@awf awf marked this pull request as draft April 15, 2024 21:40
unit_scaling/transforms/utils.py: review comment (outdated, resolved)
@awf awf changed the title from "Update to PyTorch 2.3" to "WIP: Update to PyTorch 2.3" Apr 16, 2024
@awf awf changed the title from "WIP: Update to PyTorch 2.3" to "Update to PyTorch 2.3" Apr 17, 2024
@awf awf marked this pull request as ready for review April 17, 2024 11:19
Comment on lines +439 to +441
p.set_yticks(p.get_yticks())
p.set_yticklabels([_rename(item.get_text()) for item in p.get_yticklabels()])

@awf (Contributor, Author) commented:
This is suppressing a warning about setting yticklabels without setting ticks
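
A standalone illustration of the pattern (not the library code; the plot data here is made up): matplotlib warns if tick labels are set while the tick locations are still chosen automatically, so the current locations are pinned with set_yticks before set_yticklabels is called.

```python
# Illustrative example of the set_yticks / set_yticklabels pattern above.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.barh(["layer.0.weight", "layer.1.weight"], [1.0, 0.5])
ax.set_yticks(ax.get_yticks())  # pin the current tick positions (FixedLocator)
ax.set_yticklabels(
    [label.get_text().replace("layer.", "") for label in ax.get_yticklabels()]
)
```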

@awf awf marked this pull request as draft April 18, 2024 09:42
@awf awf changed the title from "Update to PyTorch 2.3" to "Update to PyTorch 2.2" Apr 18, 2024
@awf awf marked this pull request as ready for review April 19, 2024 17:17
@thecharlieblake (Collaborator) left a comment:
Thanks for looking into this Andrew, pleased with the approach we've converged on (and learned a few things about deepcopy and pickling in the process).

I'll leave merging to you in case there are any other adjustments.

@awf awf merged commit 2394ee4 into main Apr 22, 2024
1 check passed
@awf awf deleted the awf/pt23 branch April 22, 2024 08:49
@awf awf restored the awf/pt23 branch April 22, 2024 10:38
@awf awf deleted the awf/pt23 branch April 22, 2024 10:41