Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend IdModel to map DIDs for certain patterns. #3987

Open
wujingyue opened this issue Feb 27, 2025 · 7 comments
Open

Extend IdModel to map DIDs for certain patterns. #3987

wujingyue opened this issue Feb 27, 2025 · 7 comments
Assignees

Comments

@wujingyue
Copy link
Collaborator

The following patterns came out from DID loop split (#2563).

Case 1: split reshape (before SdpwFwd)

in: logical=[h], loop=[d, h/d]  # d is an outer split of h
out: root=[h], logical=[a, h/a], loop=[d, a/d, h/a]

We may want to do inner split by d at some point. See case 3. I don't think whether inner or outer will make a whole lot of difference for IdModel.

Case 2: merge reshape (after SdpaFwd)

in: logical=[a, h/a], loop=[d, a/d, h/a]
out: root=[a, h/a], logical=[h], loop=[d, h/a]

Case 3: slice (used after the QKV linear in GPT)

There are several ways to represent that slice as mentioned in http://nv/ezS. One of the ways that I think is promising is:

in: logical=[b, s, hq+hk+hv], loop=[b, s, (hq+hk+hv)/d, d]  # d is an inner split of hq+hk+hv
Q: logical=[b, s, hq], loop=[b, s, hq/d, d]
K: logical=[b, s, hk], loop=[b, s, hq/d, d]
V: logical=[b, s, hv], loop=[b, s, hq/d, d]

Case 4: cat (backprop of the above slice)

dQ: logical=[b, s, hq], loop=[b, s, hq/d, d]
dK: logical=[b, s, hk], loop=[b, s, hq/d, d]
dV: logical=[b, s, hv], loop=[b, s, hq/d, d]
out: logical=[b, s, hq+hk+hv], loop=[b, s, (hq+hk+hv)/d, d]

In all above cases, d, a, h* are static.

cc @naoyam who requested me to write this down

@wujingyue
Copy link
Collaborator Author

#3482 has a proof-of-concept implementation outside IdModel.

I think it's useful to put this support in a common utility so I can use it for alias analysis as well #3902.

It doesn't have to be implemented inside IdModel. I just can't think of a better place.

@naoyam
Copy link
Collaborator

naoyam commented Feb 28, 2025

The following patterns came out from DID loop split (#2563).

Case 1: split reshape (before SdpwFwd)

in: logical=[h], loop=[d, h/d]  # d is an outer split of h
out: root=[h], logical=[a, h/a], loop=[d, a/d, h/a]

We may want to do inner split by d at some point. See case 3. I don't think whether inner or outer will make a whole lot of difference for IdModel.

In this case, IIUC, what we want is to let IdModel recognize, for a given iter domain of i0, the following IDs should be considered equivalent:

split i0 by d -> i0/d, d
split i0/d by a -> i0/d/a, a

and

split i0 by a -> i0/a, a
split i0/a by d -> i0/a/d, d

then:

i0/a/d == i0/d/a

Am I right that i0 must be divisible by a, d and a*d?

@wujingyue
Copy link
Collaborator Author

Am I right that i0 must be divisible by a, d and a*d?

I don't think so. We only need a % d == 0 and h % a == 0 for case 1 and case 2.

@naoyam
Copy link
Collaborator

naoyam commented Feb 28, 2025

If a % d == 0 and h % a == 0, then h % d == 0, so are you saying h doesn't need to be divisible by a*d? If so, h/a/d and h/d/a may not be the same, right?

@zasdfgbnm
Copy link
Collaborator

zasdfgbnm commented Feb 28, 2025

According to https://github.com/NVIDIA/Fuser/blob/main/doc/math/integer-division.md Theorem 2.11:

i0/a/d == i0/d/a 

is unconditionally true if both a and d are >0, because they both equals i0 / (a * d), there is no divisibility requirement.

@zasdfgbnm
Copy link
Collaborator

zasdfgbnm commented Feb 28, 2025

BTW, I really hope we formally write down a mathematical proof in this section:
https://github.com/NVIDIA/Fuser/blob/main/doc/reading/iterdomain.md#2-properties-of-iterdomain-transformations
for equivalences of IterDomains that we want to map. I think the only way to clearly understand the requirements of divisibility is to mathematically prove it.

@zasdfgbnm
Copy link
Collaborator

zasdfgbnm commented Feb 28, 2025

One important mathematical difference of outer split vs inner split I would like to highlight is: ceilDiv(a, ceilDiv(a, b)) is not always equal to b. For example, ceilDiv(5, ceilDiv(5, 100)) is 5, not 100, or in the language of nvFuser, split(5, 100, inner=false) -> (100, 1) is different from split(5, 1, inner=true) -> (5, 1), although the inner has extent 1 for both cases. It is unclear to me what it eventually means for our analysis, but this do make me feel that we should be careful and formally write down a proof.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants