Extend IdModel to map DIDs for certain patterns. #3987

wujingyue · 2025-02-27T22:43:32Z

The following patterns came out from DID loop split (#2563).

Case 1: split reshape (before SdpwFwd)

in: logical=[h], loop=[d, h/d]  # d is an outer split of h
out: root=[h], logical=[a, h/a], loop=[d, a/d, h/a]

We may want to do inner split by d at some point. See case 3. I don't think whether inner or outer will make a whole lot of difference for IdModel.

Case 2: merge reshape (after SdpaFwd)

in: logical=[a, h/a], loop=[d, a/d, h/a]
out: root=[a, h/a], logical=[h], loop=[d, h/a]

Case 3: slice (used after the QKV linear in GPT)

There are several ways to represent that slice as mentioned in http://nv/ezS. One of the ways that I think is promising is:

in: logical=[b, s, hq+hk+hv], loop=[b, s, (hq+hk+hv)/d, d]  # d is an inner split of hq+hk+hv
Q: logical=[b, s, hq], loop=[b, s, hq/d, d]
K: logical=[b, s, hk], loop=[b, s, hq/d, d]
V: logical=[b, s, hv], loop=[b, s, hq/d, d]

Case 4: cat (backprop of the above slice)

dQ: logical=[b, s, hq], loop=[b, s, hq/d, d]
dK: logical=[b, s, hk], loop=[b, s, hq/d, d]
dV: logical=[b, s, hv], loop=[b, s, hq/d, d]
out: logical=[b, s, hq+hk+hv], loop=[b, s, (hq+hk+hv)/d, d]

In all above cases, d, a, h* are static.

cc @naoyam who requested me to write this down

The text was updated successfully, but these errors were encountered:

wujingyue · 2025-02-27T22:46:52Z

#3482 has a proof-of-concept implementation outside IdModel.

I think it's useful to put this support in a common utility so I can use it for alias analysis as well #3902.

It doesn't have to be implemented inside IdModel. I just can't think of a better place.

naoyam · 2025-02-28T00:58:47Z

The following patterns came out from DID loop split (#2563).

Case 1: split reshape (before SdpwFwd)
in: logical=[h], loop=[d, h/d]  # d is an outer split of h
out: root=[h], logical=[a, h/a], loop=[d, a/d, h/a]
We may want to do inner split by d at some point. See case 3. I don't think whether inner or outer will make a whole lot of difference for IdModel.

In this case, IIUC, what we want is to let IdModel recognize, for a given iter domain of i0, the following IDs should be considered equivalent:

split i0 by d -> i0/d, d
split i0/d by a -> i0/d/a, a

and

split i0 by a -> i0/a, a
split i0/a by d -> i0/a/d, d

then:

i0/a/d == i0/d/a

Am I right that i0 must be divisible by a, d and a*d?

wujingyue · 2025-02-28T01:11:22Z

Am I right that i0 must be divisible by a, d and a*d?

I don't think so. We only need a % d == 0 and h % a == 0 for case 1 and case 2.

naoyam · 2025-02-28T01:18:37Z

If a % d == 0 and h % a == 0, then h % d == 0, so are you saying h doesn't need to be divisible by a*d? If so, h/a/d and h/d/a may not be the same, right?

zasdfgbnm · 2025-02-28T01:25:24Z

According to https://github.com/NVIDIA/Fuser/blob/main/doc/math/integer-division.md Theorem 2.11:

i0/a/d == i0/d/a

is unconditionally true if both a and d are >0, because they both equals i0 / (a * d), there is no divisibility requirement.

zasdfgbnm · 2025-02-28T01:31:24Z

BTW, I really hope we formally write down a mathematical proof in this section:
https://github.com/NVIDIA/Fuser/blob/main/doc/reading/iterdomain.md#2-properties-of-iterdomain-transformations
for equivalences of IterDomains that we want to map. I think the only way to clearly understand the requirements of divisibility is to mathematically prove it.

zasdfgbnm · 2025-02-28T01:38:37Z

One important mathematical difference of outer split vs inner split I would like to highlight is: ceilDiv(a, ceilDiv(a, b)) is not always equal to b. For example, ceilDiv(5, ceilDiv(5, 100)) is 5, not 100, or in the language of nvFuser, split(5, 100, inner=false) -> (100, 1) is different from split(5, 1, inner=true) -> (5, 1), although the inner has extent 1 for both cases. It is unclear to me what it eventually means for our analysis, but this do make me feel that we should be careful and formally write down a proof.

wujingyue assigned naoyam Feb 27, 2025

wujingyue added the Multi-GPU label Feb 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend IdModel to map DIDs for certain patterns. #3987

Extend IdModel to map DIDs for certain patterns. #3987

wujingyue commented Feb 27, 2025

wujingyue commented Feb 27, 2025

naoyam commented Feb 28, 2025

Case 1: split reshape (before SdpwFwd)

wujingyue commented Feb 28, 2025

naoyam commented Feb 28, 2025

zasdfgbnm commented Feb 28, 2025 •

edited

Loading

zasdfgbnm commented Feb 28, 2025 •

edited

Loading

zasdfgbnm commented Feb 28, 2025 •

edited

Loading

Extend IdModel to map DIDs for certain patterns. #3987

Extend IdModel to map DIDs for certain patterns. #3987

Comments

wujingyue commented Feb 27, 2025

Case 1: split reshape (before SdpwFwd)

Case 2: merge reshape (after SdpaFwd)

Case 3: slice (used after the QKV linear in GPT)

Case 4: cat (backprop of the above slice)

wujingyue commented Feb 27, 2025

naoyam commented Feb 28, 2025

Case 1: split reshape (before SdpwFwd)

wujingyue commented Feb 28, 2025

naoyam commented Feb 28, 2025

zasdfgbnm commented Feb 28, 2025 • edited Loading

zasdfgbnm commented Feb 28, 2025 • edited Loading

zasdfgbnm commented Feb 28, 2025 • edited Loading

zasdfgbnm commented Feb 28, 2025 •

edited

Loading

zasdfgbnm commented Feb 28, 2025 •

edited

Loading

zasdfgbnm commented Feb 28, 2025 •

edited

Loading