Optimize get_dispatch_layout from 170us to 45us #232

fzyzcjy · 2025-06-20T01:51:50Z

Quickly tested:

before

[layout] Kernel performance: 0.170 ms
[layout] Kernel performance: 0.178 ms

after

[layout] Kernel performance: 0.045 ms
[layout] Kernel performance: 0.046 ms
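
For reference, the speedup implied by these four timings can be worked out as below. This is only a small sketch: the log lines are copied from above, and the helper name `parse_ms` is made up for illustration.

```python
import re
from statistics import mean

# Log lines copied from the "before" and "after" runs above.
before_log = """[layout] Kernel performance: 0.170 ms
[layout] Kernel performance: 0.178 ms"""
after_log = """[layout] Kernel performance: 0.045 ms
[layout] Kernel performance: 0.046 ms"""


def parse_ms(log: str) -> list[float]:
    """Extract the millisecond values from '[layout] Kernel performance: X ms' lines."""
    return [float(v) for v in re.findall(r"Kernel performance: ([0-9.]+) ms", log)]


# Average each run and convert milliseconds to microseconds.
before_us = mean(parse_ms(before_log)) * 1000
after_us = mean(parse_ms(after_log)) * 1000
speedup = before_us / after_us

print(f"before: {before_us:.1f} us, after: {after_us:.1f} us, speedup: {speedup:.2f}x")
```

With only two samples per configuration this is a rough estimate, not a rigorous benchmark.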

The core change is simple: just change the number of SMs.
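
To make "change the number of SMs" concrete, here is a minimal sketch of how a layout kernel's grid size could be derived from how many experts and ranks each block handles. The function names and all example values (256 experts, 64 ranks, 32 vs. 4 experts per block) are assumptions for illustration, not values taken from this PR.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return (a + b - 1) // b


def layout_num_blocks(num_experts: int, num_ranks: int,
                      experts_per_sm: int, ranks_per_sm: int) -> int:
    """Hypothetical grid size: some blocks count tokens per expert,
    the remaining blocks count tokens per rank."""
    return ceil_div(num_experts, experts_per_sm) + ceil_div(num_ranks, ranks_per_sm)


# Assumed example sizes; only the per-block workload differs between the two.
few_blocks = layout_num_blocks(256, 64, experts_per_sm=32, ranks_per_sm=8)
many_blocks = layout_num_blocks(256, 64, experts_per_sm=4, ranks_per_sm=8)

print(few_blocks, many_blocks)
```

Giving each block fewer items launches more blocks, so the same counting work is spread over more SMs.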

The diff is huge because I inherited the code changes from #218, but most of them are unnecessary for this optimization.

I will try to find some time later to clean up the code (e.g. still use few SMs when two-stream overlap is needed, and only use this version when two-stream overlap is not enabled). For now this is just a quick PR to find out whether you think the approach looks acceptable; if anyone happens to need it, they can copy-paste it.

This reverts commit acf108a.

This reverts commit 8cf6bd8.

This reverts commit 0613b1f.

LyricZhao · 2025-07-02T07:51:35Z

I made similar changes based on your PR in d4f3497 and 77ddb01. Thanks! 👍🏻

Differences from your change:

__launch_bounds__ should be removed. With compute-communication overlap, if the computation launches first and leaves only, e.g., 20 SMs for communication, it is better to fit all the blocks onto those 20 SMs at once rather than run them in several waves.
The lower bound of kNumRanksPerSM should be 8, because it is related to the number of nodes.
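
The wave argument can be illustrated with a little arithmetic. This is a simplified model, not DeepEP code: it assumes a fixed number of blocks can be resident on each SM and that every block takes the same time. The function names and every number other than the 20 free SMs and the lower bound of 8 are hypothetical.

```python
import math


def num_waves(num_blocks: int, free_sms: int, blocks_per_sm: int) -> int:
    """Waves needed to run all blocks when only `free_sms` SMs are available."""
    return math.ceil(num_blocks / (free_sms * blocks_per_sm))


def clamp_ranks_per_sm(requested: int, lower_bound: int = 8) -> int:
    """Keep the per-block rank count at or above the suggested lower bound."""
    return max(requested, lower_bound)


# Assumed grid of 72 blocks with 20 SMs left over for communication.
waves_one_resident = num_waves(72, free_sms=20, blocks_per_sm=1)
waves_four_resident = num_waves(72, free_sms=20, blocks_per_sm=4)

print(waves_one_resident, waves_four_resident, clamp_ranks_per_sm(4))
```

Under this model, allowing several blocks to be resident on each SM lets the whole grid run in a single wave on the 20 free SMs, which is one reading of why the launch bound is removed.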

fzyzcjy added 30 commits June 17, 2025 11:03

more

be2eed8

more

9ecb941

more

9683d94

more

8cf6bd8

more

acf108a

Revert "more"

3e2cede

This reverts commit acf108a.

Revert "more"

45fa1af

This reverts commit 8cf6bd8.

more

443bfa8

more

b986cce

more

3ea6f58

more

5d3513b

more

bda5695

more

3740762

more

ad4aee8

more

b5e4aad

more

240d058

more

5379d59

more

4fc8e79

more

2e90afe

more

3639a57

more

4ef8f05

more

047656e

more

c21f36d

more

7f3e4c0

more

92fb573

more

29f86f3

more

5557e70

more

9fd34e7

more

6417393

more

faaeaad

fzyzcjy added 28 commits June 17, 2025 16:28

more

379ac24

more

43999dc

more

7916011

Merge branch 'feat/cu_mem_api' into feat/deepep_normal_update

2f90c2d

more

0525f8f

Merge branch 'feat/cu_mem_api' into feat/deepep_normal_update

3032ede

more

dc652ea

more

151993b

more

06169d5

more

4b54c98

more

dec3315

more

04f6a5b

more

0613b1f

Revert "more"

b0ba0ea

This reverts commit 0613b1f.

more

01f0f90

more

b80e0d4

more

26130b2

moew

e395621

more

5b7e55a

temp

a8c6df8

more

e895366

more

af060e6

more

378f9b2

more

0fc2a30

more

1b14ad6

hack

2b1e8d0

hack

5a1240e

more

b56b9ca

LyricZhao closed this Jul 2, 2025
