Computation communication overlap #249

fzyzcjy · 2025-06-24T00:28:12Z

WIP

Some quick notes:

A few SMs can run communication while most SMs run computation. In low-latency mode, DeepEP still runs at almost full speed with only a few SMs.
- Supporting few-SM mode is a simple change: mostly adjusting num_warp_groups and similar parameters
- I can provide more info and experiment results if needed

I am waiting for the fp4 kernel before doing more computation-communication overlap, which is why this draft is pending with no progress.
- That said, the code here already shows overlap and a speedup between combine send and the down GEMM (the fp8 DeepGEMM one)
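To make the trade-off in the notes above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch and not taken from this PR. It assumes that compute time scales inversely with the number of SMs it gets, and that communication time does not change when it is restricted to a few SMs (the notes only say it runs at "almost" full speed). The function names and all numbers are hypothetical.

```python
def sequential_time(compute_ms: float, comm_ms: float) -> float:
    """Baseline: communication and computation run one after the other."""
    return compute_ms + comm_ms


def overlapped_time(compute_ms: float, comm_ms: float,
                    total_sms: int, comm_sms: int) -> float:
    """Toy estimate when comm_sms SMs run communication concurrently
    while the remaining SMs run computation.

    Assumptions (not measured, illustration only):
      * compute time scales as total_sms / compute_sms;
      * communication time is unchanged on the reduced SM budget.
    """
    if not 0 < comm_sms < total_sms:
        raise ValueError("comm_sms must be between 1 and total_sms - 1")
    compute_sms = total_sms - comm_sms
    # Computation slows down because it loses comm_sms SMs.
    slowed_compute_ms = compute_ms * total_sms / compute_sms
    # Both phases run concurrently, so the longer one determines the total.
    return max(slowed_compute_ms, comm_ms)


if __name__ == "__main__":
    # Hypothetical numbers: 1.0 ms of GEMM, 0.5 ms of combine send,
    # 100 SMs in total, 20 of them reserved for communication.
    print(sequential_time(1.0, 0.5))           # 1.5
    print(overlapped_time(1.0, 0.5, 100, 20))  # 1.25
```

Under this model, overlap pays off whenever the slowdown from giving up a few SMs is smaller than the communication time that gets hidden. Real numbers depend on the kernels and hardware and would need to be measured.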

(cherry picked from commit df72cff)

# Conflicts:
#   tests/test_intranode.py
#   tests/utils.py

fzyzcjy added 30 commits June 21, 2025 07:42

more

db53053

more

2278722

more

661e188

more

5567637

more

20855ee

more

ad11318

more

a960335

Merge branch 'feat/test_detailed_time' into feat/dev_20250621

9a8d98a

cherry pick

681bdc5

(cherry picked from commit df72cff)

Merge branch 'feat/num_processes' into feat/dev_20250621

7421672

# Conflicts:
#   tests/test_intranode.py
#   tests/utils.py

more

56758db

more

0a8848a

more

cd4af65

more

c0aa0dc

more

edf309e

more

6529a71

more

85c4056

more

4955fe7

more

53d4c7c

more

556d111

more

ed42906

more

a7f68e5

more

c5c8c1b

more

fcbde21

more

2070562

more

7a21473

more

9e5f1aa

more

6e3f4d0

more

748dd12

more

f2caa1f

fzyzcjy added 25 commits June 25, 2025 20:53

more

4ef1b9b

more

dff2a88

more

0c45b6c

more

b8f9871

more

e9bf90d

more

a1fea9a

more

abe6700

more

67f8892

more

130fb50

more

4db70a8

more

c7e6cab

more

899f038

more

6c3e569

more

7a904fe

more

6465df0

more

3b37454

more

ee41d63

more

e9d8bb0

extract

bff1a00

more

72ea8a2

more

8464286

more

99fd0b4

more

69470a7

more

a3db15a

more

e6784b2

LyricZhao force-pushed the main branch from 6a7e456 to 7705f53 Compare July 2, 2025 10:37

fzyzcjy changed the title ~~Computation communication overlap~~ Computation communication overlap, use few SMs for low-latency mode, etc Jul 3, 2025

fzyzcjy changed the title ~~Computation communication overlap, use few SMs for low-latency mode, etc~~ Computation communication overlap Jul 3, 2025

fzyzcjy mentioned this pull request Jul 3, 2025

Allow using few SMs for low-latency mode #277

Open

sphish force-pushed the main branch from 8ff19f5 to bdd119f Compare July 22, 2025 03:33
