
@fzyzcjy (Contributor) commented Jun 24, 2025

WIP


Some quick notes:

  • We can use a few SMs to run communication while most SMs run computation. DeepEP in low-latency mode still runs at almost full speed with only a few SMs.
    • Supporting few SMs is a simple change: just adjust num_warp_groups, etc.
    • I can provide more info and experiment results if needed.
  • I am waiting for the FP4 kernel before doing more computation-communication overlap. That's why this draft is pending with no progress yet.
    • That said, the code here does show overlap and a speedup between the combine send and the down GEMM (the FP8 DeepGEMM one).
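The few-SM idea in the first bullet mostly comes down to launch-configuration arithmetic. Below is a minimal Python sketch of that arithmetic. It assumes, without confirmation from this PR, that the low-latency kernel assigns one warp group per expert and packs `num_warp_groups` warp groups into each SM (one thread block per SM, at most 32 warps per block). `plan_comm_launch`, `compute_sms_left`, and `CommLaunchConfig` are hypothetical names for illustration, not DeepEP APIs.

```python
import math
from dataclasses import dataclass

# Hypothetical model (assumption, not DeepEP's code): one warp group per expert,
# one thread block per SM, `num_warp_groups` warp groups per block.
MAX_WARPS_PER_BLOCK = 32  # 1024 threads per block / 32 threads per warp


@dataclass(frozen=True)
class CommLaunchConfig:
    num_warp_groups: int      # warp groups (experts) handled by each SM
    num_warps_per_group: int  # warps available to each expert
    num_sms: int              # SMs occupied by the communication kernel


def plan_comm_launch(num_experts: int, sm_budget: int) -> CommLaunchConfig:
    """Pack experts into warp groups so communication fits in `sm_budget` SMs."""
    if num_experts <= 0 or sm_budget <= 0:
        raise ValueError("num_experts and sm_budget must be positive")
    # A smaller SM budget forces more warp groups (experts) onto each SM.
    num_warp_groups = math.ceil(num_experts / sm_budget)
    if num_warp_groups > MAX_WARPS_PER_BLOCK:
        raise ValueError("SM budget too small: each expert needs at least one warp")
    # Split the block's warps evenly across its warp groups.
    num_warps_per_group = MAX_WARPS_PER_BLOCK // num_warp_groups
    # Never exceeds sm_budget, because num_warp_groups >= num_experts / sm_budget.
    num_sms = math.ceil(num_experts / num_warp_groups)
    return CommLaunchConfig(num_warp_groups, num_warps_per_group, num_sms)


def compute_sms_left(total_sms: int, cfg: CommLaunchConfig) -> int:
    """SMs left for computation (e.g. the GEMM) after reserving communication SMs."""
    return total_sms - cfg.num_sms


if __name__ == "__main__":
    cfg = plan_comm_launch(num_experts=256, sm_budget=16)
    print(cfg, "compute SMs left on a 132-SM GPU:", compute_sms_left(132, cfg))
```

For example, with 256 experts and a 16-SM budget, the sketch packs 16 warp groups of 2 warps into each of 16 SMs, which leaves 116 SMs of a 132-SM GPU for computation. Shrinking the budget trades away warps per expert, so the sketch rejects any budget that would leave an expert without a full warp.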

@fzyzcjy changed the title from "Computation communication overlap" to "Computation communication overlap, use few SMs for low-latency mode, etc" on Jul 3, 2025
@fzyzcjy changed the title from "Computation communication overlap, use few SMs for low-latency mode, etc" back to "Computation communication overlap" on Jul 3, 2025