
JianboDong

The current test_low_latency script in DeepEP assumes a uniform token distribution across experts (ranks). In real workloads, however, token-to-expert routing is often skewed. To demonstrate this, we recorded token distributions from several layers of the DeepSeek-V3 (dpsk v3) model, using the MMLU-Pro dataset as input. As the figures show, the per-rank token counts in individual layers (red line) vary substantially.
[Figures: per-rank token counts (red line) in individual layers of dpsk v3 on MMLU-Pro]
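The per-rank token counts in these figures can be derived from the router's top-k expert indices. The sketch below is illustrative only: the helper names are hypothetical (not part of DeepEP), and it assumes that experts are split evenly and contiguously across ranks, that every (token, expert) pair counts once, and that an index of -1 marks an unused top-k slot. It also computes the imbalance factor used in this proposal, max_tokens_per_rank / average_tokens_per_rank.

```python
def per_rank_token_counts(topk_idx, num_experts, num_ranks):
    """Hypothetical helper: number of (token, expert) assignments that land on each rank.

    Assumes rank r owns experts [r * E / R, (r + 1) * E / R) and that an index
    of -1 marks an unused top-k slot.
    """
    if num_experts % num_ranks != 0:
        raise ValueError("num_experts must be divisible by num_ranks")
    experts_per_rank = num_experts // num_ranks
    counts = [0] * num_ranks
    for token_experts in topk_idx:      # one row of top-k expert ids per token
        for expert in token_experts:
            if expert < 0:              # skip unused / masked slots (assumption)
                continue
            counts[expert // experts_per_rank] += 1
    return counts


def imbalance_factor(counts):
    """max_tokens_per_rank / average_tokens_per_rank; 1.0 means perfectly balanced."""
    average = sum(counts) / len(counts)
    return max(counts) / average
```

For example, with 8 experts on 4 ranks (2 experts per rank), the top-2 indices [[0, 1], [0, 2], [1, 3], [0, 5]] give per-rank counts [5, 2, 1, 0] and an imbalance factor of 5 / 2 = 2.5.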

While load-balancing mechanisms such as EPLB can mitigate skew, they cannot perfectly predict future distributions, so meaningful imbalance typically persists. To evaluate DeepEP under these conditions, we propose extending the test_low_latency script to support imbalanced loads.

  1. Imbalanced Distribution Modeling: --distribution
    We tried several statistical distributions to fit the real data. As the following figures demonstrate, the log-normal distribution consistently provides the best approximation of the observed token imbalance (lowest Sum of Squared Errors, SSE). Gamma and power-law distributions are also available as alternative options.
    [Figures: fits of candidate distributions to the observed per-rank token counts]
  2. Degree of Imbalance: --imbalance-factors
    A new command-line argument, --imbalance-factors, controls the degree of imbalance. The factor is intuitively defined as max_tokens_per_rank / average_tokens_per_rank, which lets users easily simulate various levels of load skew.

  3. Test Flow:
    The benchmark first runs the default test (with a uniform distribution) as before. It then runs additional rounds of tests, one for each specified imbalance factor.

  4. Output Format:
    The test results are summarized in a table in which each row shows the results for one imbalance factor (the first row shows the results for the default uniform setting). Each row includes the target and real (achieved) imbalance factors, followed by key metrics such as the average, maximum, and minimum values across all ranks. Unlike the per-rank output of the original script, this gives a more concise, holistic view of the system's overall performance under skewed loads.
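One way to combine a log-normal --distribution with a target from --imbalance-factors is sketched below. This is a hypothetical helper, not the code in this PR: it draws fixed standard-normal samples z_i, forms log-normal weights w_i = exp(sigma * z_i), and bisects on sigma until max(w) / mean(w) matches the target factor. The ratio equals 1 at sigma = 0 and increases monotonically with sigma, approaching num_ranks, so any target in [1, num_ranks) is reachable. The gamma and power-law options are not covered here.

```python
import math
import random


def lognormal_rank_loads(num_ranks, total_tokens, target_factor, seed=0, iters=100):
    """Hypothetical helper: per-rank token counts with max/mean close to target_factor.

    Weights are w_i = exp(sigma * z_i) for fixed standard-normal draws z_i, i.e.
    log-normal samples whose spread grows with sigma. max(w)/mean(w) equals 1 at
    sigma = 0 and increases monotonically with sigma, so sigma is found by bisection.
    """
    if not 1.0 <= target_factor < num_ranks:
        raise ValueError("target_factor must lie in [1, num_ranks)")
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(num_ranks)]
    z_max = max(z)

    def weights(sigma):
        # Shifting by z_max leaves max/mean unchanged and keeps exp() from overflowing.
        return [math.exp(sigma * (zi - z_max)) for zi in z]

    def factor(sigma):
        w = weights(sigma)
        return max(w) / (sum(w) / len(w))

    # Grow the upper bracket until it reaches the target factor.
    lo, hi = 0.0, 1.0
    for _ in range(64):
        if factor(hi) >= target_factor:
            break
        lo, hi = hi, hi * 2.0
    else:
        raise ValueError("target_factor not reachable with these samples")

    # Bisection on [lo, hi], which brackets the sigma that hits the target factor.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if factor(mid) < target_factor:
            lo = mid
        else:
            hi = mid

    w = weights(hi)
    total_w = sum(w)
    # Rounding to integers means the achieved ("real") factor deviates slightly
    # from the target, and the counts may not sum exactly to total_tokens.
    return [round(total_tokens * wi / total_w) for wi in w]
```

For instance, lognormal_rank_loads(16, 16384, 2.0) returns 16 per-rank counts whose max / mean is close to 2.0. Measuring the factor on the rounded counts gives the "real" factor, which is why it can differ slightly from the target. Turning these per-rank loads into concrete top-k expert indices for the benchmark is a further step not shown here.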

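The test flow and the summary table can be sketched as follows. make_loads and run_round are hypothetical callbacks standing in for the load generator and for one benchmark round; each round is assumed to return one metric value per rank (for example, a latency). As a simplifying assumption, the uniform baseline is modeled here as make_loads(1.0). It is reported first, followed by one row per imbalance factor showing the target factor, the real factor measured from the actual counts, and the average, maximum, and minimum of the metric across ranks.

```python
def format_summary(rows):
    """Render one line per round: target factor, real factor, and avg/max/min of a per-rank metric.

    rows: list of (target_factor or None for the uniform baseline,
                   per-rank token counts, per-rank metric values)
    """
    lines = [f"{'target':>8} {'real':>6} {'avg':>9} {'max':>9} {'min':>9}"]
    for target, counts, values in rows:
        real = max(counts) / (sum(counts) / len(counts))  # achieved imbalance factor
        label = "uniform" if target is None else f"{target:.2f}"
        avg = sum(values) / len(values)
        lines.append(
            f"{label:>8} {real:>6.2f} {avg:>9.2f} {max(values):>9.2f} {min(values):>9.2f}"
        )
    return "\n".join(lines)


def run_benchmark(imbalance_factors, make_loads, run_round):
    """Hypothetical driver: uniform baseline first, then one round per imbalance factor."""
    baseline = make_loads(1.0)
    rows = [(None, baseline, run_round(baseline))]
    for target in imbalance_factors:
        loads = make_loads(target)
        rows.append((target, loads, run_round(loads)))
    return format_summary(rows)
```

Aggregating to avg/max/min per round keeps the table to one row per factor regardless of the number of ranks, which is the concise view described in the Output Format item.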

@Huoyuan100861

Very useful for optimizing imbalance research.
