Add imbalance factor in test_low_latency #393
The current test_low_latency script in DeepEP assumes a uniform token distribution across experts (ranks). In real workloads, however, token-to-expert routing is often skewed. To demonstrate this, we recorded token distributions from several layers of the DeepSeek-V3 (dpsk v3) model using the MMLU-Pro dataset as input. As the figures show, the per-rank token counts in individual layers (red line) vary substantially.



While load balancing mechanisms like EPLB can mitigate skew, they cannot perfectly predict future distributions, so meaningful imbalance typically persists. To evaluate DeepEP under these conditions, we propose to extend the test_low_latency script to include imbalanced loads.
We fitted several statistical distributions to the observed token counts. As the following figures show, the log-normal distribution consistently gives the best approximation of the observed token imbalance (lowest sum of squared errors, SSE). Alternative options, such as gamma and power-law distributions, are also included.
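The SSE-based comparison can be sketched as follows. This is a minimal illustration, not the fitting code used for the figures: it assumes a layer's per-rank token counts are available as a list of positive numbers (zero counts would have to be dropped or offset before taking logarithms), bins them into a histogram, fits a log-normal and, as a simple stand-in for the alternatives, a normal distribution by maximum likelihood, and reports the SSE between each fitted PDF and the histogram densities. The gamma and power-law fits are not reproduced here, and all function names are hypothetical.

```python
import math
import random
import statistics


def histogram_density(samples, num_bins):
    """Bin samples and return (bin_centers, densities) normalized to integrate to 1."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("samples must not all be equal")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        # Clamp so the maximum sample falls into the last bin.
        counts[min(int((x - lo) / width), num_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(num_bins)]
    densities = [c / (len(samples) * width) for c in counts]
    return centers, densities


def lognormal_pdf(x, mu, sigma):
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma**2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )


def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def sse(centers, densities, pdf):
    """Sum of squared errors between a fitted PDF and histogram densities."""
    return sum((pdf(c) - d) ** 2 for c, d in zip(centers, densities))


def compare_fits(token_counts, num_bins=40):
    """Fit candidate distributions by maximum likelihood and return their SSE."""
    centers, densities = histogram_density(token_counts, num_bins)
    # Log-normal MLE: mean and population std-dev of the log counts.
    logs = [math.log(x) for x in token_counts]
    mu_ln, sd_ln = statistics.fmean(logs), statistics.pstdev(logs)
    # Normal MLE: mean and population std-dev of the raw counts.
    mu_n, sd_n = statistics.fmean(token_counts), statistics.pstdev(token_counts)
    return {
        "lognormal": sse(centers, densities, lambda x: lognormal_pdf(x, mu_ln, sd_ln)),
        "normal": sse(centers, densities, lambda x: normal_pdf(x, mu_n, sd_n)),
    }
```

Other candidates with a closed-form PDF can be added to the returned dictionary in the same way; under this criterion, the distribution with the smallest SSE is the best fit.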
Degree of Imbalance: --imbalance-factors
A new command-line argument, --imbalance-factors, is introduced to control the degree of imbalance. Each imbalance factor is defined as max_tokens_per_rank / average_tokens_per_rank, so a factor of 1.0 corresponds to a perfectly uniform load. This lets users easily simulate various levels of load skew.
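One way to turn a target factor into concrete per-rank token counts is sketched below. This is an illustration under stated assumptions rather than the script's implementation: it draws fixed standard-normal values z, uses log-normal weights exp(sigma * z) (divided by the largest weight to avoid overflow), bisects on sigma until max/mean of the weights reaches the target, and then scales the weights to a fixed token total. Because counts are rounded to integers, the realized factor can differ slightly from the target, and a factor can never reach the number of ranks (that would require every token on one rank). The helpers `imbalance_factor` and `lognormal_token_counts` are hypothetical.

```python
import math
import random


def imbalance_factor(counts):
    """max_tokens_per_rank / average_tokens_per_rank."""
    return max(counts) / (sum(counts) / len(counts))


def lognormal_token_counts(num_ranks, total_tokens, target_factor, seed=0):
    """Hypothetical helper: per-rank token counts with max/mean close to target_factor."""
    # max/mean can never exceed num_ranks (all tokens on one rank).
    if not 1.0 <= target_factor < num_ranks:
        raise ValueError("target_factor must be in [1, num_ranks)")
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(num_ranks)]
    z_max = max(zs)

    def weights(sigma):
        # Log-normal weights exp(sigma * z), divided by the largest one so the
        # maximum weight is 1 and a large sigma cannot overflow.
        return [math.exp(sigma * (z - z_max)) for z in zs]

    # With the draws fixed, max/mean of the weights is non-decreasing in sigma,
    # so grow an upper bound and then bisect on sigma.
    lo, hi = 0.0, 1.0
    while imbalance_factor(weights(hi)) < target_factor:
        lo, hi = hi, hi * 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if imbalance_factor(weights(mid)) < target_factor:
            lo = mid
        else:
            hi = mid

    w = weights(hi)
    scale = total_tokens / sum(w)
    counts = [int(x * scale) for x in w]
    # Flooring loses fewer than num_ranks tokens; hand them out one per rank.
    for i in range(total_tokens - sum(counts)):
        counts[i] += 1
    return counts
```

For example, `lognormal_token_counts(8, 4096, 2.0)` yields eight counts summing to 4096 whose largest entry is roughly twice the mean of 512.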
Test Flow:
The benchmark first runs the default test (with a uniform distribution) as before. It then runs additional rounds of tests, one for each of the specified imbalance factors.
Output Format:
The test results are summarized in tables, with each row showing the results for the corresponding imbalance factor (the first row shows the results for the default uniform setting). Each row includes the target and real imbalance factors, followed by key metrics such as the average, maximum, and minimum values across all ranks. Unlike the per-rank output of the original script, this gives a more concise, holistic view of the system's overall performance under skewed loads.
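A sketch of how one summary row could be assembled from per-rank measurements is shown below. It assumes each rank reports its token count and one scalar metric (for example, a latency in microseconds); the `SummaryRow` fields, the `summarize` and `format_table` helpers, and the `dispatch_us` column name are hypothetical and do not mirror the script's actual output columns.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SummaryRow:
    target_factor: float  # requested imbalance factor (1.0 for the uniform default)
    real_factor: float  # measured max_tokens_per_rank / average_tokens_per_rank
    avg_value: float  # metric averaged across ranks
    max_value: float
    min_value: float


def summarize(target_factor, per_rank_tokens, per_rank_metric):
    """Collapse per-rank measurements into one table row."""
    return SummaryRow(
        target_factor=target_factor,
        real_factor=max(per_rank_tokens) / statistics.fmean(per_rank_tokens),
        avg_value=statistics.fmean(per_rank_metric),
        max_value=max(per_rank_metric),
        min_value=min(per_rank_metric),
    )


def format_table(rows, metric_name="dispatch_us"):
    """Render rows as a fixed-width text table, one row per imbalance factor."""
    cols = ["target", "real", f"{metric_name}_avg", f"{metric_name}_max", f"{metric_name}_min"]
    lines = [" ".join(f"{c:>16}" for c in cols)]
    for r in rows:
        values = [r.target_factor, r.real_factor, r.avg_value, r.max_value, r.min_value]
        lines.append(" ".join(f"{v:>16.2f}" for v in values))
    return "\n".join(lines)
```

Passing the uniform run first (target 1.0) followed by one row per requested factor reproduces the row ordering described above.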