Generic kernel breakdown and overlap analysis #4

Open
fengxizhou opened this issue Dec 24, 2022 · 1 comment


fengxizhou commented Dec 24, 2022

🚀 Motivation and context

When conducting a kernel timeline analysis, we would like to break down the kernels into multiple groups and examine the overlap between different groups. For example, when Compute, AllToAll, and AllReduce kernels are assigned to different CUDA streams, we would like to see how these three groups are distributed along the timeline. In other words, we can split the entire execution time into the following eight groups (one for each subset of the three kernel types, with the empty subset being Idle) and compute the aggregated kernel duration of each group.

  1. Idle
  2. Compute
  3. AllToAll
  4. AllReduce
  5. Compute+AllToAll
  6. Compute+AllReduce
  7. AllToAll+AllReduce
  8. Compute+AllToAll+AllReduce
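
As an illustration of how the eight groups above arise, the following minimal sketch enumerates one label per subset of the base kernel groups. The helper name `enumerate_overlap_labels` and its parameters are hypothetical and not part of the proposal; the sketch only assumes that combined labels are joined with "+" and that the empty subset is called "Idle".

```python
from itertools import combinations
from typing import List


def enumerate_overlap_labels(groups: List[str], idle_label: str = "Idle", sep: str = "+") -> List[str]:
    """Return one label per subset of `groups`; the empty subset maps to `idle_label`."""
    labels = [idle_label]
    # Subsets are listed by increasing size, preserving the input order within each subset.
    for size in range(1, len(groups) + 1):
        for combo in combinations(groups, size):
            labels.append(sep.join(combo))
    return labels


group_labels = enumerate_overlap_labels(["Compute", "AllToAll", "AllReduce"])
for number, label in enumerate(group_labels, start=1):
    print(number, label)
```

With n base groups this yields 2^n labels, so three groups give the eight entries listed above.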

The existing analyzers provide a hard-coded implementation of the Compute-Communication and Compute-Communication-Memory breakdowns. We would like to provide some utilities that extend the breakdown and overlap analysis to support user-defined kernel groups.

Description

We implement the following utilities for building flexible kernel breakdown and overlap analysis:

  • compute_kernel_overlap(df_intervals: List[pd.DataFrame], labels: List[str]) -> pd.DataFrame
    Split all intervals based on their overlaps and then merge the resulting intervals into a single sequence of labeled intervals.

    Example:

    df_intervals = [d1, d2]
    labels = ["a", "b"]

    d1:

    ts  dur
    0   2
    3   1
    8   3

    d2:

    ts  dur
    1   2
    5   2
    9   4

    result_df:

    ts  running  end  dur  label
    0   1        1    1    a
    1   3        2    1    a+b
    2   2        3    1    b
    3   3        3    0    a+b
    3   1        4    1    a
    4   0        5    1    Idle
    5   2        7    2    b
    7   0        8    1    Idle
    8   1        9    1    a
    9   3        11   2    a+b
    11  2        13   2    b

  • compute_overlap_statistics(df_overlap: pd.DataFrame) -> pd.DataFrame
    Summarize the labeled time intervals returned by compute_kernel_overlap in a summary table as follows:

    label  total_duration  num_blocks  max_duration  ratio
    Idle   2               2           1             0.153846
    a      3               3           1             0.230769
    a+b    3               3           2             0.230769
    b      5               3           2             0.384615
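
To make the two utilities concrete, here is a minimal sketch of one way they could be written. It is not the proposed implementation: it uses a sweep over interval boundaries, and it assumes that `running` is a bitmask with bit i set while the i-th group is active, that a start event is processed before an end event at the same timestamp, and that the uncovered time is labeled "Idle". These assumptions were chosen because they are consistent with the example tables above.

```python
from typing import List

import pandas as pd


def compute_kernel_overlap(df_intervals: List[pd.DataFrame], labels: List[str]) -> pd.DataFrame:
    """Sweep-line sketch: split the timeline at every interval boundary."""
    # One (timestamp, kind, group index) event per boundary; kind 0 = start,
    # 1 = end, so a start sorts before an end at the same timestamp.
    events = []
    for i, df in enumerate(df_intervals):
        for ts, dur in zip(df["ts"], df["dur"]):
            events.append((ts, 0, i))
            events.append((ts + dur, 1, i))
    events.sort()

    active = [0] * len(df_intervals)  # number of open intervals per group
    rows = []
    # Each event opens a segment that runs until the next event.
    for (ts, kind, i), (next_ts, _, _) in zip(events, events[1:]):
        active[i] += 1 if kind == 0 else -1
        # Assumption: `running` is a bitmask, bit j set while group j is active.
        running = sum(1 << j for j, n in enumerate(active) if n > 0)
        label = "+".join(name for name, n in zip(labels, active) if n > 0) or "Idle"
        rows.append((ts, running, next_ts, next_ts - ts, label))
    return pd.DataFrame(rows, columns=["ts", "running", "end", "dur", "label"])


def compute_overlap_statistics(df_overlap: pd.DataFrame) -> pd.DataFrame:
    """Aggregate the labeled segments into one summary row per label."""
    stats = (
        df_overlap.groupby("label")["dur"]
        .agg(total_duration="sum", num_blocks="count", max_duration="max")
        .reset_index()
    )
    stats["ratio"] = stats["total_duration"] / stats["total_duration"].sum()
    return stats


# The example inputs from the description above.
d1 = pd.DataFrame({"ts": [0, 3, 8], "dur": [2, 1, 3]})
d2 = pd.DataFrame({"ts": [1, 5, 9], "dur": [2, 2, 4]})
result_df = compute_kernel_overlap([d1, d2], ["a", "b"])
stats = compute_overlap_statistics(result_df)
print(result_df)
print(stats)
```

In this sketch `num_blocks` counts every segment carrying a label, including the zero-duration `a+b` segment at ts 3, which is why `a+b` reports three blocks.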

Alternatives

No response

Additional context

No response


anupambhatnagar commented Jan 6, 2023

This is an interesting extension of the current feature. I have some questions about the implementation:

  1. Which class should this reside in? CommunicationAnalysis seems like a reasonable option if the kernels that will be passed to compute_kernel_overlap will only be communication kernels as mentioned in the description. What do you think?
  2. Is the user expected to provide the labels in compute_kernel_overlap? How would you handle non-communication kernels in that case?
  3. In compute_overlap_statistics, what does num_blocks refer to?

Suggestion: rename compute_overlap_statistics to compute_overlap_stats and change the last column from a ratio to a percentage rounded to two decimal places.
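
For reference, the suggested percentage column could look like the following minimal sketch. The column name `percentage` is an assumption, and the input values are copied from the summary table in the issue description rather than computed here.

```python
import pandas as pd

# Totals copied from the example summary table in the issue description.
summary = pd.DataFrame(
    {"label": ["Idle", "a", "a+b", "b"], "total_duration": [2, 3, 3, 5]}
)

# Share of the total duration as a percentage, rounded to two decimal places.
summary["percentage"] = (
    100 * summary["total_duration"] / summary["total_duration"].sum()
).round(2)
print(summary)
```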
