Generic kernel breakdown and overlap analysis #4

fengxizhou · 2022-12-24T04:38:20Z

🚀 Motivation and context

When conducting a kernel timeline analysis, we would like to break down the kernels into multiple groups and examine the overlap between those groups. For example, when Compute, AllToAll, and AllReduce kernels are assigned to different CUDA streams, we would like to see how these three groups are distributed along the timeline. In other words, we can split the entire execution time into the following eight groups and compute the aggregate kernel duration of each group.

Idle
Compute
AllToAll
AllReduce
Compute+AllToAll
Compute+AllReduce
AllToAll+AllReduce
Compute+AllToAll+AllReduce
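
The eight groups are the Idle state plus every non-empty combination of the three kernel labels. As a rough illustration only, the list can be generated for any set of user-defined labels; `enumerate_groups` is a hypothetical helper and is not part of the proposed utilities.

```python
from itertools import combinations
from typing import List


def enumerate_groups(labels: List[str]) -> List[str]:
    """Hypothetical helper: list every overlap group for the given labels.

    The result starts with "Idle" (no kernel running) and is followed by
    every non-empty combination of labels, joined with "+".
    """
    groups = ["Idle"]
    # Combinations of size 1 first, then size 2, up to all labels together.
    for size in range(1, len(labels) + 1):
        for combo in combinations(labels, size):
            groups.append("+".join(combo))
    return groups


groups = enumerate_groups(["Compute", "AllToAll", "AllReduce"])
print(len(groups))  # 2 ** 3 groups for three labels
print(groups)
```

With n labels there are 2**n groups, so the number of rows in a breakdown grows quickly as more kernel groups are added.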

The existing analyzers provide hard-coded implementations of the Compute-Communication and Compute-Communication-Memory breakdowns. We would like to provide utilities that extend the breakdown and overlap analysis to support user-defined kernel groups.

Description

We implement the following utilities for building flexible kernel breakdown and overlap analysis:

compute_kernel_overlap(df_intervals: List[pd.DataFrame], labels: List[str]) -> pd.DataFrame
Split all intervals at the points where they overlap, then merge the resulting intervals into a single sequence of labeled intervals.

Example:

df_intervals = [d1, d2]
labels = ["a", "b"]

d1:

ts	dur
0	2
3	1
8	3

d2:

ts	dur
1	2
5	2
9	4

result_df:

ts	running	end	dur	label
0	1	1	1	a
1	3	2	1	a+b
2	2	3	1	b
3	3	3	0	a+b
3	1	4	1	a
4	0	5	1	Idle
5	2	7	2	b
7	0	8	1	Idle
8	1	9	1	a
9	3	11	2	a+b
11	2	13	2	b
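
One way to arrive at a table like this is a sweep over interval start and end events, which is sketched below. This is not the implementation proposed in this issue. It assumes that `running` is a bitmask with one bit per input DataFrame, that intervals within a single DataFrame do not overlap each other, and that start events are processed before end events at the same timestamp (which is what yields the zero-length a+b row at ts=3). The name `compute_kernel_overlap_sketch` is hypothetical.

```python
from typing import List

import pandas as pd


def compute_kernel_overlap_sketch(
    df_intervals: List[pd.DataFrame], labels: List[str]
) -> pd.DataFrame:
    """Sweep-line sketch; assumes each DataFrame has 'ts' and 'dur' columns."""
    events = []
    for i, df in enumerate(df_intervals):
        bit = 1 << i  # one bit per group in the 'running' bitmask
        # order=0 for starts and order=1 for ends, so starts sort first on ties.
        events.append(pd.DataFrame({"ts": df["ts"], "delta": bit, "order": 0}))
        events.append(
            pd.DataFrame({"ts": df["ts"] + df["dur"], "delta": -bit, "order": 1})
        )
    ev = pd.concat(events, ignore_index=True)
    ev = ev.sort_values(["ts", "order"]).reset_index(drop=True)

    # Bitmask of groups that are running right after each event.
    ev["running"] = ev["delta"].cumsum()
    # Each segment lasts until the next event; the final event closes the timeline.
    ev["end"] = ev["ts"].shift(-1)
    ev = ev.iloc[:-1].copy()
    ev["end"] = ev["end"].astype(ev["ts"].dtype)
    ev["dur"] = ev["end"] - ev["ts"]

    def mask_to_label(mask: int) -> str:
        names = [name for i, name in enumerate(labels) if mask & (1 << i)]
        return "+".join(names) if names else "Idle"

    ev["label"] = ev["running"].map(mask_to_label)
    return ev[["ts", "running", "end", "dur", "label"]]


# The example inputs from this issue.
d1 = pd.DataFrame({"ts": [0, 3, 8], "dur": [2, 1, 3]})
d2 = pd.DataFrame({"ts": [1, 5, 9], "dur": [2, 2, 4]})
result_df = compute_kernel_overlap_sketch([d1, d2], ["a", "b"])
print(result_df.to_string(index=False))
```

The bitmask keeps the label lookup cheap, but it only works when each group contributes at most one running interval at any time. Groups whose own kernels overlap would need a per-group counter instead.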

compute_overlap_statistics(df_overlap: pd.DataFrame) -> pd.DataFrame
Summarize the labeled time intervals returned by compute_kernel_overlap in a summary table such as the following:

label	total_duration	num_blocks	max_duration	ratio
Idle	2	2	1	0.153846
a	3	3	1	0.230769
a+b	3	3	2	0.230769
b	5	3	2	0.384615
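
A summary of this shape can be produced with a pandas group-by, as in the sketch below. It is an illustration rather than the proposed implementation. It assumes that `num_blocks` is the number of rows per label (which matches the example only if the zero-length a+b row is counted) and that `ratio` is each label's share of the total duration. The name `compute_overlap_statistics_sketch` is hypothetical.

```python
import pandas as pd


def compute_overlap_statistics_sketch(df_overlap: pd.DataFrame) -> pd.DataFrame:
    """Group-by sketch; assumes df_overlap has 'dur' and 'label' columns."""
    stats = (
        df_overlap.groupby("label")["dur"]
        .agg(total_duration="sum", num_blocks="count", max_duration="max")
        .reset_index()
    )
    # Share of the whole timeline spent in each group.
    stats["ratio"] = stats["total_duration"] / stats["total_duration"].sum()
    return stats


# The 'dur' and 'label' columns of result_df from the example in this issue.
df_overlap = pd.DataFrame(
    {
        "dur": [1, 1, 1, 0, 1, 1, 2, 1, 1, 2, 2],
        "label": ["a", "a+b", "b", "a+b", "a", "Idle", "b", "Idle", "a", "a+b", "b"],
    }
)
stats = compute_overlap_statistics_sketch(df_overlap)
print(stats.to_string(index=False))
```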

Alternatives

No response

Additional context

No response

anupambhatnagar · 2023-01-06T06:50:16Z

This is an interesting extension of the current feature. I have some questions about the implementation:

Which class should this reside in? CommunicationAnalysis seems like a reasonable option if the kernels that will be passed to compute_kernel_overlap will only be communication kernels as mentioned in the description. What do you think?
Is the user expected to provide the labels in compute_kernel_overlap? How would you handle non-communication kernels in that case?
In compute_overlap_statistics what does num_blocks refer to?

Suggestion: rename compute_overlap_statistics to compute_overlap_stats, and change the last column from a ratio to a percentage rounded to two decimal places.
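
For illustration, the change to the last column could look like the sketch below. The column name `percentage` and the helper `to_percentage` are assumptions and have not been agreed on in this thread.

```python
import pandas as pd


def to_percentage(stats: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace 'ratio' with a rounded 'percentage' column."""
    out = stats.drop(columns=["ratio"]).copy()
    out["percentage"] = (stats["ratio"] * 100).round(2)
    return out


# Ratios taken from the summary table in the issue description.
stats = pd.DataFrame(
    {
        "label": ["Idle", "a", "a+b", "b"],
        "total_duration": [2, 3, 3, 5],
        "ratio": [2 / 13, 3 / 13, 3 / 13, 5 / 13],
    }
)
pct = to_percentage(stats)
print(pct.to_string(index=False))
```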

fengxizhou added the "feature request" and "needs triage" labels · Dec 24, 2022

anupambhatnagar assigned fengxizhou Jan 6, 2023
