🚀 Motivation and context

When conducting a kernel timeline analysis, we would like to break the kernels down into multiple groups and examine the overlap between different groups. For example, when Compute, AllToAll, and AllReduce kernels are assigned to different CUDA streams, we would like to see how these three groups are distributed along the timeline. In other words, we can split the entire execution time into the following eight groups and compute the aggregated kernel duration of each group:
- Idle
- Compute
- AllToAll
- AllReduce
- Compute+AllToAll
- Compute+AllReduce
- AllToAll+AllReduce
- Compute+AllToAll+AllReduce
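The eight groups above are the 2³ coverage combinations of three kernel groups (every non-empty subset, plus Idle). For arbitrary user-defined labels they can be enumerated mechanically; the snippet below is only an illustration, not part of the proposed API:

```python
from itertools import combinations

# The three example kernel groups; any user-defined labels work the same way.
labels = ["Compute", "AllToAll", "AllReduce"]

# Every non-empty subset of the n groups may overlap on the timeline,
# plus "Idle" when nothing runs: 2^n groups in total (8 for n = 3).
groups = ["Idle"] + ["+".join(c)
                     for r in range(1, len(labels) + 1)
                     for c in combinations(labels, r)]
```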
The existing analyzers provide hard-coded implementations of the Compute-Communication and Compute-Communication-Memory breakdowns. We would like to provide utilities that extend the breakdown and overlap analysis to support user-defined kernel groups.
Description
We implement the following utilities for building flexible kernel breakdown and overlap analysis:
`compute_kernel_overlap(df_intervals: List[pd.DataFrame], labels: List[str]) -> pd.DataFrame`

Split all intervals based on their overlaps, then merge the resulting intervals into a single sequence of labeled intervals.
Example:
`df_intervals = [d1, d2]`
`labels = ["a", "b"]`
d1:

| ts | dur |
|----|-----|
| 0  | 2   |
| 3  | 1   |
| 8  | 3   |
d2:

| ts | dur |
|----|-----|
| 1  | 2   |
| 5  | 2   |
| 9  | 4   |
result_df:

| ts | running | end | dur | label |
|----|---------|-----|-----|-------|
| 0  | 1       | 1   | 1   | a     |
| 1  | 3       | 2   | 1   | a+b   |
| 2  | 2       | 3   | 1   | b     |
| 3  | 3       | 3   | 0   | a+b   |
| 3  | 1       | 4   | 1   | a     |
| 4  | 0       | 5   | 1   | Idle  |
| 5  | 2       | 7   | 2   | b     |
| 7  | 0       | 8   | 1   | Idle  |
| 8  | 1       | 9   | 1   | a     |
| 9  | 3       | 11  | 2   | a+b   |
| 11 | 2       | 13  | 2   | b     |
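One possible implementation of `compute_kernel_overlap` is a sweep over all interval boundaries, labeling each elementary segment by the groups that cover it. The sketch below is an assumption about how this could work, not the proposed implementation: it omits the `running` column and the zero-duration boundary rows shown in the example table, and it does not yet merge adjacent segments that share a label.

```python
from typing import List

import pandas as pd

def compute_kernel_overlap(df_intervals: List[pd.DataFrame],
                           labels: List[str]) -> pd.DataFrame:
    """Split the timeline at every interval boundary and label each
    elementary segment with the '+'-joined names of the groups covering
    it, or 'Idle' if no group is active."""
    # Collect all start and end points across every input frame.
    points = sorted({p for df in df_intervals
                     for p in list(df["ts"]) + list(df["ts"] + df["dur"])})
    rows = []
    for start, stop in zip(points[:-1], points[1:]):
        # A group is active on [start, stop) if any of its intervals
        # intersects that segment.
        active = [lab for df, lab in zip(df_intervals, labels)
                  if ((df["ts"] < stop) & (df["ts"] + df["dur"] > start)).any()]
        rows.append({"ts": start, "end": stop, "dur": stop - start,
                     "label": "+".join(active) if active else "Idle"})
    return pd.DataFrame(rows)

# The example inputs from above.
d1 = pd.DataFrame({"ts": [0, 3, 8], "dur": [2, 1, 3]})
d2 = pd.DataFrame({"ts": [1, 5, 9], "dur": [2, 2, 4]})
result_df = compute_kernel_overlap([d1, d2], ["a", "b"])
```

On the example data this reproduces the segment sequence and per-label durations of the table above, minus the zero-duration `a+b` row at `ts=3`.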
`compute_overlap_statistics(df_overlap: pd.DataFrame) -> pd.DataFrame`

Summarize the labeled time intervals produced by `compute_kernel_overlap` into a summary table as follows:
| label | total_duration | num_blocks | max_duration | ratio    |
|-------|----------------|------------|--------------|----------|
| Idle  | 2              | 2          | 1            | 0.153846 |
| a     | 3              | 3          | 1            | 0.230769 |
| a+b   | 3              | 3          | 2            | 0.230769 |
| b     | 5              | 3          | 2            | 0.384615 |
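Under the assumption that `num_blocks` counts the contiguous segments carrying each label (one open question in the comments below), the summary can be sketched as a plain groupby over the overlap table:

```python
import pandas as pd

def compute_overlap_statistics(df_overlap: pd.DataFrame) -> pd.DataFrame:
    """Aggregate the labeled segments of an overlap table per label.
    Assumption: num_blocks = number of contiguous segments per label."""
    g = df_overlap.groupby("label")["dur"]
    stats = pd.DataFrame({"total_duration": g.sum(),
                          "num_blocks": g.size(),
                          "max_duration": g.max()})
    # Ratio of each label's total duration to the whole timeline.
    stats["ratio"] = stats["total_duration"] / stats["total_duration"].sum()
    return stats.reset_index()

# The example overlap table from above (ts/end columns omitted; only
# dur and label are needed here, including the zero-duration a+b row).
df_overlap = pd.DataFrame({
    "dur":   [1, 1, 1, 0, 1, 1, 2, 1, 1, 2, 2],
    "label": ["a", "a+b", "b", "a+b", "a", "Idle",
              "b", "Idle", "a", "a+b", "b"],
})
stats = compute_overlap_statistics(df_overlap)
```

With this reading of `num_blocks`, the sketch reproduces the summary table above (e.g. `a+b` has total duration 3 across 3 blocks because the zero-duration boundary row still counts as a block).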
Alternatives
No response
Additional context
No response
This is an interesting extension of the current feature. I have some questions about the implementation:

- Which class should this reside in? `CommunicationAnalysis` seems like a reasonable option if the kernels passed to `compute_kernel_overlap` are only communication kernels, as mentioned in the description. What do you think?
- Is the user expected to provide the labels in `compute_kernel_overlap`? How would you handle non-communication kernels in that case?
- In `compute_overlap_statistics`, what does `num_blocks` refer to?
- Suggestion: rename `compute_overlap_statistics` to `compute_overlap_stats`, and change the last column to a percentage rounded to two decimal places instead of a ratio.