-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nsys report analyzer #65
Conversation
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
reports_required.extend(nsys_metrics_to_reports[metric]) | ||
reports_required = list(set(reports_required)) | ||
assert reports_required, "No nsys reports required" | ||
cmd = f"nsys stats --report {','.join(reports_required)} --force-export=true --format csv --output . --force-overwrite=true {report_path} > /dev/null 2>&1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple ways to obtain reports and collect data from them. 1. run nsys with desired reports via --report
. It will automatically generate the csv files for each report. 2. only generate sqlite and utilize report scripts in nsys/target-linux-x64/reports
. See discussion here.
In summary, we always need one system command call for nsys to either generate sqlite or csv reports, and the report scripts in nsys folder also do csv process, so we can directly utilize nsys to obtain csv report files rather than processing by ourselves.
Some results are not correct. working on the fix. |
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
if "nvtx_sum" in csv_contents: | ||
# It is supposed to be only one row. The nvtx range is `:tritonbench_range` | ||
assert len(csv_contents["nvtx_sum"]) == 1 | ||
nvtx_range_duration = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nsys returns this number in us
rather than ns
in some cases. it's not our bug. left this comment for future check.
c3305e9
to
af15f80
Compare
e25da02
to
ab36ba4
Compare
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
This PR add a nsys report analyzer providing metrics
nsys_gpu_kernel_sum
is the sum of total GPU kernel execution time on GPUs, thensys_nvtx_range_duration
is the total execution time of the operator, and thensys_launch_overhead
is their difference which indicates the launch overhead. This is one way to measure execution time mentioned in #50Fix #67
Test Plan: