Specify timeunit for nsys report (#100)
Summary:
Fix the time unit to ns for nsys report analysis by passing `--timeunit ns` to `nsys stats`.

Pull Request resolved: #100

Test Plan:
```
% python run.py --op rope  --mode fwd_bwd --precision fp32  --metrics nsys_rep,nsys_nvtx_range_duration --num-inputs 1 --dump-csv
  0%|                                                                                                                                                                                       | 0/1 [00:00<?, ?it/s]`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
INFO:tritonbench.utils.triton_op:Took 270.91ms to get benchmark function for apply_rotary_pos_emb
  0%|          | 0/1 [00:00<?, ?it/s]`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
INFO:tritonbench.utils.triton_op:Took 339.56ms to get benchmark function for apply_rotary_pos_emb
Capture range started in the application.
Capture range ended in the application.
Generating '/tmp/nsys-report-60e5.qdstrm'
[1/1] [0%                          ] nsys_output.nsys-repProcessing events...
[1/1] [========================100%] nsys_output.nsys-rep
Generated:
    /tmp/tritonbench/rope/nsys_traces/apply_rotary_pos_emb_0/nsys_output.nsys-rep
INFO:tritonbench.utils.triton_op:Took 1.65ms to get benchmark function for liger_rotary_pos_emb
  0%|          | 0/1 [00:00<?, ?it/s]`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
INFO:tritonbench.utils.triton_op:Took 329.79ms to get benchmark function for liger_rotary_pos_emb
Capture range started in the application.
Capture range ended in the application.
Generating '/tmp/nsys-report-8910.qdstrm'
[1/1] [0%                          ] nsys_output.nsys-repProcessing events...
[1/1] [========================100%] nsys_output.nsys-rep
Generated:
    /tmp/tritonbench/rope/nsys_traces/liger_rotary_pos_emb_0/nsys_output.nsys-rep
INFO:tritonbench.utils.triton_op:Took 1180.38ms to get benchmark function for inductor_rotary_pos_emb_full_op
  0%|          | 0/1 [00:00<?, ?it/s]`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
INFO:tritonbench.utils.triton_op:Took 1537.26ms to get benchmark function for inductor_rotary_pos_emb_full_op
Capture range started in the application.
Capture range ended in the application.
Generating '/tmp/nsys-report-21b2.qdstrm'
The target application terminated. One or more process it created re-parented.
Waiting for termination of re-parented processes.
Use the `--wait` option to modify this behavior.
[1/1] [0%                          ] nsys_output.nsys-repProcessing events...
[1/1] [========================100%] nsys_output.nsys-rep
Generated:
    /tmp/tritonbench/rope/nsys_traces/inductor_rotary_pos_emb_full_op_0/nsys_output.nsys-rep
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:32<00:00, 32.73s/it]
      (H, T)    apply_rotary_pos_emb-nsys_nvtx_range_duration                                                  apply_rotary_pos_emb-nsys_rep    liger_rotary_pos_emb-nsys_nvtx_range_duration                                                  liger_rotary_pos_emb-nsys_rep    inductor_rotary_pos_emb_full_op-nsys_nvtx_range_duration                                                  inductor_rotary_pos_emb_full_op-nsys_rep
------------  -----------------------------------------------  -----------------------------------------------------------------------------  -----------------------------------------------  -----------------------------------------------------------------------------  ----------------------------------------------------------  ----------------------------------------------------------------------------------------
(8192, 1024)                                          2.12551  /tmp/tritonbench/rope/nsys_traces/apply_rotary_pos_emb_0/nsys_output.nsys-rep                                          1.15754  /tmp/tritonbench/rope/nsys_traces/liger_rotary_pos_emb_0/nsys_output.nsys-rep                                                     1.67657  /tmp/tritonbench/rope/nsys_traces/inductor_rotary_pos_emb_full_op_0/nsys_output.nsys-rep
```

Reviewed By: adamomainz

Differential Revision: D66895002

Pulled By: FindHao

fbshipit-source-id: 22dc9227dd7c70e8bd88b2237bb86c7f43fb913c
FindHao authored and facebook-github-bot committed Dec 7, 2024
1 parent e300fcf commit 62d311e
Showing 1 changed file with 1 addition and 3 deletions.
4 changes: 1 addition & 3 deletions tritonbench/components/ncu/nsys_analyzer.py
@@ -32,7 +32,7 @@ def read_nsys_report(
         reports_required.extend(nsys_metrics_to_reports[metric])
     reports_required = list(set(reports_required))
     assert reports_required, "No nsys reports required"
-    cmd = f"nsys stats --report {','.join(reports_required)} --force-export=true --format csv --output . --force-overwrite=true {report_path}"
+    cmd = f"nsys stats --report {','.join(reports_required)} --timeunit ns --force-export=true --format csv --output . --force-overwrite=true {report_path}"
     try:
         subprocess.check_call(
             cmd.split(), stdout=subprocess.DEVNULL, stderr=subprocess.PIPE
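
The hunk above builds the `nsys stats` command as one string and then splits it on whitespace. A minimal sketch of the same command assembled as an argument list, which would also tolerate a report path containing spaces, might look like the following. The helper name `build_nsys_stats_cmd` and its `timeunit` parameter are hypothetical and not part of this commit; the flags are the ones shown in the diff.

```python
from typing import List


def build_nsys_stats_cmd(
    report_path: str, reports: List[str], timeunit: str = "ns"
) -> List[str]:
    """Hypothetical helper: assemble the `nsys stats` call as an argv list.

    The flags mirror the command string in the diff; only the list form
    and the function itself are assumptions made for illustration.
    """
    if not reports:
        raise ValueError("No nsys reports required")
    # De-duplicate report names; sorting keeps the command deterministic.
    unique_reports = sorted(set(reports))
    return [
        "nsys",
        "stats",
        "--report",
        ",".join(unique_reports),
        "--timeunit",
        timeunit,
        "--force-export=true",
        "--format",
        "csv",
        "--output",
        ".",
        "--force-overwrite=true",
        report_path,
    ]


if __name__ == "__main__":
    # Print the argv list for a made-up report path.
    print(build_nsys_stats_cmd("/tmp/example/nsys_output.nsys-rep", ["nvtx_sum"]))
```

The returned list could be passed directly to `subprocess.check_call` in place of `cmd.split()`.
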
@@ -70,8 +70,6 @@ def read_nsys_report(
     if "nvtx_sum" in csv_contents:
         # It is supposed to be only one row. The nvtx range is `:tritonbench_range`
         assert len(csv_contents["nvtx_sum"]) == 1
-        # @TODO: nsys has a bug that the unit of nvtx range duration is ms sometimes.
-        # waiting for nvidia replys.
         nvtx_range_duration = (
             float(csv_contents["nvtx_sum"][0]["Total Time (ns)"]) / 1_000_000
         )
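
With the unit pinned to nanoseconds, the division by `1_000_000` in the second hunk always yields milliseconds. A minimal standalone sketch of that conversion, assuming the `nvtx_sum` CSV has a `Total Time (ns)` column as in the diff, could be written as below. The function name `nvtx_range_duration_ms` and the sample CSV text are made up for illustration and are not taken from the repository or from real nsys output.

```python
import csv
import io

NS_PER_MS = 1_000_000


def nvtx_range_duration_ms(csv_text: str) -> float:
    """Hypothetical helper: read a one-row nvtx_sum CSV and return milliseconds."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # The diff expects exactly one row, for the `:tritonbench_range` NVTX range.
    if len(rows) != 1:
        raise ValueError(f"expected exactly one nvtx_sum row, got {len(rows)}")
    row = rows[0]
    if "Total Time (ns)" not in row:
        raise KeyError("missing 'Total Time (ns)' column; was --timeunit ns used?")
    return float(row["Total Time (ns)"]) / NS_PER_MS


if __name__ == "__main__":
    # Illustrative CSV with only the columns this sketch needs.
    sample = "Range,Total Time (ns)\n:tritonbench_range,2125510\n"
    print(nvtx_range_duration_ms(sample))
```

Checking for the column name explicitly makes a unit mismatch fail loudly instead of producing a value that is off by a factor of a thousand or more.
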
