Specify timeunit for nsys report (#100)
Summary: fix timeunit as ns for nsys report analysis.

Pull Request resolved: #100

Test Plan:
```
% python run.py --op rope --mode fwd_bwd --precision fp32 --metrics nsys_rep,nsys_nvtx_range_duration --num-inputs 1 --dump-csv
  0%|          | 0/1 [00:00<?, ?it/s]
`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
INFO:tritonbench.utils.triton_op:Took 270.91ms to get benchmark function for apply_rotary_pos_emb
  0%|          | 0/1 [00:00<?, ?it/s]
`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
INFO:tritonbench.utils.triton_op:Took 339.56ms to get benchmark function for apply_rotary_pos_emb
Capture range started in the application.
Capture range ended in the application.
Generating '/tmp/nsys-report-60e5.qdstrm'
[1/1] [0%                          ] nsys_output.nsys-rep
Processing events...
[1/1] [========================100%] nsys_output.nsys-rep
Generated:
    /tmp/tritonbench/rope/nsys_traces/apply_rotary_pos_emb_0/nsys_output.nsys-rep
INFO:tritonbench.utils.triton_op:Took 1.65ms to get benchmark function for liger_rotary_pos_emb
  0%|          | 0/1 [00:00<?, ?it/s]
`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
INFO:tritonbench.utils.triton_op:Took 329.79ms to get benchmark function for liger_rotary_pos_emb
Capture range started in the application.
Capture range ended in the application.
Generating '/tmp/nsys-report-8910.qdstrm'
[1/1] [0%                          ] nsys_output.nsys-rep
Processing events...
[1/1] [========================100%] nsys_output.nsys-rep
Generated:
    /tmp/tritonbench/rope/nsys_traces/liger_rotary_pos_emb_0/nsys_output.nsys-rep
INFO:tritonbench.utils.triton_op:Took 1180.38ms to get benchmark function for inductor_rotary_pos_emb_full_op
  0%|          | 0/1 [00:00<?, ?it/s]
`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
INFO:tritonbench.utils.triton_op:Took 1537.26ms to get benchmark function for inductor_rotary_pos_emb_full_op
Capture range started in the application.
Capture range ended in the application.
Generating '/tmp/nsys-report-21b2.qdstrm'
The target application terminated. One or more process it created re-parented.
Waiting for termination of re-parented processes.
Use the `--wait` option to modify this behavior.
[1/1] [0%                          ] nsys_output.nsys-rep
Processing events...
[1/1] [========================100%] nsys_output.nsys-rep
Generated:
    /tmp/tritonbench/rope/nsys_traces/inductor_rotary_pos_emb_full_op_0/nsys_output.nsys-rep
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:32<00:00, 32.73s/it]
(H, T)          apply_rotary_pos_emb-nsys_nvtx_range_duration  apply_rotary_pos_emb-nsys_rep                                                    liger_rotary_pos_emb-nsys_nvtx_range_duration  liger_rotary_pos_emb-nsys_rep                                                    inductor_rotary_pos_emb_full_op-nsys_nvtx_range_duration  inductor_rotary_pos_emb_full_op-nsys_rep
------------  -----------------------------------------------  -----------------------------------------------------------------------------  -----------------------------------------------  -----------------------------------------------------------------------------  ----------------------------------------------------------  ----------------------------------------------------------------------------------------
(8192, 1024)                                          2.12551  /tmp/tritonbench/rope/nsys_traces/apply_rotary_pos_emb_0/nsys_output.nsys-rep                                          1.15754  /tmp/tritonbench/rope/nsys_traces/liger_rotary_pos_emb_0/nsys_output.nsys-rep                                                     1.67657  /tmp/tritonbench/rope/nsys_traces/inductor_rotary_pos_emb_full_op_0/nsys_output.nsys-rep
```

Reviewed By: adamomainz

Differential Revision: D66895002

Pulled By: FindHao

fbshipit-source-id: 22dc9227dd7c70e8bd88b2237bb86c7f43fb913c