Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUPTI Counter Analysis empty #146

Open
jeromeku opened this issue May 28, 2024 · 1 comment
Open

CUPTI Counter Analysis empty #146

jeromeku opened this issue May 28, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@jeromeku
Copy link

jeromeku commented May 28, 2024

🐛 Describe the bug

Running the CUPTI demo results in an empty trace.

Running the example script with KINETO_LOG_LEVEL=1:

INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:225] CUDA versions. CUPTI: 18; Runtime: 12010; Driver: 12020
  Log file: /tmp/libkineto_activities_3065.json
  Trace start time: 2024-05-28 21:01:42  Trace duration: 500ms
  Warmup duration: 0s
  Max GPU buffer size: 128MB
  Enabled activities: cuda_profiler_range
Cupti Profiler metrics : kineto__tensor_core_insts, dram__bytes_read.sum, dram__bytes_write.sum
Cupti Profiler measure per kernel : 0
Cupti Profiler max ranges : 10
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:834] Enabling GPU tracing
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:873] Tracing starting in 1s
INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:878] Tracing will end in 2s
STAGE:2024-05-28 21:01:40 3065:3065 ActivityProfilerController.cpp:314] Completed Stage: Warm Up
STAGE:2024-05-28 21:01:41 3065:3065 ActivityProfilerController.cpp:320] Completed Stage: Collection
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:232] Processing 1 CPU buffers
WARNING:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:312] CPU trace is empty!
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:294] Record counts: Out-of-range = 0, Blocklisted runtime = 0, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1060] Traces Recorded:
INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1063] PyTorch Profiler: 1 iterations
STAGE:2024-05-28 21:01:41 3065:3065 ActivityProfilerController.cpp:324] Completed Stage: Post Processing
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:90] Tracing to temporary file /tmp/tmpe3a6bubf.json
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:438] Chrome Trace written to /tmp/tmpe3a6bubf.json
INFO:2024-05-28 21:01:41 3065:3065 output_json.cpp:486] Renamed the trace file to /tmp/tmpe3a6bubf.json

Trace file:


{
  "schemaVersion": 1,
  "deviceProperties": [
    {
      "id": 0, "name": "NVIDIA GeForce RTX 3090", "totalGlobalMem": 25438126080,
      "computeMajor": 8, "computeMinor": 6,
      "maxThreadsPerBlock": 1024, "maxThreadsPerMultiprocessor": 1536,
      "regsPerBlock": 65536, "regsPerMultiprocessor": 65536, "warpSize": 32,
      "sharedMemPerBlock": 49152, "sharedMemPerMultiprocessor": 102400,
      "numSms": 82, "sharedMemPerBlockOptin": 101376
    }
  ],
  "record_shapes": 1,
  "traceEvents": [
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
    "args": {
      "labels": "CPU"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 3065, "tid": 0,
    "args": {
      "sort_index": 3065
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
    "args": {
      "labels": "GPU 0"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 0, "tid": 0,
    "args": {
      "sort_index": 16777216
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
    "args": {
      "labels": "GPU 1"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 1, "tid": 0,
    "args": {
      "sort_index": 16777217
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
    "args": {
      "labels": "GPU 2"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 2, "tid": 0,
    "args": {
      "sort_index": 16777218
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
    "args": {
      "labels": "GPU 3"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 3, "tid": 0,
    "args": {
      "sort_index": 16777219
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
    "args": {
      "labels": "GPU 4"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 4, "tid": 0,
    "args": {
      "sort_index": 16777220
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
    "args": {
      "labels": "GPU 5"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 5, "tid": 0,
    "args": {
      "sort_index": 16777221
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
    "args": {
      "labels": "GPU 6"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 6, "tid": 0,
    "args": {
      "sort_index": 16777222
    }
  },
  {
    "name": "process_name", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
    "args": {
      "name": "python"
    }
  },
  {
    "name": "process_labels", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
    "args": {
      "labels": "GPU 7"
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945163, "pid": 7, "tid": 0,
    "args": {
      "sort_index": 16777223
    }
  },
  {
    "ph": "X", "cat": "Trace", "ts": 1716930100945087, "dur": 158814,
    "pid": "Spans", "tid": "PyTorch Profiler",
    "name": "PyTorch Profiler (0)",
    "args": {
      "Op count": 0
    }
  },
  {
    "name": "process_sort_index", "ph": "M", "ts": 1716930100945087,
    "pid": "Spans", "tid": 0,
    "args": {
      "sort_index": 536870912
    }
  },
  {
    "name": "Iteration Start: PyTorch Profiler", "ph": "i", "s": "g",
    "pid": "Traces", "tid": "Trace PyTorch Profiler", "ts": 1716930100945087
  },
  {
    "name": "Record Window End", "ph": "i", "s": "g",
    "pid": "", "tid": "", "ts": 1716930101104050
  }
  ],
  "INFO": ["INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:834] Enabling GPU tracing","INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:873] Tracing starting in 1s","INFO:2024-05-28 21:01:40 3065:3065 CuptiActivityProfiler.cpp:878] Tracing will end in 2s","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:232] Processing 1 CPU buffers","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:294] Record counts: Out-of-range = 0, Blocklisted runtime = 0, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1060] Traces Recorded:","INFO:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:1063] PyTorch Profiler: 1 iterations"],
  "WARNING": ["WARNING:2024-05-28 21:01:41 3065:3065 CuptiActivityProfiler.cpp:312] CPU trace is empty!"],
  "traceName": "/tmp/tmpe3a6bubf.json"
}

Steps to reproduce

python cupti_profiler_demo.py

Expected behavior

Expected exported trace to have cuda_profiler_range and counter data.

Environment

torch: 2.3.0
HTA: 0.2.0
python: 3.11.9
OS:

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

Additional Info

@briancoutinho

@jeromeku jeromeku added bug Something isn't working needs triage labels May 28, 2024
@briancoutinho
Copy link
Contributor

@jeromeku Thanks for filing. Actually, the CUPTI range profiler is not working in open source due to a build limitation. I had a PR that enable the profiler and unfortunately it got reverted (broke pytorch nightly).
pytorch/pytorch#125685

Tracked in issue here and it also has other advise on potential issues to collect range profiler
pytorch/pytorch#125685

You can try using my PR and building from source (a bit painful i know) hope this helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants