
enh(hotspot_analyzer): add --kernel filter for CSV metadata matching#657

Open
Arist12 wants to merge 2 commits into ROCm:main from Arist12:enh/hotspot-kernel-filter

Arist12 (Contributor) commented Jun 4, 2026

Problem

hotspot_analyzer.py reads authoritative VGPR/SGPR/LDS/occupancy data from
the *_kernel_trace.csv file written by rocprofv3 --kernel-trace. To
select the correct row, it matches the Kernel_Name column against the
dispatch directory basename.
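For context, here is a minimal sketch of loading those CSV rows. It assumes the *_kernel_trace.csv sits directly inside the dispatch directory and has a header row; `load_kernel_trace_rows` is a hypothetical helper, not the function hotspot_analyzer.py actually uses, and the real tool may search other locations.

```python
import csv
import glob
import os


def load_kernel_trace_rows(dispatch_dir: str) -> list[dict[str, str]]:
    """Hypothetical helper: read the first *_kernel_trace.csv in dispatch_dir.

    Assumes the CSV lives directly inside dispatch_dir; the real analyzer
    may look elsewhere. Returns [] when no CSV is found.
    """
    pattern = os.path.join(dispatch_dir, "*_kernel_trace.csv")
    matches = sorted(glob.glob(pattern))
    if not matches:
        return []
    # newline="" is the mode the csv module documentation recommends.
    with open(matches[0], newline="") as f:
        # Each row becomes a dict keyed by the header, e.g. row["Kernel_Name"].
        return list(csv.DictReader(f))
```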

This heuristic works for timestamped output directories
(20240101_120000_pa_decode_kernel) but fails completely for the
ui_output_agent_<N>_dispatch_<id> layout produced by rocprofv3's ATT
decode step. In that layout the directory basename carries only an agent
number and a dispatch counter — no kernel name — so every kernel name
comparison returns false and the metadata lookup silently returns {}.
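The failure can be reproduced with a minimal sketch of the legacy heuristic. `legacy_match` is a hypothetical name, and returning the first matching row is an assumption; the bidirectional substring test follows the description of the legacy heuristic in this PR.

```python
import os


def legacy_match(dispatch_dir: str, rows: list[dict[str, str]]) -> dict[str, str]:
    """Sketch of the legacy heuristic: bidirectional substring match between
    the directory basename and each row's Kernel_Name. Returns {} on no match."""
    base = os.path.basename(os.path.normpath(dispatch_dir))
    for row in rows:
        name = row.get("Kernel_Name", "")
        # Skip empty names: "" is a substring of every basename.
        if name and (name in base or base in name):
            return row
    return {}


rows = [{"Dispatch_Id": "223", "Kernel_Name": "pa_decode_kernel"}]

# Timestamped layout: the basename contains the kernel name, so the row is found.
print(legacy_match("runs/20240101_120000_pa_decode_kernel", rows))

# ATT decode layout: the basename has only agent and dispatch numbers.
print(legacy_match("runs/ui_output_agent_15249_dispatch_223", rows))  # prints {}
```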

As a result, the "Register Pressure & Occupancy" section uses ISA
estimates instead of the real CSV values for every ui_output_agent_* trace,
and the warning message gives no hint about how to fix it.

Solution

Add --kernel SUBSTR (optional, default ""):

  • When provided, uses a direct substring match on Kernel_Name instead of
    the dir-name heuristic.
  • If the *_kernel_trace.csv has a Dispatch_Id column and the
    directory name encodes dispatch_<id>, the row must also match on dispatch
    id. This prevents false matches when a PyTorch reference kernel shares a
    name prefix with the target kernel and runs in the same profiling session.
  • Falls back to kernel-name-only substring matching when the CSV has no
    Dispatch_Id column.
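The selection rules above can be sketched as follows, under a few stated assumptions: the dispatch id is the run of digits after `dispatch_` in the basename, Dispatch_Id values are plain decimal strings, and the first surviving row wins. `dispatch_id_from_dir` and `select_row` are hypothetical names, and the `_ref` kernel in the usage lines is invented to stand in for a PyTorch reference kernel.

```python
import os
import re
from typing import Optional


def dispatch_id_from_dir(dispatch_dir: str) -> Optional[str]:
    """Return '223' for a basename like ui_output_agent_15249_dispatch_223."""
    base = os.path.basename(os.path.normpath(dispatch_dir))
    m = re.search(r"dispatch_(\d+)", base)
    return m.group(1) if m else None


def select_row(dispatch_dir: str, rows: list[dict[str, str]],
               kernel_filter: str) -> dict[str, str]:
    """Sketch of the --kernel path: substring match on Kernel_Name, narrowed
    by Dispatch_Id when both the CSV column and dispatch_<id> are present."""
    candidates = [r for r in rows if kernel_filter in r.get("Kernel_Name", "")]
    dispatch_id = dispatch_id_from_dir(dispatch_dir)
    has_dispatch_col = bool(rows) and "Dispatch_Id" in rows[0]
    if has_dispatch_col and dispatch_id is not None:
        # Assumes Dispatch_Id is stored as a plain decimal string.
        candidates = [r for r in candidates
                      if r.get("Dispatch_Id", "").strip() == dispatch_id]
    # Otherwise fall back to kernel-name-only matching.
    return candidates[0] if candidates else {}


rows = [
    {"Dispatch_Id": "101", "Kernel_Name": "pa_mqa_logits_fp4_kernel_0_ref"},  # invented
    {"Dispatch_Id": "223", "Kernel_Name": "pa_mqa_logits_fp4_kernel_0"},
]
# Both names contain the filter; the dispatch id keeps only the 223 row.
print(select_row("ui_output_agent_15249_dispatch_223", rows, "pa_mqa_logits_fp4_kernel_0"))
```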

The legacy heuristic (a bidirectional substring match between the directory
basename and Kernel_Name) is unchanged and still used when --kernel is not
given, so existing timestamped-dir workflows are unaffected.

The "not matched" warning now mentions --kernel so users can discover the
fix without reading source.
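A minimal argparse sketch of how the option might be wired, assuming a positional dispatch directory. The --topk, --mode, --detail and --kernel flags come from the examples in this PR, but their defaults (other than --kernel's "") are assumptions. `build_parser` and `not_matched_warning` are hypothetical, and the warning wording is illustrative, not the tool's actual text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="hotspot_analyzer.py")
    p.add_argument("dispatch_dir", help="ATT dispatch directory to analyze")
    p.add_argument("--topk", type=int, default=4)  # default is an assumption
    p.add_argument("--mode", default="src")         # only "src" appears in this PR
    p.add_argument("--detail", action="store_true")
    # Empty string means "use the legacy dir-name heuristic".
    p.add_argument("--kernel", default="", metavar="SUBSTR",
                   help="substring of Kernel_Name used to pick the kernel_trace CSV row")
    return p


def not_matched_warning(kernel_filter: str) -> str:
    """Illustrative wording only; the real message is not quoted in the PR."""
    if not kernel_filter:
        return ("kernel_trace CSV not matched; values estimated from ISA. "
                "Pass --kernel SUBSTR to select the row by Kernel_Name.")
    return f"kernel_trace CSV: no row matched --kernel {kernel_filter!r}"


args = build_parser().parse_args(
    ["ui_output_agent_15249_dispatch_223", "--topk", "4", "--mode", "src",
     "--kernel", "pa_mqa_logits_fp4_kernel_0"])
# The analyzer would then pass args.kernel on to its metadata reader
# (read_kernel_metadata, per the test list; its signature is not shown here).
print(args.kernel)
```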

Before / after

# Before — metadata not loaded for ui_output_agent_* dirs
python hotspot_analyzer.py ui_output_agent_15249_dispatch_223 --topk 4 --mode src
# (kernel_trace CSV not matched — accum/LDS/SGPR estimated from ISA only)

# After — CSV metadata loaded correctly
python hotspot_analyzer.py ui_output_agent_15249_dispatch_223 \
    --topk 4 --mode src \
    --kernel pa_mqa_logits_fp4_kernel_0
# Prints real VGPR/SGPR/LDS/occupancy from out_kernel_trace.csv

Testing

Five unit tests covering:

  1. Legacy timestamp heuristic still works (no regression).
  2. ui_output_agent_* dir without --kernel returns {} (expected).
  3. --kernel + Dispatch_Id column selects the correct CSV row.
  4. --kernel without Dispatch_Id column falls back to name-only match.
  5. argparse wires --kernel through to read_kernel_metadata.

All five pass.

Arist12 added 2 commits June 4, 2026 15:57
The existing CSV row-selection heuristic matches by comparing the dispatch
directory basename against Kernel_Name in the kernel trace CSV.  This works
for rocprofv3's timestamped output (e.g. 20240101_120000_pa_decode_kernel),
but fails completely for the ui_output_agent_<N>_dispatch_<id> layout
produced by rocprofv3's ATT decode step — the basename carries no kernel
name, only agent and dispatch numbers.

When metadata lookup fails, the analyzer falls back to ISA-estimated register
counts and prints a warning, silently under-reporting VGPR, SGPR, LDS, and
occupancy for every ui_output_agent_* trace.

Fix by adding a --kernel SUBSTR option that enables an explicit row-selection
path:
  1. Substring-matches Kernel_Name against the supplied filter.
  2. If the CSV has a Dispatch_Id column and the directory name encodes
     dispatch_<id>, also requires the row's Dispatch_Id to match — avoiding
     false matches when a PyTorch reference kernel shares the same name prefix.
  3. Falls back gracefully to kernel-name-only matching when Dispatch_Id is
     absent from the CSV.

The legacy heuristic is unchanged and still used when --kernel is not given,
so existing timestamped-dir workflows are unaffected.

Update the "not matched" warning to tell users about --kernel so the fix is
discoverable without reading source.

Example:
    python hotspot_analyzer.py ui_output_agent_15249_dispatch_223 \
        --topk 8 --mode src --detail \
        --kernel pa_mqa_logits_fp4_kernel_0
Arist12 marked this pull request as ready for review June 11, 2026 02:58
coderfeli requested a review from fsx950223 June 15, 2026 11:15