
Update pcp for the workload metrics, so information appears as expected. #49

Merged: dvalinrh merged 5 commits into master from fix_pcp on Jan 28, 2026


dvalinrh (Contributor) commented Jan 10, 2026

Description

Fixes the PCP output so that:

  1. Data is actually present as expected.
  2. NUMA nodes are handled.
  3. The CSV file uses commas as separators.
  4. Timestamps are added (phase 1; follow-on work will deal with multiple iterations and look at running one warehouse (wh) at a time for timestamp purposes).
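Items 3 and 4 amount to a small output-format change. The wrapper itself is a bash script whose code is not shown here, so the following is only a hypothetical Python sketch (`colon_to_comma_csv` and `utc_stamp` are invented names): it rewrites old-style `Warehouses:Bops` lines into the comma-separated layout with `Numb_JVMs`, `Start_Date` and `End_Date` columns, assuming one start/end timestamp pair per run in ISO 8601 UTC form, as in the CSV samples in this PR.

```python
import csv
import io
from datetime import datetime, timezone


def utc_stamp(moment: datetime) -> str:
    """Format an aware datetime as ISO 8601 UTC with a trailing 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def colon_to_comma_csv(colon_text: str, num_jvms: int, start: str, end: str) -> str:
    """Hypothetical helper: rewrite 'Warehouses:Bops' lines as comma-separated rows.

    Every row of one run gets the same JVM count and start/end timestamps
    (the "phase 1" behaviour; per-warehouse timestamps would need more work).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Warehouses", "Bops", "Numb_JVMs", "Start_Date", "End_Date"])
    for line in colon_text.splitlines():
        line = line.strip()
        if not line or not line[0].isdigit():
            continue  # skip blank lines and the old "Warehouses:Bops" header
        warehouses, bops = line.split(":")
        writer.writerow([int(warehouses), int(bops), num_jvms, start, end])
    return out.getvalue()


if __name__ == "__main__":
    old_format = "Warehouses:Bops\n24:847245\n48:1158553\n"
    start = utc_stamp(datetime(2026, 1, 28, 14, 12, 35, tzinfo=timezone.utc))
    end = utc_stamp(datetime(2026, 1, 28, 14, 21, 3, tzinfo=timezone.utc))
    print(colon_to_comma_csv(old_format, 1, start, end), end="")
```

Run as a script, this prints the header followed by the 24- and 48-warehouse rows exactly as they appear in the single-JVM CSV file from the test run.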

Before/After Comparison

Before:
Send result to PCP archive
Logging results nr_jvms_0 _pcp
Unexpected metric logged. Check for a typo.
Stopping PCP subset
adding: results_specjbb_virtual-guest.tar (deflated 86%)

pmrep -p -a specjbb.0 openmetrics.workloads > foo

Does not show the desired workload data.

After:
pmrep -p -a specjbb_jvms_0.0 openmetrics.workload
(partial output)
o.w.iteration o.w.running o.w.numthreads o.w.runtime o.w.throughput o.w.latency o.w.Warehouse o.w.BOPs o.w.JVMs

16:06:18 0.000 1.000 0.000 NaN NaN NaN 2.000 215137.0 1.000
16:06:19 0.000 1.000 0.000 NaN NaN NaN 2.000 215137.0 1.000
16:06:20 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN
16:06:21 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN
16:06:22 0.000 0.000 0.000 NaN NaN NaN 4.000 269278.0 1.000
16:06:23 0.000 0.000 0.000 NaN NaN NaN 4.000 269278.0 1.000
16:06:24 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN
16:06:25 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN
16:06:26 0.000 0.000 0.000 NaN NaN NaN 6.000 266119.0 1.000
16:06:27 0.000 0.000 0.000 NaN NaN NaN 6.000 266119.0 1.000
16:06:28 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN
16:06:29 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN
16:06:30 0.000 0.000 0.000 NaN NaN NaN NaN NaN NaN
16:06:31 0.000 0.000 0.000 NaN NaN NaN 8.000 261159.0 1.000
16:06:32 0.000 0.000 0.000 NaN NaN NaN 8.000 261159.0 1.000

Clerical Stuff

This closes #48

Relates to JIRA: RPOPC-760

Test information

Command executed:
/home/ec2-user/workloads/specjbb-wrapper/specjbb/specjbb_run --run_user ec2-user --home_parent /home --iterations 1 --tuned_setting tuned_none_sys_file_ --host_config "m5a.24xlarge" --sysname "m5a.24xlarge" --sys_type aws --use_pcp --java_version 21 --debug

===============================
CSV file

Single JVM
Warehouses,Bops,Numb_JVMs,Start_Date,End_Date
24,847245,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
48,1158553,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
72,1065174,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
96,961328,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
120,918485,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
144,872202,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
168,808530,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
192,731861,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z

Multiple JVMs
Warehouses,Bops,Numb_JVMs,Start_Date,End_Date
24,988377,6,2026-01-28T14:21:43Z,2026-01-28T14:30:06Z
48,1686562,6,2026-01-28T14:21:43Z,2026-01-28T14:30:06Z
72,1683347,6,2026-01-28T14:21:43Z,2026-01-28T14:30:06Z
96,1491024,6,2026-01-28T14:21:43Z,2026-01-28T14:30:06Z
120,1485539,6,2026-01-28T14:21:43Z,2026-01-28T14:30:06Z
144,1364024,6,2026-01-28T14:21:43Z,2026-01-28T14:30:06Z
168,1330200,6,2026-01-28T14:21:43Z,2026-01-28T14:30:06Z
192,1302089,6,2026-01-28T14:21:43Z,2026-01-28T14:30:06Z

===============================
Partial PCP output

      o.w.iteration  o.w.running  o.w.numthreads  o.w.runtime  o.w.throughput  o.w.latency  o.w.JVMs  o.w.Warehouse  o.w.BOPs

14:21:08 0.000 0.000 0.000 NaN NaN NaN 1.000 48.000 1158553
14:21:09 0.000 0.000 0.000 NaN NaN NaN 1.000 48.000 1158553
14:21:12 0.000 0.000 0.000 NaN NaN NaN 1.000 72.000 1065174
14:21:13 0.000 0.000 0.000 NaN NaN NaN 1.000 72.000 1065174
14:21:17 0.000 0.000 0.000 NaN NaN NaN 1.000 96.000 961328.0
14:21:18 0.000 0.000 0.000 NaN NaN NaN 1.000 96.000 961328.0
14:21:21 0.000 0.000 0.000 NaN NaN NaN 1.000 120.000 918485.0
14:21:22 0.000 0.000 0.000 NaN NaN NaN 1.000 120.000 918485.0
14:21:25 0.000 0.000 0.000 NaN NaN NaN 1.000 144.000 872202.0
14:21:26 0.000 0.000 0.000 NaN NaN NaN 1.000 144.000 872202.0
14:21:29 0.000 0.000 0.000 NaN NaN NaN 1.000 168.000 808530.0
14:21:30 0.000 0.000 0.000 NaN NaN NaN 1.000 168.000 808530.0
14:21:33 0.000 0.000 0.000 NaN NaN NaN 1.000 192.000 731861.0
14:21:34 0.000 0.000 0.000 NaN NaN NaN 1.000 192.000 731861.0
14:30:03 0.000 1.000 0.000 NaN NaN NaN NaN NaN NaN
14:30:04 0.000 1.000 0.000 NaN NaN NaN NaN NaN NaN
14:30:05 0.000 1.000 0.000 NaN NaN NaN NaN NaN NaN
14:30:06 0.000 1.000 0.000 NaN NaN NaN NaN NaN NaN
14:30:07 0.000 1.000 0.000 NaN NaN NaN 6.000 24.000 988377.0
14:30:08 0.000 1.000 0.000 NaN NaN NaN 6.000 24.000 988377.0
14:30:12 0.000 0.000 0.000 NaN NaN NaN 6.000 48.000 1686562
14:30:13 0.000 0.000 0.000 NaN NaN NaN 6.000 48.000 1686562
14:30:16 0.000 0.000 0.000 NaN NaN NaN 6.000 72.000 1683347
14:30:17 0.000 0.000 0.000 NaN NaN NaN 6.000 72.000 1683347
14:30:20 0.000 0.000 0.000 NaN NaN NaN 6.000 96.000 1491024
14:30:21 0.000 0.000 0.000 NaN NaN NaN 6.000 96.000 1491024
14:30:25 0.000 0.000 0.000 NaN NaN NaN 6.000 120.000 1485539
14:30:26 0.000 0.000 0.000 NaN NaN NaN 6.000 120.000 1485539
14:30:29 0.000 0.000 0.000 NaN NaN NaN 6.000 144.000 1364024
14:30:30 0.000 0.000 0.000 NaN NaN NaN 6.000 144.000 1364024
14:30:34 0.000 0.000 0.000 NaN NaN NaN 6.000 168.000 1330200
14:30:37 0.000 0.000 0.000 NaN NaN NaN 6.000 192.000 1302089
14:30:38 0.000 0.000 0.000 NaN NaN NaN 6.000 192.000 1302089
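One way to confirm the fix is to check that every warehouse count in the CSV file shows up in the pmrep output with the same BOPs value. A hedged Python sketch, assuming the pmrep text has been saved in the layout shown in this PR (one header line of `o.w.*` metric names, then `HH:MM:SS` rows with `NaN` for missing values); `pmrep_points`, `csv_points` and `missing_from_pcp` are hypothetical names, not part of PCP or the wrapper. The exact comparison also assumes pmrep prints BOPs without rounding, as it does in these excerpts.

```python
import csv
import io
import math


def pmrep_points(pmrep_text: str) -> set:
    """Distinct (warehouses, BOPs) pairs found in saved pmrep text output."""
    header = None
    points = set()
    for line in pmrep_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("o.w."):
            header = fields  # metric names; the header has no timestamp column
            continue
        if header is None or len(fields) != len(header) + 1:
            continue  # not a complete "timestamp + one value per metric" row
        try:
            row = dict(zip(header, (float(v) for v in fields[1:])))
        except ValueError:
            continue  # non-numeric text that happens to have the right width
        wh = row.get("o.w.Warehouse", math.nan)
        bops = row.get("o.w.BOPs", math.nan)
        if math.isnan(wh) or math.isnan(bops):
            continue  # no value recorded for this sample
        points.add((int(wh), int(bops)))
    return points


def csv_points(csv_text: str) -> set:
    """(warehouses, BOPs) pairs from the comma-separated results file."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    return {(int(r["Warehouses"]), int(r["Bops"])) for r in reader}


def missing_from_pcp(csv_text: str, pmrep_text: str) -> set:
    """CSV points that never appear in the pmrep dump of the archive."""
    return csv_points(csv_text) - pmrep_points(pmrep_text)
```

An empty result from `missing_from_pcp` means every CSV data point made it into the archive; a partial excerpt such as the one above will naturally report the rows it omits.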

================================
Screen output from test

specjbb_out.txt

malucius-rh previously approved these changes Jan 14, 2026

malucius-rh left a comment

LGTM

frival left a comment

A few comments / issues:

First, it looks like this will create one archive per iteration - is that the delineation we want or do we want a single archive for all iterations? A single archive would make comparing between iterations easier, but maybe there's a good reason to keep them separate?

Second, I don't see an actual log of the wrapper. Given how we swallow all output so we don't choke the Ansible log when run from Zathras, I'd really like to see something like the output of bash -x ./specjbb_run to look for errors or warnings we might not be seeing.

Third, I'm concerned about the pcp output. There's one entry for each warehouse count, but all of them have the throughput of the last warehouse count; the first two report the warehouse count as 8 (the last point run), and the second two report it as 0 (which is, well, impossible). I think solving this properly will take the ability to write out our own metric archive and merge it, since SPECjbb2005 runs as a single Java binary. Until then, does it make sense to just record the peak result per run rather than have $ndatapoints entries all with the same throughput and some weirdness in the reported warehouse count?
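The "record the peak result per run" alternative is simple to compute from the comma-separated results file. A minimal sketch, assuming the `Warehouses,Bops,...` layout from the description; `peak_result` is a hypothetical helper name, and how the value would then be logged to PCP is not covered here.

```python
import csv
import io


def peak_result(csv_text: str) -> tuple:
    """Return (warehouses, bops) for the highest-BOPs row of one run.

    Raises ValueError if the file has no data rows.
    """
    rows = csv.DictReader(io.StringIO(csv_text.strip()))
    best = max(rows, key=lambda r: int(r["Bops"]))
    return int(best["Warehouses"]), int(best["Bops"])


if __name__ == "__main__":
    single_jvm = """Warehouses,Bops,Numb_JVMs,Start_Date,End_Date
24,847245,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
48,1158553,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z
72,1065174,1,2026-01-28T14:12:35Z,2026-01-28T14:21:03Z"""
    print(peak_result(single_jvm))  # (48, 1158553)
```

Recording one (warehouses, BOPs) pair per run sidesteps the per-warehouse labelling problem, at the cost of losing the shape of the throughput curve in the archive.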

dvalinrh (Contributor, Author) commented

For the output from the test, open the file https://github.com/user-attachments/files/24541236/SPECjbb.001.txt, which is attached above in the results section. It is fairly large, and I did not want to clutter git with it. The link downloads the file; open it in the viewer of your choice.

We create one archive per NUMA node run. If desired, we can change that and record the NUMA node in the PCP archive instead; it does not matter to me which, so let me know.

Excerpt of PCP output from a 2-NUMA-node system (empty fields filtered out; I filtered out too much earlier).
numa nodes=1

      o.w.iteration  o.w.running  o.w.numthreads  o.w.runtime  o.w.throughput  o.w.latency  o.w.Warehouse  o.w.BOPs

10:51:29 0.000 1.000 0.000 NaN NaN NaN 24.000 868972.0
10:51:30 0.000 1.000 0.000 NaN NaN NaN 24.000 868972.0
10:51:37 0.000 1.000 0.000 NaN NaN NaN 48.000 1188645
10:51:38 0.000 1.000 0.000 NaN NaN NaN 48.000 1188645
10:51:45 0.000 1.000 0.000 NaN NaN NaN 72.000 1097090
10:51:46 0.000 1.000 0.000 NaN NaN NaN 72.000 1097090
10:51:54 0.000 1.000 0.000 NaN NaN NaN 96.000 958299.0
10:51:55 0.000 1.000 0.000 NaN NaN NaN 96.000 958299.0
10:52:02 0.000 1.000 0.000 NaN NaN NaN 120.000 937913.0
10:52:03 0.000 1.000 0.000 NaN NaN NaN 120.000 937913.0
10:52:10 0.000 1.000 0.000 NaN NaN NaN 144.000 782391.0
10:52:11 0.000 1.000 0.000 NaN NaN NaN 144.000 782391.0
10:52:18 0.000 1.000 0.000 NaN NaN NaN 168.000 788391.0
10:52:19 0.000 1.000 0.000 NaN NaN NaN 168.000 788391.0
10:52:26 0.000 1.000 0.000 NaN NaN NaN 192.000 741228.0
10:52:27 0.000 1.000 0.000 NaN NaN NaN 192.000 741228.0

Corresponding CSV file:
Warehouses:Bops
24:868972
48:1188645
72:1097090
96:958299
120:937913
144:782391
168:788391
192:741228

numa nodes=6
PCP output:
o.w.iteration o.w.running o.w.numthreads o.w.runtime o.w.throughput o.w.latency o.w.Warehouse o.w.BOPs
11:01:09 0.000 1.000 0.000 NaN NaN NaN 20.000 873387.0
11:01:10 0.000 1.000 0.000 NaN NaN NaN 20.000 873387.0
11:01:17 0.000 1.000 0.000 NaN NaN NaN 40.000 1311470
11:01:18 0.000 1.000 0.000 NaN NaN NaN 40.000 1311470
11:01:25 0.000 1.000 0.000 NaN NaN NaN 60.000 1351604
11:01:26 0.000 1.000 0.000 NaN NaN NaN 60.000 1351604
11:01:33 0.000 1.000 0.000 NaN NaN NaN 80.000 1251903
11:01:34 0.000 1.000 0.000 NaN NaN NaN 80.000 1251903
11:01:42 0.000 1.000 0.000 NaN NaN NaN 100.000 1137331
11:01:43 0.000 1.000 0.000 NaN NaN NaN 100.000 1137331
11:01:50 0.000 1.000 0.000 NaN NaN NaN 120.000 980861.0
11:01:51 0.000 1.000 0.000 NaN NaN NaN 120.000 980861.0
11:01:58 0.000 1.000 0.000 NaN NaN NaN 140.000 1010995
11:01:59 0.000 1.000 0.000 NaN NaN NaN 140.000 1010995
11:02:06 0.000 1.000 0.000 NaN NaN NaN 160.000 1154698
11:02:07 0.000 1.000 0.000 NaN NaN NaN 160.000 1154698

Corresponding CSV file:
Warehouses:Bops
20:873387
40:1311470
60:1351604
80:1251903
100:1137331
120:980861
140:1010995
160:1154698

dvalinrh requested a review from frival January 15, 2026 11:19
frival commented Jan 15, 2026

> For the output from the test, open the file https://github.com/user-attachments/files/24541236/SPECjbb.001.txt, which is attached above in the results section. It is fairly large, and I did not want to clutter git with it. The link downloads the file; open it in the viewer of your choice.

That file is only the output of SPECjbb's view of its run (i.e. it's only the output from the SPECjbb Java executable), it's not a log of the full wrapper.

dvalinrh (Contributor, Author) commented

Requested output from /bin/bash -x

spec_out.txt

dvalinrh requested a review from malucius-rh January 28, 2026 14:53
dvalinrh (Contributor, Author) commented

Now tracking NUMA binding. pmrep output:
o.w.iteration o.w.running o.w.numthreads o.w.runtime o.w.throughput o.w.latency o.w.JVMs o.w.Warehouse o.w.BOPs o.w.Numa_bound
17:09:54 0.000 1.000 0.000 NaN NaN NaN 1.000 2.000 220483.0 1.000
17:09:55 0.000 1.000 0.000 NaN NaN NaN 1.000 2.000 220483.0 1.000

dvalinrh (Contributor, Author) commented

specj_out.txt
Output as generated by the new -x flag from test_tools. If what you are looking for is not there, I will have to figure out why it is not propagating.

frival left a comment

LGTM

dvalinrh merged commit bea53f4 into master Jan 28, 2026
1 check passed
dvalinrh deleted the fix_pcp branch March 26, 2026 18:16

Development

Successfully merging this pull request may close these issues.

Fix pcp so it has meaningful labels, handles numa properly and outputs data

3 participants