-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WMCMSSWSubprocess and WMTiming metrics to WMArchvie document #11692
Conversation
Jenkins results:
|
Jenkins results:
|
@amaltaro , this PR contains CMSSW subprocess timing metrics. Should I add to this PR CMSSW performance one or would you like to have separate PR. In later case I'll probably need to rebase my second PR since this changes may be merged. While if we'll make appropriate changes here (with proper default) then we can speed-up the process of preparing WMArchive docs for both subprocess timing and performace metrics together. Since we provide defaults we may not need to have them in FJR document per-se. But we'll be ready with both on WMArchive side. |
Valentin, what is the other PR that you are talking about? Looking at the changes you provided in this PR, I would say it should be fixing #11657 instead of #11660, as you mentioned in the description. We have a total of 3 GH issues involving time/cpu metrics, if possible, let's try to end up with 3 PRs closing them (I know we have one extra already with the test/json updates). |
Alan, I was referring to CMSSW performance metrics which also need propagation to WMArchive, see #11663, or IO/Site metrics #11575 and related WMArchive PRs to change the schema. Since in both cases it requires modification to WMCore/Serivce/WMArchive/DataMap to present them in final JSON that's why I asked should we add them in this PR or via separate PR. And, thanks for the catch with description, I put wrong issue indeed, now it is fixed. |
For the CMSSW I/O metrics, I would rather have it in a separate pull request fixing the relevant GH issue (#11538). I guess there is already a pull request for the FJR part. IMO, you could make a 2nd commit in the same PR providing the relevant changes for ArchiveDataReporter/WMArchive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkuznet Valentin, please find a couple of comments along the code.
Jenkins results:
|
16a08fc
to
8512c61
Compare
Jenkins results:
|
8512c61
to
7d9a152
Compare
Jenkins results:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a second look at the DataMap module, I don't really understand what the difference is between TOP_LEVEL_STEP_DEFAULT
and STEP_DEFAULT
. I would say the former goes at the top level of the json, while the latter goes inside each step sub-dictionary (cmsRun1, cmsRun2, and the unneeded logArch, stageOut).
Said that, the expected behavior for these new metrics is:
WMTiming
: should be provided only once per WMArchive document (at the job top level json report)WMCMSSWSubprocess
: should be provided within each step sub-dictionary of the json report (ideally only for CMSSW type, thus cmsRun1, cmsRun2, etc). Not at the top level json report, given that it's a cmsRun/CMSSW property.
To avoid bad surprises and further PR with follow up, I would suggest to take one of the FWJR provided in this PR and pass it through the createArchiveDoc
method:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/WMArchive/DataMap.py#L397C5-L397C22
to ensure that the final WMArchive document is as expected.
Valentin, can you please do so and share the output file?
Nevermind! Let us get this in, run a test and then follow up with potential further fixes. |
Fixes #11657
Fixes the second part of #11660 and #11604
Status
ready
Description
Add WMCMSSWSubprocess and WMTiming metrics from FJR to WMArchive document
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
Complement to #11656 and #11665
External dependencies / deployment changes