Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WMCMSSWSubprocess and WMTiming metrics to WMArchvie document #11692

Merged
merged 3 commits into from
Aug 24, 2023

Conversation

vkuznet
Copy link
Contributor

@vkuznet vkuznet commented Aug 22, 2023

Fixes #11657
Fixes the second part of #11660 and #11604

Status

ready

Description

Add WMCMSSWSubprocess and WMTiming metrics from FJR to WMArchive document

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

Complement to #11656 and #11665

External dependencies / deployment changes

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests deleted
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 5 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14433/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 65 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 56 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14434/artifact/artifacts/PullRequestReport.html

@vkuznet
Copy link
Contributor Author

vkuznet commented Aug 22, 2023

@amaltaro , this PR contains CMSSW subprocess timing metrics. Should I add to this PR CMSSW performance one or would you like to have separate PR. In later case I'll probably need to rebase my second PR since this changes may be merged. While if we'll make appropriate changes here (with proper default) then we can speed-up the process of preparing WMArchive docs for both subprocess timing and performace metrics together. Since we provide defaults we may not need to have them in FJR document per-se. But we'll be ready with both on WMArchive side.

@amaltaro
Copy link
Contributor

Valentin, what is the other PR that you are talking about?

Looking at the changes you provided in this PR, I would say it should be fixing #11657 instead of #11660, as you mentioned in the description.

We have a total of 3 GH issues involving time/cpu metrics, if possible, let's try to end up with 3 PRs closing them (I know we have one extra already with the test/json updates).

@vkuznet
Copy link
Contributor Author

vkuznet commented Aug 22, 2023

Alan, I was referring to CMSSW performance metrics which also need propagation to WMArchive, see #11663, or IO/Site metrics #11575 and related WMArchive PRs to change the schema. Since in both cases it requires modification to WMCore/Serivce/WMArchive/DataMap to present them in final JSON that's why I asked should we add them in this PR or via separate PR.

And, thanks for the catch with description, I put wrong issue indeed, now it is fixed.

@amaltaro
Copy link
Contributor

For the CMSSW I/O metrics, I would rather have it in a separate pull request fixing the relevant GH issue (#11538). I guess there is already a pull request for the FJR part.

IMO, you could make a 2nd commit in the same PR providing the relevant changes for ArchiveDataReporter/WMArchive.

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vkuznet Valentin, please find a couple of comments along the code.

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 65 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 55 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14435/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the fix-issue-11660-wma branch from 16a08fc to 8512c61 Compare August 23, 2023 14:56
@vkuznet vkuznet changed the title Add WMCMSSWSubprocess to WMArchvie doc creation Add WMCMSSWSubprocess and WMTiming metrics to WMArchvie document Aug 23, 2023
@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 65 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 51 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14436/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet force-pushed the fix-issue-11660-wma branch from 8512c61 to 7d9a152 Compare August 23, 2023 15:34
@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Unit tests: failed
    • 2 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 1 warnings
    • 65 comments to review
  • Pylint py3k check: succeeded
  • Pycodestyle check: succeeded
    • 51 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/14437/artifact/artifacts/PullRequestReport.html

@vkuznet vkuznet requested a review from amaltaro August 23, 2023 16:44
@vkuznet
Copy link
Contributor Author

vkuznet commented Aug 23, 2023

Alan, I added WMTiming metrics into this PR and it is ready to review. I also put in separate commit (7d9a152) changes to test json files to include WMTiming and WMCMSSWSubprocess metrics. This will allow to drop this PR #11659 in favor of this one.

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a second look at the DataMap module, I don't really understand what the difference is between TOP_LEVEL_STEP_DEFAULT and STEP_DEFAULT. I would say the former goes at the top level of the json, while the latter goes inside each step sub-dictionary (cmsRun1, cmsRun2, and the unneeded logArch, stageOut).

Said that, the expected behavior for these new metrics is:

  • WMTiming: should be provided only once per WMArchive document (at the job top level json report)
  • WMCMSSWSubprocess: should be provided within each step sub-dictionary of the json report (ideally only for CMSSW type, thus cmsRun1, cmsRun2, etc). Not at the top level json report, given that it's a cmsRun/CMSSW property.

To avoid bad surprises and further PR with follow up, I would suggest to take one of the FWJR provided in this PR and pass it through the createArchiveDoc method:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/WMArchive/DataMap.py#L397C5-L397C22

to ensure that the final WMArchive document is as expected.

Valentin, can you please do so and share the output file?

@amaltaro
Copy link
Contributor

Nevermind! Let us get this in, run a test and then follow up with potential further fixes.

@amaltaro amaltaro merged commit 523b400 into dmwm:master Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Propagate timestamp metrics from WMCore FWJR into WMArchive and MONIT
3 participants