RunTime CRAB vs WMCore

Dario Mapelli edited this page Mar 1, 2023 · 17 revisions

Intro

CRAB and WMCore both need a runtime environment for running the user payload on a grid node. The purpose of this short doc is to write down the main design points, reflecting how things were implemented in WMCore in 2022, so that we can derive how to do the same in CRAB and how to share code.

References:

Design

  • Jobs start in a singularity image prepared by the gWMS pilot
  • Jobs use the classAd REQUIRED_OS to tell gWMS which image to start
  • This selects the OS (rhel6 vs. rhel7 etc.), so it is going to be el8 for all CMSSW_12 and up
  • Jobs use TARGET_ARCH to select the "hardware", e.g. x86 vs. ppc vs. arm etc.
  • Jobs will find in the image the environment set up by the pilot plus whatever the site deemed useful/needed locally: the job start environment
  • Jobs will run the WMA/CRAB wrapper scripts in the start environment AFTER having sourced the correct COMP environment for that image (i.e. OS + arch) from /cvmfs/cms.cern.ch/COMP (the wrapper environment)
  • Jobs will run the payload in the start environment AFTER having set up the SCRAM environment, i.e. mimicking what a user would do:
    1. start the image
    2. source /cvmfs/cms.cern.ch/cmsset_default.sh
    3. cmsrel CMSSW_X_Y_Z
    4. cmsenv
    5. cmsRun -p pset.py -j fjr.xml
    Even if the actual instructions vary (e.g. use of a locally defined VO_CMS_SW_DIR, OSG_APP etc.), the setup is assumed to be equivalent.
  • If creation/manipulation of the pset is needed, it will be done using edm utils which run in an EMPTY environment + SCRAM.
  • The stageout script runs in the start environment AFTER having sourced the correct COMP environment, i.e. the same environment as the job wrapper
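
The "what a user would do" sequence above can be sketched as a small script generator. This is only an illustration, not the code used by WMCore or CRAB: payload_script is a hypothetical helper, and it assumes that cmsrel and cmsenv are the usual shell aliases for "scramv1 project CMSSW" and "eval `scramv1 runtime -sh`", spelled out here because aliases are not expanded in a non-interactive shell.

```python
import shlex

# Path taken from step 2 of the list above.
CMSSET_DEFAULT = "/cvmfs/cms.cern.ch/cmsset_default.sh"


def payload_script(release, pset="pset.py", fjr="fjr.xml"):
    """Return a bash script mimicking steps 2-5 of the list above.

    Hypothetical helper for illustration only. The scram commands are
    the assumed expansions of the cmsrel and cmsenv aliases.
    """
    rel = shlex.quote(release)
    lines = [
        f"source {CMSSET_DEFAULT}",
        # assumed equivalent of: cmsrel CMSSW_X_Y_Z
        f"scramv1 project CMSSW {rel}",
        # assumed equivalent of: cmsenv, evaluated from inside the release
        # area; the cd happens in a subshell, so the caller's cwd is kept
        f'eval "$(cd {rel}/src && scramv1 runtime -sh)"',
        f"cmsRun -p {shlex.quote(pset)} -j {shlex.quote(fjr)}",
    ]
    return "\n".join(lines) + "\n"
```

Calling payload_script("CMSSW_12_4_0") returns the four commands as text. Whether the script is then run in the start environment or in an EMPTY one is decided by whoever forks the shell, which is the point of the design above.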

Derived Actions

  1. have a common script to source the COMP environment [1]
  2. have a common way to fork subprocesses in either
    1. the EMPTY environment + SCRAM [2]
    2. the start environment + SCRAM env. [3]

[1] this needs to be developed (a small adaptation of WMCore's submit.sh, see https://github.com/dmwm/WMCore/issues/10257 )

[2] currently done using WMCore's Scram() with cleanEnv=True (the default) - ALL OK

[3a] currently done in WMCore by forking a process in the wrapper environment where the first action is unset PYTHONPATH (which removes WMCore.zip from the Python path; it is still not clear whether we need to unset PYTHONPATH as well, since we are doing something similar) and the second is cmsenv

[3b] currently done in CRAB by Scram(cleanEnv=True) + a few ad-hoc env. var. (like X509_USER_PROXY)
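
As an illustration of the difference between [2] and [3], here is a minimal sketch of forking a subprocess in either an EMPTY environment or a copy of the start environment. It deliberately does not use WMCore's Scram(); child_env and run_in_env are hypothetical helpers built on the standard subprocess module, and the SCRAM setup itself is left out.

```python
import os
import subprocess
import sys


def child_env(clean, extra=None, keep=(), base=None):
    """Build the environment for a forked process (hypothetical helper).

    clean=True  -> start from an EMPTY environment, as in [2]
    clean=False -> start from a copy of the base (start) environment, as in [3]
    keep  : names copied from the base environment even when clean=True,
            e.g. X509_USER_PROXY as in [3b]
    extra : explicit NAME -> value settings, applied last
    """
    base = dict(os.environ if base is None else base)
    env = {} if clean else dict(base)
    for name in keep:
        if name in base:
            env[name] = base[name]
    env.update(extra or {})
    return env


def run_in_env(argv, env):
    """Fork argv with exactly env as its environment and return its stdout."""
    result = subprocess.run(argv, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

With this shape, [3b] corresponds to child_env(clean=True, keep=["X509_USER_PROXY"]), while [3a] corresponds to clean=False followed by whatever needs to be removed again, which is where the fragility discussed below comes from.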

Needs

[1] IMHO should be done. Period.

[2] is fine

[3a] I think that unset PYTHONPATH is fragile and it would be better to have an 'unset' command for COMP which can be upgraded as needed in the future w/o touching the wrappers' code (a bit like scram unsetenv)

[3b] At this point it is clear that this is wrong, and I think that we need to replace it with something like [3a], but I would prefer to have the unsetenv as well, to minimize/eliminate any place in CRAB where we replicate "what WMA does" instead of "using WMA code". I would also much rather have a way to customize the env in Scram() than fork a process where I do unset + cmsrel + cmsenv, but I need to hear from WMCore developers before proposing changes to Scram, see https://github.com/dmwm/WMCore/blob/eba0a315ed973616357e231976f7092adcb6b2e6/src/python/WMCore/WMRuntime/Tools/Scram.py#L328
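
One possible shape for the 'unset' command wished for in [3a], sketched under assumptions: nothing like this exists in COMP today as far as this page says, and env_undo and undo_script are hypothetical names. The idea is to snapshot the environment before sourcing COMP and to derive the undo from the difference, so that the wrappers never need to hard-code unset PYTHONPATH.

```python
import shlex


def env_undo(before, after):
    """Compute what must be undone to go from `after` back to `before`.

    Returns (to_unset, to_restore):
      to_unset   - names that exist only in `after`
      to_restore - names whose value changed or disappeared, mapped to
                   their original value
    """
    to_unset = sorted(name for name in after if name not in before)
    to_restore = {name: value for name, value in before.items()
                  if after.get(name) != value}
    return to_unset, to_restore


def undo_script(before, after):
    """Render the undo as shell lines that a wrapper could source."""
    to_unset, to_restore = env_undo(before, after)
    lines = [f"unset {name}" for name in to_unset]
    lines += [f"export {name}={shlex.quote(value)}"
              for name, value in sorted(to_restore.items())]
    return "\n".join(lines)
```

Because the undo is computed rather than listed by hand, anything the COMP setup starts exporting in the future is covered without touching the wrappers' code, which is the property asked for above.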

Current Status

Latest changes: https://github.com/dmwm/CRABServer/releases/tag/v3.230220

Current CRAB status:

  • (1) CRAB and WMCore can share the script https://github.com/dmwm/CRABServer/blob/master/scripts/submit_env.sh
    • the script is ready to be shared
    • WMCore is aware that the script is ready
    • WMCore has integrated this script into their code.
  • We currently run
    • The container starts with a startup environment set by SI
    • tweakThePset.sh in "Scram(cleanEnv=True)"
      • We tweak the pset in the SCRAM environment
    • then, cmsRun in "startup env + COMP + Scram(cleanEnv=False)"
      • We source the COMP environment to run the CRAB job wrapper
      • We execute cmsRun with Scram(cleanEnv=False). We have not encountered problems with this approach
    • then, the stageout in "startup env + COMP"
      • (details here will follow, we are about to move the call to cmscp.py)
  • (3) Dario is not sure if this is actually required