Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ddm_quota with dm_weight #12243

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

d-ylee
Copy link
Contributor

@d-ylee d-ylee commented Jan 28, 2025

Also added a deprecation warning

Fixes #12055

Status

In development

Description

Changes the rseAttribute used for RSE quota from ddm_quota to dm_weight

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

N/A

External dependencies / deployment changes

N/A

@d-ylee
Copy link
Contributor Author

d-ylee commented Jan 28, 2025

I did run rucio rse list --rse-exp "ddm_quota>0" and rucio rse list --rse-exp "dm_weight>0", and the resulting lists of RSEs are the same. Using this list of RSEs that have the ddm_quota and dm_weight attributes, I wrote a script to see if the values are the same.

I have a list of 9 RSEs that do not have ddm_quota == dm_weight

@d-ylee d-ylee requested a review from amaltaro January 28, 2025 20:21
@dmwm-bot
Copy link

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests no longer failing
  • Python3 Pylint check: failed
    • 29 warnings and errors that must be fixed
    • 31 warnings
    • 193 comments to review
  • Pycodestyle check: succeeded
    • 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/312/artifact/artifacts/PullRequestReport.html

@amaltaro
Copy link
Contributor

@d-ylee the ddm_quota used to be an absolute attribute with the total number of bytes available for WM activities; while dm_weight is supposed to be a normalized value between 0 and 1. So even though they can return the same list of RSEs, I fear that their logic is different - including any potential logic within WMCore (especially on MSTransferor, MSOutput and MSPileup).

So, can you please verify the WMCore codebase to see (note that only ddm_quota and dm_weight are relevant for this analysis):

  • which attribute is used and where?
  • and how those attributes are used?
  • whenever ddm_quota is a default value, convert to dm_weight.

In addition, we need to scan the services_config repository to see if any ddm_quota is defined anywhere for WM services (I would remove them even if found only in comments). Note that we need to look into all three branches (test, preprod and prod).

I know this is not the review you asked for, but hopefully it gives you a better context and further material to work on ;)

@d-ylee
Copy link
Contributor Author

d-ylee commented Jan 30, 2025

@amaltaro From what I see, we are using weights from 1 to 100. A question I have is, who determines the dm_weight value, or who converts the ddm_quota bytes to the dm_weight value?

@d-ylee
Copy link
Contributor Author

d-ylee commented Jan 30, 2025

Here are the list of RSEs that have the ddm_quota and dm_weight attributes:

RSE=T0_CH_CERN_Tape, ddm_quota=14912256107064064, dm_weight=100
RSE=T1_DE_KIT_Disk, ddm_quota=39, dm_weight=39
RSE=T1_DE_KIT_Tape, ddm_quota=4508026273011683, dm_weight=3
RSE=T1_ES_PIC_Disk, ddm_quota=18, dm_weight=18
RSE=T1_ES_PIC_Tape, ddm_quota=9657682018195565, dm_weight=1
RSE=T1_FR_CCIN2P3_Disk, ddm_quota=45, dm_weight=45
RSE=T1_FR_CCIN2P3_Tape, ddm_quota=1485181323456331, dm_weight=6
RSE=T1_IT_CNAF_Disk, ddm_quota=58, dm_weight=58
RSE=T1_IT_CNAF_Tape, ddm_quota=2368393765836074, dm_weight=100
RSE=T1_RU_JINR_Disk, ddm_quota=100, dm_weight=100
RSE=T1_RU_JINR_Tape, ddm_quota=10000000000000000, dm_weight=38
RSE=T1_UK_RAL_Disk, ddm_quota=46, dm_weight=46
RSE=T1_UK_RAL_Tape, ddm_quota=7220371810192762, dm_weight=2
RSE=T1_US_FNAL_Disk, ddm_quota=15, dm_weight=15
RSE=T1_US_FNAL_Tape, ddm_quota=26144043171830112, dm_weight=100
RSE=T2_AT_Vienna, ddm_quota=9, dm_weight=9
RSE=T2_BE_IIHE, ddm_quota=12, dm_weight=12
RSE=T2_BE_UCL, ddm_quota=17, dm_weight=17
RSE=T2_BR_SPRACE, ddm_quota=24, dm_weight=24
RSE=T2_CH_CERN, ddm_quota=49, dm_weight=49
RSE=T2_CH_CSCS, ddm_quota=18, dm_weight=18
RSE=T2_CN_Beijing, ddm_quota=3, dm_weight=3
RSE=T2_DE_DESY, ddm_quota=35, dm_weight=35
RSE=T2_DE_RWTH, ddm_quota=7, dm_weight=7
RSE=T2_ES_CIEMAT, ddm_quota=28, dm_weight=28
RSE=T2_ES_IFCA, ddm_quota=10, dm_weight=10
RSE=T2_FI_HIP, ddm_quota=1, dm_weight=1
RSE=T2_FR_GRIF, ddm_quota=49, dm_weight=49
RSE=T2_HU_Budapest, ddm_quota=27, dm_weight=27
RSE=T2_IT_Bari, ddm_quota=8, dm_weight=8
RSE=T2_IT_Legnaro, ddm_quota=21, dm_weight=21
RSE=T2_IT_Pisa, ddm_quota=84, dm_weight=84
RSE=T2_KR_KISTI, ddm_quota=14, dm_weight=14
RSE=T2_PL_Swierk, ddm_quota=12, dm_weight=12
RSE=T2_PT_NCG_Lisbon, ddm_quota=7, dm_weight=7
RSE=T2_UA_KIPT, ddm_quota=9, dm_weight=9
RSE=T2_UK_London_Brunel, ddm_quota=7, dm_weight=7
RSE=T2_UK_London_IC, ddm_quota=33, dm_weight=33
RSE=T2_UK_SGrid_RALPP, ddm_quota=9, dm_weight=9
RSE=T2_US_Caltech, ddm_quota=5, dm_weight=5
RSE=T2_US_MIT_Tape, ddm_quota=28600000000000000, dm_weight=61
RSE=T2_US_Nebraska, ddm_quota=1, dm_weight=1
RSE=T2_US_Purdue, ddm_quota=3, dm_weight=3
RSE=T2_US_Vanderbilt, ddm_quota=14, dm_weight=14
RSE=T2_US_Wisconsin, ddm_quota=3, dm_weight=3

Extracted are the RSEs with differing values:

RSE=T0_CH_CERN_Tape, ddm_quota=14912256107064064, dm_weight=100
RSE=T1_DE_KIT_Tape, ddm_quota=4508026273011683, dm_weight=3
RSE=T1_ES_PIC_Tape, ddm_quota=9657682018195565, dm_weight=1
RSE=T1_FR_CCIN2P3_Tape, ddm_quota=1485181323456331, dm_weight=6
RSE=T1_IT_CNAF_Tape, ddm_quota=2368393765836074, dm_weight=100
RSE=T1_RU_JINR_Tape, ddm_quota=10000000000000000, dm_weight=38
RSE=T1_UK_RAL_Tape, ddm_quota=7220371810192762, dm_weight=2
RSE=T1_US_FNAL_Tape, ddm_quota=26144043171830112, dm_weight=100
RSE=T2_US_MIT_Tape, ddm_quota=28600000000000000, dm_weight=61

@amaltaro
Copy link
Contributor

@d-ylee the ddm_quota and dm_weight attributes are responsibility of the DM team (well, the former is soon going to be deprecated - hence this migration). So we can take it for granted and not worry about the actual value and/or if the attribute exist for the RSEs that we are interacting with.

In general, this development could have been done as simple as a switch of attribute names. However, I believe we have places where we have a logic like: is "container total size" < "RSE storage ddm_quota"? In other words, we might be using the ddm_quota as a total size of the storage and calculating whether our data will fit in a given RSE. It is better to check the WMCore codebase for that, but some suspicious places are MSTransferor, MSOutput and MSPileup.

@d-ylee
Copy link
Contributor Author

d-ylee commented Jan 31, 2025

@amaltaro From my initial investigation, here are where I think the ddm_quota was used:

MSOutput

Services.Rucio.Rucio.py pickRSE (minNeeded variable):

if rseAttribute == "ddm_quota" and attrValue > minNeeded:
rsesWithApproval.append((rse, requiresApproval))
rsesWeight.append(attrValue)
elif rseAttribute != "ddm_quota": # e.g. dm_weight
rsesWithApproval.append((rse, requiresApproval))
rsesWeight.append(attrValue)

used in MSOutput.py _getTapeDestination: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSOutput/MSOutput.py#L464

used in makeTapeSubscription L394 (dataBytesForTape):

dataBytesForTape = self._getDataVolumeForTape(workflow)
tapeRSE, requiresApproval = self._getTapeDestination(dataBytesForTape)

calculated by _getDataVolumeForTape (totalSize) L782
calcuated by the workflow’s DatasetSize (workflow[‘OutputMap’])

def _getDataVolumeForTape(self, workflow):

In MSOutput, we compare the dataSize to ddm_quota, so I think this could get affected by the change to dm_weight.

MSTransferor

The weight is used in ruleAttrs:

ruleAttrs = {'copies': copies,
'activity': 'Production Input',
'lifetime': self.msConfig['rulesLifetime'],
'account': self.msConfig['rucioAccount'],
'grouping': grouping,
'weight': self.msConfig['rucioRuleWeight'],
'meta': {'workflow_group': wflow.getWorkflowGroup()},
'comment': 'WMCore MSTransferor input data placement'}

Used in MSTransferor.py createReplicationRule:

res = self.rucio.createReplicationRule(dids, rseExpr, **ruleAttrs)

Services.Rucio.Rucio.py createReplicationRule:

def createReplicationRule(self, names, rseExpression, scope='cms', copies=1, **kwargs):

passed into rucio-clients add_replication_rule in kwargs:

response = self.cli.add_replication_rule(dids, copies, rseExpression, **kwargs)

In this case, the weight is added to the DID replication rule: https://github.com/rucio/rucio/blob/90116458e295ae828c9e05ebd6839eed1a5a1bb3/lib/rucio/client/ruleclient.py#L34-L55

MSPileup

I'm not too sure how it is used in MSPileup. Could it be in MSPileupTasks.py activeTask?

def activeTask(self, marginSpace=1024**4):

if rec['bytes_remaining'] - pileupSize > marginSpace:

@d-ylee d-ylee force-pushed the fix-12055_replace-ddm_quota-with-dm_weight branch from 886f19c to 398663b Compare February 4, 2025 19:46
@dmwm-bot
Copy link

dmwm-bot commented Feb 4, 2025

Jenkins results:

  • Python3 Unit tests: succeeded
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 29 warnings and errors that must be fixed
    • 31 warnings
    • 193 comments to review
  • Pycodestyle check: succeeded
    • 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/328/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-ylee thank you for these changes. I am almost sure that we can remove the use of minNeeded in the pickRSE() method, but let us first run this through the DM team.

A short summary of your findings are (please correct me if I make any mistake):

  • MSTransferor: input data placement is done with an intersection of RSE names according to workflow site lists, campaign, pileup, etc. Such that the actual place to put data in is decided by Rucio with the RSE attribute (now dm_weight)
  • MSOutput: we actually calculate if a Tape endpoint has enough space to fit in the total output data. I would suggest to not check it anymore, simply get a weighted RSE name (based on dm_weight attribute) and create the relevant rules
  • MSPileup: apparently none of these RSE attributes are used. It does get the account usage and remaining bytes (getAccountUsage()) to calculate if the pileup data fits in there. I would like to simplify it, but we might actually need it at some point, so perhaps we leave it around.

@haozturk @Panos512 do you think we should check if a given Tape endpoint can receive a container (i.e. check if the wmcore_output account still has enough bytes_remaining at the relevant endpoint)? I am in favor of simply picking a Tape RSE out of the weighted random choice (based on the dm_weight attribute). Thanks

src/python/WMCore/Services/Rucio/Rucio.py Outdated Show resolved Hide resolved
@dmwm-bot
Copy link

dmwm-bot commented Feb 4, 2025

Jenkins results:

  • Python3 Unit tests: succeeded
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 30 warnings and errors that must be fixed
    • 32 warnings
    • 193 comments to review
  • Pycodestyle check: succeeded
    • 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/331/artifact/artifacts/PullRequestReport.html

@d-ylee d-ylee force-pushed the fix-12055_replace-ddm_quota-with-dm_weight branch from dac145e to 6ae3947 Compare February 4, 2025 23:05
@dmwm-bot
Copy link

dmwm-bot commented Feb 4, 2025

Jenkins results:

  • Python3 Unit tests: succeeded
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 29 warnings and errors that must be fixed
    • 32 warnings
    • 192 comments to review
  • Pycodestyle check: succeeded
    • 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/332/artifact/artifacts/PullRequestReport.html

@d-ylee d-ylee requested a review from amaltaro February 4, 2025 23:20
@dmwm-bot
Copy link

dmwm-bot commented Feb 4, 2025

Jenkins results:

  • Python3 Unit tests: succeeded
    • 4 changes in unstable tests
  • Python3 Pylint check: failed
    • 30 warnings and errors that must be fixed
    • 31 warnings
    • 192 comments to review
  • Pycodestyle check: succeeded
    • 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/333/artifact/artifacts/PullRequestReport.html

Copy link
Contributor

@amaltaro amaltaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-ylee , these changes are looking good to me.
However, before merging them, I would like to hear from the DM team. I created this thread in mattermost.

Meanwhile, can you please squash the commits accordingly? Thanks

UPDATE: Please also link any configuration/external PRs in the initial PR description

@d-ylee d-ylee requested a review from amaltaro February 5, 2025 21:13
@d-ylee d-ylee force-pushed the fix-12055_replace-ddm_quota-with-dm_weight branch from 9ad8bea to 66702cf Compare February 5, 2025 21:14
@dmwm-bot
Copy link

dmwm-bot commented Feb 5, 2025

Jenkins results:

  • Python3 Unit tests: succeeded
    • 3 changes in unstable tests
  • Python3 Pylint check: failed
    • 30 warnings and errors that must be fixed
    • 31 warnings
    • 192 comments to review
  • Pycodestyle check: succeeded
    • 29 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/334/artifact/artifacts/PullRequestReport.html

@haozturk
Copy link

haozturk commented Feb 6, 2025

Hi @amaltaro @d-ylee Thank you.

  • For disk rules there will be no change, since currently we're setting the ddm_quota and dm_weight to the same value.

  • For tape rules, there won't be any change either, since you're already using dm_weight in your weighted random selection as far as I can see [1,2]. So effectively, you also stopped checking if there's enough space at a given tape, too, since that logic depends on ddm_quota attribute which is no longer used [3]

In short, as it stands this PR will not change any functionality.

If the question is whether WM should additionally check if size(dataset) < bytes_remaining_destination, I don't have a definitive answer to this. I'd suggest discussing it at

[1] https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSOutput/MSOutput.py#L465
[2] https://gitlab.cern.ch/cmsweb-k8s/services_config/-/blob/prod/reqmgr2ms-output/config-output.py?ref_type=heads#L81
[3]https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/Rucio/Rucio.py#L793

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace ddm_quota attribute by dm_weight
4 participants