CPU Requirements and Time Left #7494

arrabito · 2024-03-01T11:03:28Z


Dear all,

We have some issues with payloads matched even if the TimeLeft on the queue is too low.

Looking at the pilot output, I found that the TimeLeft was computed correctly:

2024-03-01 06:45:02 UTC WorkloadManagement/JobAgent/WorkloadManagement/JobAgent INFO: Attempting to check CPU time left for filling mode
2024-03-01 06:45:02 UTC WorkloadManagement/JobAgent/WorkloadManagement/JobAgent INFO: normalized CPU units remaining in slot 261518.40000000002

about 2.6 hours considering a CPUNormalizationFactor = 28
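As a quick check of that figure, the conversion can be sketched as below. The variable names are only illustrative and nothing here is a DIRAC API call; the sketch assumes that dividing the normalized units by the CPUNormalizationFactor gives plain CPU seconds.

```python
# Values copied from the pilot log and the factor quoted above.
normalized_cpu_left = 261518.4   # normalized CPU units remaining in the slot
cpu_normalization_factor = 28    # CPUNormalizationFactor of the queue

# Assumption: normalized units / factor = plain CPU seconds on this queue.
seconds_left = normalized_cpu_left / cpu_normalization_factor
hours_left = seconds_left / 3600

print(f"{seconds_left:.0f} s, about {hours_left:.1f} h")  # 9340 s, about 2.6 h
```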

However a payload with:


 'JobRequirements': ['[',
                     'CPUTime',
                     '=',
                     '259200;',

was matched and then killed by the batch system when the CPU limit was reached.

I thought that the CPUTime requirement was not in normalized units, so that in this case it would correspond to 72 hours. But maybe it is normalized, which would explain why the payload was matched.

Could you please clarify in which unit CPUTime requirement is set?


aldbr · 2024-03-01T12:03:56Z

Collaborator

According to my response here: #5912 (comment)

From what I understand, the "CPUTime" JDL parameter (defined in the job) is actually CPUWork and is not multiplied by SI00/250 (again, the code and documentation are not crystal clear).

I should really document that somewhere, sorry.
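To illustrate with the numbers from the question, here is a minimal sketch of the two readings of CPUTime. It assumes the match boils down to comparing the job's CPUTime with the normalized units left in the slot; the real Matcher logic may involve more than this, and the variable names are made up for the example.

```python
# Numbers quoted in the question above.
job_cpu_time = 259200            # CPUTime from JobRequirements
normalized_cpu_left = 261518.4   # normalized CPU units remaining in the slot
cpu_normalization_factor = 28    # CPUNormalizationFactor of the queue

# Reading 1: CPUTime is plain seconds (72 h), so it would have to be
# scaled by the factor before comparing with normalized units.
needed_if_plain_seconds = job_cpu_time * cpu_normalization_factor
fits_if_plain_seconds = needed_if_plain_seconds <= normalized_cpu_left

# Reading 2: CPUTime is already normalized (CPUWork), compared as-is.
fits_if_normalized = job_cpu_time <= normalized_cpu_left

# Under reading 2 the job asks for only about 2.57 h on this queue.
hours_on_this_queue = job_cpu_time / cpu_normalization_factor / 3600

print(fits_if_plain_seconds, fits_if_normalized)  # False True
```

Only the second reading is consistent with the payload having been matched.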


arrabito Mar 1, 2024
Author

ok thank you for the explanation.
So, if we want to require 24 hours we should express it in normalized CPU units.

Thank you.

aldbr Mar 1, 2024
Collaborator

This is not very practical but yes indeed, you should express it in normalized CPU units.

arrabito Mar 1, 2024
Author

yes, it's not very practical also because the normalization factor varies from site to site, but no problem it's good to know.
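For example, a rough sketch of what "24 hours" becomes in normalized units for two different factors. The factors 14 and 28 are just the values that appear in this thread, not a list of real sites, and the helper name is hypothetical.

```python
def to_normalized_units(cpu_seconds, normalization_factor):
    """Scale plain CPU seconds by a queue's normalization factor (hypothetical helper)."""
    return cpu_seconds * normalization_factor

one_day_seconds = 24 * 3600  # 86400 s

# The same 24 h request maps to a different number for each factor.
requests = {factor: to_normalized_units(one_day_seconds, factor) for factor in (14, 28)}

for factor, units in requests.items():
    print(f"factor {factor}: {units} normalized units")
```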

aldbr Mar 1, 2024
Collaborator

Well the original idea is not bad actually, just that it should be named CPUWork instead of CPUTime 😅
Users are supposed to run their tasks on a given computer and then:

- collect the time to completion
- compute the CPU Power of their machine

to get the CPUWork of the tasks. From there, they are supposed to submit their jobs through DIRAC.

For instance, taskA would take 7200s (CPUTime) at 28 DB12 units (CPUPower) to run on WN_1: taskA needs 7200 x 28 = 201600 DB12.s to run (CPUWork).

Therefore, if a pilot has an allocation of 15000s on WN_2, a remote computing resource, which has a CPUPower of 14 DB12 units, then the DIRAC Matcher will know that it can "safely" fetch taskA because: 15000 x 14 (210000) > 7200 x 28 (201600).
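The same comparison as a small sketch, using the taskA, WN_1 and WN_2 figures from the example (7200s at 28 DB12 units, and a 15000s allocation at 14 DB12 units). The function names are hypothetical and this is not the actual Matcher code, only the arithmetic behind it.

```python
def cpu_work(cpu_time_s, cpu_power_db12):
    """CPUWork in DB12.s = CPUTime in seconds x CPUPower in DB12 units."""
    return cpu_time_s * cpu_power_db12

def can_fetch(task_work, slot_time_left_s, slot_cpu_power_db12):
    """True if the work available in the slot exceeds the work the task needs."""
    return cpu_work(slot_time_left_s, slot_cpu_power_db12) > task_work

# taskA measured on WN_1: 7200 s at 28 DB12 units.
task_a_work = cpu_work(7200, 28)    # 201600 DB12.s

# Pilot on WN_2: 15000 s allocation at 14 DB12 units.
slot_work = cpu_work(15000, 14)     # 210000 DB12.s
fits = can_fetch(task_a_work, 15000, 14)

print(task_a_work, slot_work, fits)  # 201600 210000 True
```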

Answer selected by fstagni

arrabito Mar 8, 2024
Author

I see. Thank you for the explanation.
