CQ2 - Resource usage #10

Closed
simleo opened this issue Jun 23, 2022 · 25 comments · Fixed by #71
Labels
Requirement Something we want to capture in the spec

Comments

@simleo
Collaborator

simleo commented Jun 23, 2022

How much memory/cpu/disk was used in run?

  • Source entity?
  • Property?
@simleo simleo added the Requirement Something we want to capture in the spec label Jul 6, 2022
@simleo simleo added this to the 0.1 milestone Sep 1, 2022
@simleo
Collaborator Author

simleo commented Oct 13, 2022

The first thing is knowing what we're going to model. What does each workflow manager already provide? What could be added that's missing, and how hard would it be to do that?

More generally, we need to know about any information / logging data (and how structured it is) provided by the framework that runs the workflow, not just resource usage: timestamps, user info, containerization, etc. Another useful categorization is what's available by default and what needs to be explicitly activated (e.g., user info in cwltool).

@mr-c

mr-c commented Oct 13, 2022

cwltool tracks peak memory usage, and start & stop times for jobs & steps.
We don't currently track peak disk usage or CPU time (but both could be added).

@simleo
Collaborator Author

simleo commented Oct 14, 2022

Discussed with @ilveroluca earlier this morning. One source of confusion was my suggestion of memoryRequirements or storageRequirements for the property names, which was not consistent with the question "How much memory/cpu/disk was used in run". I have now removed that suggestion from the issue's description.

That's not to say that such indications are not useful: they are quite useful to those who want to reproduce the run, since they allow them to plan ahead, but they've got nothing to do with what happened during the run. Such indications don't come from the observation of a single run, but rather from the experience (or -- even better -- statistics) of the author(s) or anyone who's worked with the application in various scenarios. They are part of prospective, not retrospective provenance, so we should expect them to come from the workflow's author / maintainer. Indeed, CWL has ResourceRequirement for this purpose. I've now opened #32 to track this.

Here, instead, we are focusing on resource usage information for the specific run described in the crate, such as the peak memory usage mentioned by Michael. This isn't just useful to enrich the metadata about the run: it might be the only hint available in all cases where requirements as discussed above are not available (which I expect to be the majority: even when they are known, the authors might not take the time to provide them); additionally, with a sufficiently large number of runs, it could be used to get a good estimate of the general requirements.

@simleo
Collaborator Author

simleo commented Oct 14, 2022

cwltool tracks peak memory usage

Where is that recorded? Is it available from the CWLProv output?

@jmfernandez
Contributor

Regarding Nextflow: when the trace option is enabled, detailed statistics about memory and CPU usage are gathered for each execution step. The monitoring is done from the shell script created to execute the workflow step. Since this script is written in bash and also depends on additional tools for the detailed monitoring (mainly ps, grep and awk), not all container instances allow this detailed statistics gathering.

This is the information usually gathered for each executed step:

nextflow.trace/v2
realtime=1361
%cpu=1380
rchar=19914333
wchar=791
syscr=1252
syscw=30
read_bytes=0
write_bytes=4096
%mem=4
vmem=524224
rss=69416
peak_vmem=568880
peak_rss=112264
vol_ctxt=6
inv_ctxt=444
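Such a record is trivial for a crate producer to parse before mapping it to metadata; a minimal sketch (the helper name is hypothetical, not part of Nextflow):

```python
def parse_nextflow_trace(text: str) -> dict:
    """Parse a Nextflow task trace record of key=value lines.

    The first line (e.g. "nextflow.trace/v2") is a format marker and is
    skipped; values are converted to int when possible.
    """
    fields = {}
    for line in text.strip().splitlines():
        if "=" not in line:
            continue  # skip the "nextflow.trace/v2" header line
        key, _, value = line.partition("=")
        try:
            fields[key] = int(value)
        except ValueError:
            fields[key] = value
    return fields

record = """nextflow.trace/v2
realtime=1361
%cpu=1380
peak_rss=112264
"""
print(parse_nextflow_trace(record))
# {'realtime': 1361, '%cpu': 1380, 'peak_rss': 112264}
```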

@suecharo

suecharo commented Oct 17, 2022

Sapporo is a Workflow Execution System (WES), so it calls workflow engines (e.g., cwltool and Nextflow) internally.
Here, Sapporo and the workflow engines all run as Docker containers.
Because Sapporo does not use the Docker API (we wanted to support runtimes other than Docker), it was difficult to get detailed resource information...

The RO-Crate generated by Sapporo stores the following information (about the Sapporo container):

  • os
  • osVersion
  • cpuArchitecture
  • cpuCount
  • totalMemory
  • freeDiskSpace
  • uid
  • git
  • inDocker

This information is generated in https://github.com/sapporo-wes/sapporo-service/blob/856196864e8ccda8c71bef12e5dcf7d5becb21e6/sapporo/ro_crate.py#L697 .
https://github.com/sapporo-wes/tonkaz/blob/a1e4c94c439d6a1d3f9947b519177196f77f1c95/tests/example_crate/trimming.json#L470 is an entity in a Sapporo-generated RO-Crate.

We also add the log (or provenance) of the WES (Sapporo):

  • cmd (Execution command of workflow engine)
  • start_time
  • end_time
  • run_pid
  • state (WES state, e.g., COMPLETE, EXECUTION_ERROR, etc.)
  • stdout (from Workflow engines)
  • stderr (from Workflow engines)

This information is originally stored in Sapporo as the run_dir (https://github.com/sapporo-wes/sapporo-service#run-dir).
https://github.com/sapporo-wes/tonkaz/blob/a1e4c94c439d6a1d3f9947b519177196f77f1c95/tests/example_crate/trimming.json#L632 is an entity in a Sapporo-generated RO-Crate.

Furthermore, we collect the following information about each input/output file:

  • contentSize
  • dateModified
  • uid
  • gid
  • mode
  • lineCount
  • sha512
  • encodingFormat

This information is generated in https://github.com/sapporo-wes/sapporo-service/blob/856196864e8ccda8c71bef12e5dcf7d5becb21e6/sapporo/ro_crate.py#L318 .
https://github.com/sapporo-wes/tonkaz/blob/a1e4c94c439d6a1d3f9947b519177196f77f1c95/tests/example_crate/trimming.json#L243 is an entity in a Sapporo-generated RO-Crate.

@simleo
Collaborator Author

simleo commented Oct 27, 2022

From @rsirvent:
example of basic statistics with COMPSs: https://workflowhub.eu/workflows/386 -> App_Profile.json

@rsirvent
Contributor

rsirvent commented Oct 28, 2022

Sorry I missed this thread. Let me elaborate a bit more on what we provide with COMPSs. We have several ways to gather statistics / understand resource usage:

  • Execution statistics profile (--output_profile flag): this has basic statistics on the resources used by the COMPSs application, how many tasks and of which type were executed on each resource, and maximum, average and minimum execution time for each task. An example, as provided in the previous comment: https://workflowhub.eu/workflows/386 -> App_Profile.json
  • Memory profiling (--python_memory_profile): you can see memory usage over time in PyCOMPSs applications (more details at: https://compss-doc.readthedocs.io/en/stable/Sections/04_Troubleshooting/03_Memory_Profiling.html)
  • Tracing with Paraver (see https://compss-doc.readthedocs.io/en/stable/Sections/05_Tools/03_Tracing.html): we have the option to activate tracing for an application. Then, the runtime records events, counters... down to hardware counters level. The traces can be opened with the Paraver analysis tool, where you can see in a very detailed manner the status of each thread of the application, communications, ... This is very advanced due to the level of detailed information provided, but very useful to achieve very efficient workflows that exploit the underlying infrastructure to its maximum (which is very important in supercomputers).

Also, more general information about the resources (how many cores, how much memory, etc. they have) can be found in the COMPSs XML configuration files (resources.xml) (see: https://compss-doc.readthedocs.io/en/stable/Sections/01_Installation/06_Configuration_files.html).

Hope it helps.

@simleo
Collaborator Author

simleo commented Jan 19, 2023

What to represent?

One of the main challenges here is providing guidelines that make sense for a wide variety of operating systems and workflow engines. This means we cannot go too deep into details. For instance, Nextflow tracing provides details such as virtual memory vs resident set, but this distinction might not apply to all systems and / or be made by all workflow engines. cwltool and Arvados, for instance, simply give "max memory used", which is sufficiently general.

The Nextflow tracing example shows that this kind of information can be very detailed; for generality and simplicity, I think we should focus on the most important bits, especially for the first release of the profiles. We could represent the following:

  • memory usage: peak memory usage. This should be real memory if possible, so one can get an idea of the RAM requirements e.g. if there is no swap space, but we don't force the representation of a specific OS concept
  • CPU usage: this is typically represented as a percentage, with values above 100 meaning that more than one computing core was used (see Nextflow -- Metrics). The actual number of CPUs where the process ran would give additional info (e.g. 100% could mean one CPU fully used or 4 CPUs with 25% usage). CPU type(s) would be another step in providing more complete info.
  • GPU usage: same as CPU usage, but for GPUs
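The percentage convention in the CPU usage bullet can be made concrete with a little arithmetic (illustrative only, not tied to any engine):

```python
def cpu_usage_percent(cpu_time_s: float, wall_time_s: float) -> float:
    """CPU usage as a percentage of wall-clock time.

    Values above 100 mean more than one core was busy on average:
    e.g. 56 s of CPU time over 40 s of wall time -> 140%.
    """
    return 100.0 * cpu_time_s / wall_time_s

print(cpu_usage_percent(56.0, 40.0))  # 140.0
# 140% is compatible with, e.g., 2 CPUs at 70% each or 4 CPUs at 35% each,
# which is why reporting the number of CPUs used adds information.
```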

How to represent it?

Schema.org has things like memoryRequirements for SoftwareApplication, but these are minimum requirements to run the application. We need to describe actual resource usage and tie it to the actions. I could not find anything for this in the current RO-Crate context, so we probably need new terms. I've searched ontologies for inspiration, but could not find much (e.g., the WICUS hardware specs also seem focused on application requirements).

The simplest approach is to add the properties directly to the action, for instance:

{
    "@id": "#action-1",
    "@type": "CreateAction",
    "memoryUsage": "8.43GB",
    "cpuUsage": "140.2%",
    "gpuUsage": "70.3%",
    "usedCpus": "2",
    "usedGpus": "1",
    ...
}

With CPU / GPU details:

{
    "@id": "#action-1",
    "@type": "CreateAction",
    "memoryUsage": "8.43GB",
    "cpuUsage": "140.2%",
    "gpuUsage": "70.3%",
    "usedCpus": [{"@id": "#cpu-1"}, {"@id": "#cpu-2"}],
    "usedGpus": {"@id": "#gpu-1"},
    ...
},
{
    "@id": "#cpu-1",
    "@type": "HardwareComponent",
    "model": "FooBar 314Pro",
    ...
},
{
    "@id": "#cpu-2",
    "@type": "HardwareComponent",
    "model": "FooBar XG666",
    ...
}

@kinow
Member

kinow commented Jan 19, 2023

Autosubmit (which is to support RO-Crate soon) keeps track of some variables like memory, CPU and disk that would fit the model discussed so far, I think... but there are other reported metrics that I am not sure would fit in the resource usage here (maybe they'd be reported somewhere else in the RO-Crate archive?).

  • Energy used in workflow runs
    • when the HPC batch system provides it, like Slurm, we keep track[1][2] of that to document the energy used for workflow runs (I believe that's used for reporting in research/group reports/etc.)
  • SYPD, simulated years per day
  • number of nodes used in MPI for some tasks

I think a more flexible approach, allowing for custom values to be added would be useful, from what I understood about the topic so far (still getting familiar with RO-Crate, how to implement it, etc, sorry).

-Bruno

Footnotes

  1. https://autosubmit.readthedocs.io/en/master/agui/performance/index.html

  2. https://www.bsc.es/supportkc/tech-reports/tech-report/

@simleo
Collaborator Author

simleo commented Jan 20, 2023

I think a more flexible approach, allowing for custom values to be added would be useful

Several group members were of the same idea at yesterday's meeting, with doubts expressed about adding "fixed" properties that might fit the descriptions given by the various engines / systems poorly. Since this will require substantially more thought, I'm removing this issue from the 0.1 milestone.

@simleo simleo removed this from the 0.1 milestone Jan 20, 2023
@simleo
Collaborator Author

simleo commented Jan 20, 2023

I'm removing this issue from the 0.1 milestone

What we can easily add for the 0.1 release is a recommendation to add engine-specific logs, reports, traces, etc. to the crate. They can be tied to the corresponding actions easily via about. Example:

{
    "@id": "#action-1",
    "@type": "CreateAction",
    ...
},
{
    "@id": "trace-20230120-40360336.txt",
    "@type": "File",
    "name": "Nextflow trace for action-1",
    "conformsTo": "https://www.nextflow.io/docs/latest/tracing.html#trace-report",
    "encodingFormat": "text/tab-separated-values",
    "about": "#action-1"
}

This is vanilla RO-Crate, so it does not require adding any terms or specific requirements. Moreover, doing this requires very little effort from the crate producer. Having the information there is already quite useful; a future framework for a uniform representation of it would then be an improvement in interoperability.

@stain
Contributor

stain commented Jan 31, 2023

Great, also declare #trace-report as in https://stain.github.io/ro-crate/1.2-DRAFT/data-entities#file-format-profiles

 {
  "@id": "https://www.nextflow.io/docs/latest/tracing.html#trace-report",
  "@type": "CreativeWork",
  "name": "Nextflow trace report CSV profile"
 }

@simleo
Collaborator Author

simleo commented Mar 6, 2023

Now that 0.1 is out, recapping the latest discussions on the next steps, the general idea is to use a system based on key-value pairs. So this example:

{
    "@id": "#action-1",
    "@type": "CreateAction",
    "memoryUsage": "8.43GB",
    "cpuUsage": "140.2%",
    "usedCpus": "2"
}

could become something like:

{
    "@id": "#action-1",
    "@type": "CreateAction",
    "resourceUsage": [
        {"@id": "#action-1-memory"},
        {"@id": "#action-1-cpu"},
        {"@id": "#action-1-nCpu"},
    ]
},
{
    "@id": "#action-1-memory",
    "@type": "PropertyValue",
    "name": "memory",
    "value": "8.43GB"
},
{
    "@id": "#action-1-cpu",
    "@type": "PropertyValue",
    "name": "cpu",
    "value": "140.2%"
},
{
    "@id": "#action-1-nCpu",
    "@type": "PropertyValue",
    "name": "nCpu",
    "value": "2"
}

Note that ids like #action-1-memory are not important; they could be UUIDs: the role of the key in PropertyValue is played by name.

One advantage of this is that it requires just one extra term to be defined, resourceUsage. The other main advantage, as we discussed, is that this is extensible by RO-Crate producers, e.g., workflow engines. For instance, Autosubmit could add "energy" or "sypd", and cwltool "peakMemory". The meaning of the corresponding values would depend on the workflow engine, so consumers should be able to determine that information. Looking at the workflow language in the prospective part is not enough, since there can be multiple engines implementing the same language. We need a conformsTo, which needs a type to be attached to. For instance:

{
    "@id": "#action-1",
    "@type": "CreateAction",
    "resourceUsage": {"@id": "#action-1-ru"}
},
{
    "@id": "#action-1-ru",
    "@type": "Dataset",
    "conformsTo": "http://example.org/cwltool-rocrate-ru-spec",
    "variableMeasured": [
        {"@id": "#action-1-peakMemory"}
    ]
},
{
    "@id": "#action-1-peakMemory",
    "@type": "PropertyValue",
    "name": "peakMemory",
    "value": "8.43GB"
},
{
    "@id": "http://example.org/cwltool-rocrate-ru-spec",
    "@type": "CreativeWork",
    "name": "cwltool RO-Crate resource usage spec",
    "version": "0.1"
}

Note that I've used Dataset, which seems a very good fit since it has variableMeasured, which has PropertyValue in its range. In RO-Crate, Dataset is also used for directories, but they would not be mixed up since this Dataset would not be linked to from the root's hasPart.

Regarding keys:

  • Should we use namespaces for keys? E.g., cwl.ru.peakMemory?
  • Should we define our own "common" set of basic keys to allow some level of cross-engine crate comparison? E.g., wrroc.ru.memory. Crate producers would be free to add entries for these or not, but if added they would have to be in the expected format (for instance, "memory must mean peak memory and be expressed in bytes" or "memory must mean average memory and be expressed in megabytes", etc.)
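On the producer side, the pattern above is mechanical to generate; a sketch (the helper is hypothetical; the spec URL is the placeholder from the example):

```python
import uuid

def resource_usage_entities(action_id: str, spec_url: str, usage: dict) -> list:
    """Build the resourceUsage Dataset and its PropertyValue entities
    for one action, following the pattern discussed above.

    `usage` maps key names (e.g. "peakMemory") to string values.
    PropertyValue ids are minted as UUID fragments since, as noted,
    the ids themselves carry no meaning.
    """
    pv_refs = []
    entities = []
    for name, value in usage.items():
        pv_id = f"#{uuid.uuid4()}"
        pv_refs.append({"@id": pv_id})
        entities.append({
            "@id": pv_id,
            "@type": "PropertyValue",
            "name": name,
            "value": value,
        })
    ru_id = f"#{uuid.uuid4()}"
    entities.insert(0, {
        "@id": ru_id,
        "@type": "Dataset",
        "conformsTo": spec_url,
        "variableMeasured": pv_refs,
    })
    # The action itself just points at the Dataset:
    entities.insert(0, {
        "@id": action_id,
        "@type": "CreateAction",
        "resourceUsage": {"@id": ru_id},
    })
    return entities

entities = resource_usage_entities(
    "#action-1",
    "http://example.org/cwltool-rocrate-ru-spec",
    {"peakMemory": "8.43GB"},
)
print(entities[0]["resourceUsage"] == {"@id": entities[1]["@id"]})  # True
```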

@kinow
Member

kinow commented Mar 17, 2023

One advantage of this is that it requires just one extra term to be defined, resourceUsage. The other main advantage, as we discussed, is that this is extensible by RO-Crate producers, e.g., workflow engines. For instance, Autosubmit could add "energy" or "sypd", and cwltool "peakMemory". The meaning of the corresponding values would depend on the workflow engine, so consumers should be able to determine that information. Looking at the workflow language in the prospective part is not enough, since there can be multiple engines implementing the same language. We need a conformsTo, which needs a type to be attached to. For instance:

Sounds good! I've added a note in our merge request to add RO-Crate about testing to record the metrics somehow into the metadata, preferably with a format like this one (although I might push that to after our merge request is merged).

Should we use namespaces for keys? E.g., cwl.ru.peakMemory?
Should we define our own "common" set of basic keys to allow some level of cross-engine crate comparison? E.g., wrroc.ru.memory. Crate producers would be free to add entries for these or not, but if added they would have to be in the expected format (for instance, "memory must mean peak memory and be expressed in bytes" or "memory must mean average memory and be expressed in megabytes", etc.)

I think we should do both: use namespaces, and also have a common set of keys. But this common set of keys must have a really good description of each key. For instance, for cpus or cores, saying something like "number of cores used to run the workflow" can mean different things. For example, for a WMS that uses a remote batch server like Slurm, does cpus mean the cores that I am using to run the WMS process locally, plus the cores used in Slurm? Ditto if we have energy: total energy, or just for running the remote part? If we have one for the disk space used, does it include the logs produced by the WMS to run the workflow, or just the size of the data produced by the workflow (we recently had an issue where a log generated by one of the workflows caused a short interruption of service, but we detected it retroactively), etc.

This way I would be able to use wrroc.ru.cpus or wrroc.ru.cores, but if the meaning of cpus for my WMS differed from the one in the specification, then I'd just go with something like mywms.ru.cpus.

Do you have any idea if this would be available in tools like CWL Viewer and WorkflowHub.eu? For example, if I wanted to find all the workflows that used mywms.ru.cpus greater than 100, could I search for something like mywms.ru.cpus >= 100?

-Bruno

@simleo
Collaborator Author

simleo commented Mar 21, 2023

For example, for a WMS that uses a remote batch servers like Slurm. Does cpus mean the cores that I am using to run the WMS process locally, plus the cores used in Slurm?

The resource usage dataset is associated to a specific action. In a Provenance Run Crate, where there's an action for each task, one can record each task's resource usage separately. In a Workflow Run Crate you could record the sum of cores used by all tasks (in the action that represents the workflow run), but for things like usage percentages you probably want the average. In Provenance Run Crates, OTOH, if you have per-task resource usage it's probably better not to record resource usage for the whole workflow run. Resources used by the WMS itself should probably be associated to the corresponding OrganizeAction, rather than the workflow run.

Do you have any idea if this would be available in tools like CWL Viewer and WorkflowHub.eu?

They're both focused on prospective provenance, so I think it's unlikely.

@stain
Contributor

stain commented Mar 30, 2023

I think this is almost there -- but https://schema.org/PropertyValue already has a property, propertyID, for this purpose, so I think that would work better than conformsTo:

{
    "@id": "#action-1-peakMemory",
    "@type": "PropertyValue",
    "name": "peakMemory",
    "propertyID": "https://example.org/cwltool-rocrate-ru-spec#peakMemory",
    "value": 8.43,
    "unitText": "GiB"
}

(I also added unitText, but if these strings come arbitrarily from some engine you may not know the unit, in which case it would be part of the value string.) There is likewise https://schema.org/unitCode in case the unit has an identifier.

It would still make sense to add https://example.org/cwltool-rocrate-ru-spec (no #) to the ./ root dataset as a conformsTo profile and as a contextual entity, as it will hopefully define what those properties mean -- in the most advanced form as DefinedTerm instances, but probably just mentioned in the HTML.

I would say that propertyID would be SHOULD and unitText MAY.

@kinow
Member

kinow commented Apr 12, 2023

A bit related: similar to what we do in Autosubmit & in our HPC in tracking energy usage, I learned today from an IIIF email about a draft for an HTTP header for carbon emissions: https://www.ietf.org/archive/id/draft-martin-http-carbon-emissions-scope-2-00.html

So I believe this shows that there are groups interested in tracking resources like energy consumption, carbon emissions, etc., which are different from the most common ones like memory/cpu/disk 👍

@jmfernandez
Contributor

In today's TC, I realized that https://schema.org/QuantitativeValue could be helpful to describe resource requirements in both prospective and retrospective provenance, due to its capability to describe the value along with minimum and maximum ones.

@stain
Contributor

stain commented May 11, 2023

For unitCode we can use QUDT.

@jmfernandez
Contributor

As @dgarijo has suggested in the TC chat, we could also allow pointing to Wikidata terms.

@kinow
Member

kinow commented Nov 23, 2023

From today's meeting: https://schema.org/Observation might be useful for cases where you have metrics that do not represent computational resources like CPU or memory, but are still directly related.


In Autosubmit we have metrics in the prospective provenance (workflow configuration) that tell us how many nodes, how much memory, CPU per task, etc. we will use. These resource values could exist in either an Autosubmit namespace or a Slurm namespace (or both). PyCOMPSs probably uses the same Slurm resources at some point, though I'm not sure if that's available to external users before/after running the workflow.

When an Autosubmit workflow is executed, it will read the resource usage indicated in the configuration, and execute on the HPC. It may use the requested resources, or less. So we are able to get the number of resources used later (even more resources than what we specified, like the energy consumed from Slurm metrics).

But the performance of climate models is not assessed only in terms of CPUs, memory and disk used. There are metrics like these from the CMIP (Coupled Model Intercomparison Project):

[screenshot: table of CMIP model performance metrics]

I will have a look to see if we can map that from the provenance traces/logs/files with the Schema.org Observation. Ideally users would be able to visualize both the computational resource use and the performance of the model (in terms of these metrics), all from reading the workflow metadata.

Thanks for the tips!!

@rsirvent
Contributor

As promised, taking https://workflowhub.eu/workflows/663 as an example:

{
 "cloud": {},
 "resources": {
  "s01r1b41-ib0": {
   "implementations": {
    "accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans": {
     "maxTime": 30,
     "executions": 10,
     "avgTime": 7,
     "minTime": 6
    },
    "initPointsFrag(INT_T,INT_T)kmeans.KMeans": {
     "maxTime": 123,
     "executions": 2,
     "avgTime": 119,
     "minTime": 116
    },
    "computeNewLocalClusters(INT_T,INT_T,OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans": {
     "maxTime": 33,
     "executions": 20,
     "avgTime": 11,
     "minTime": 9
    }
   }
  }
 },
 "implementations": {
  "computeNewLocalClusters(INT_T,INT_T,OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans": {
   "maxTime": 33,
   "executions": 20,
   "avgTime": 11,
   "minTime": 9
  },
  "initPointsFrag(INT_T,INT_T)kmeans.KMeans": {
   "maxTime": 123,
   "executions": 2,
   "avgTime": 119,
   "minTime": 116
  },
  "accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans": {
   "maxTime": 30,
   "executions": 10,
   "avgTime": 7,
   "minTime": 6
  }
 }
}

The first part is statistics per resource (s01r1b41-ib0, only one worker used in this run) and per method (accumulate, initPointsFrag and computeNewLocalClusters). And the final part is the global statistics, aggregated for all resources (in this case, it's the same, since only one worker was used).

So, as a test, the last piece:

"accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans": {
   "maxTime": 30,
   "executions": 10,
   "avgTime": 7,
   "minTime": 6
  }

Could be represented as:

{
    "@id": "#maxTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    "@type": "PropertyValue",
    "name": "maxTime",
    "value": 30,
    "unitText": "ms"
},
{
    "@id": "#executions_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    "@type": "PropertyValue",
    "name": "executions",
    "value": 10,
    "unitText": "times"
},
{
    "@id": "#avgTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    "@type": "PropertyValue",
    "name": "avgTime",
    "value": 7,
    "unitText": "ms"
},
{
    "@id": "#minTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    "@type": "PropertyValue",
    "name": "minTime",
    "value": 6,
    "unitText": "ms"
},

I don't know how to use the "propertyID" term.

Let me know how it looks so far.

@simleo
Collaborator Author

simleo commented Nov 30, 2023

@rsirvent the propertyID entries should point to unique URLs representing resource identifiers specific to COMPSs. For instance, you could create a new compss namespace in ro-terms. Using that, and QUDT for units, the result would be:

{
    "@id": "#COMPSs_Workflow_Run_Crate_marenostrum4_SLURM_JOB_ID_30650595",
    "@type": "CreateAction",
    "resourceUsage": [
        {"@id": "#maxTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans"},
        {"@id": "#executions_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans"},
        {"@id": "#avgTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans"},
        {"@id": "#minTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans"}
    ],
    ...
},
{
    "@id": "#maxTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    "@type": "PropertyValue",
    "name": "maxTime",
    "propertyID": "https://w3id.org/ro/terms/compss#maxTime",
    "unitCode": "https://qudt.org/vocab/unit/MilliSEC",
    "value": "30"
},
{
    "@id": "#executions_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    "@type": "PropertyValue",
    "name": "executions",
    "propertyID": "https://w3id.org/ro/terms/compss#executions",
    "value": "10"
},
{
    "@id": "#avgTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    "@type": "PropertyValue",
    "name": "avgTime",
    "propertyID": "https://w3id.org/ro/terms/compss#avgTime",
    "value": "7",
    "unitCode": "https://qudt.org/vocab/unit/MilliSEC"
},
{
    "@id": "#minTime_accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    "@type": "PropertyValue",
    "name": "minTime",
    "propertyID": "https://w3id.org/ro/terms/compss#minTime",
    "value": "6",
    "unitCode": "https://qudt.org/vocab/unit/MilliSEC"
}

Note that executions does not have a unitCode, since it's dimensionless (the same would apply to percentages).
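This mapping from an App_Profile.json entry can be mechanized by the producer; a sketch (the compss namespace and QUDT unit URL are those used above; the helper itself is hypothetical):

```python
COMPSS_NS = "https://w3id.org/ro/terms/compss#"
MS_UNIT = "https://qudt.org/vocab/unit/MilliSEC"
# Keys that carry a time unit; "executions" is dimensionless.
TIME_KEYS = {"maxTime", "avgTime", "minTime"}

def profile_to_property_values(method: str, stats: dict) -> list:
    """Turn one COMPSs App_Profile.json entry into PropertyValue entities."""
    entities = []
    for name, value in stats.items():
        entity = {
            "@id": f"#{name}_{method}",
            "@type": "PropertyValue",
            "name": name,
            "propertyID": COMPSS_NS + name,
            "value": str(value),
        }
        if name in TIME_KEYS:
            entity["unitCode"] = MS_UNIT
        entities.append(entity)
    return entities

pvs = profile_to_property_values(
    "accumulate(OBJECT_T,OBJECT_T,OBJECT_T,OBJECT_T)kmeans.KMeans",
    {"maxTime": 30, "executions": 10, "avgTime": 7, "minTime": 6},
)
print(len(pvs))  # 4
```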

@stain
Contributor

stain commented Mar 14, 2024

#60 was merged as a Nextflow example.

We need to formulate some text for the pages on how to do this with PropertyValue, documenting propertyID and unitCode.

This relates more to Provenance Run Crate, as it is probably better suited to describing a particular process. But some statistics could make sense overall, as well as for the engine execution, though they could otherwise be difficult to aggregate at the workflow level.
