Resource usage example #60

Merged
merged 3 commits into from
Mar 4, 2024
42 changes: 42 additions & 0 deletions dev/resource_usage/nf_tracing/README.md
@@ -0,0 +1,42 @@
# Representing resource usage: Nextflow tracing example

The `tutorial.nf` workflow used for this example has been copied verbatim from the [Get started section of the Nextflow documentation](https://github.com/nextflow-io/nextflow/blob/f48a473c070f297bce6f97f6b076e4e92d25e00a/docs/getstarted.md), which is © Copyright 2023, Seqera Labs, S.L. and [distributed under](https://github.com/nextflow-io/nextflow/blob/f48a473c070f297bce6f97f6b076e4e92d25e00a/docs/README.md#license) the [CC BY-SA 4.0 license](https://creativecommons.org/licenses/by-sa/4.0/).

The purpose of this exercise is to try a representation of [resource usage](https://github.com/ResearchObject/workflow-run-crate/issues/10) in a Workflow Run RO-Crate.

The tutorial has been run as follows:

```console
bash-4.2# date
Wed May 17 14:33:13 UTC 2023

bash-4.2# nextflow -v
nextflow version 23.05.0-edge.5861

bash-4.2# nextflow run tutorial.nf
N E X T F L O W ~ version 23.05.0-edge
Launching `tutorial.nf` [lonely_dubinsky] DSL2 - revision: e61bd183fe
executor > local (3)
[cd/ca5a2f] process > splitLetters [100%] 1 of 1 ✔
[9f/7c259b] process > convertToUpper (1) [100%] 2 of 2 ✔
WORLD!
HELLO
```

The run used the following configuration:

```groovy
trace {
    enabled = true
    raw = true
}
```

This produced the `trace-20230517-52413675.txt` trace report.

To generate the RO-Crate:

```
pip install -r requirements.txt
python make_crate.py tutorial-run-1-crate
```
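
Once the crate has been written, the resource usage attached to each task can be read back from its `ro-crate-metadata.json`. The following is a minimal sketch using only the standard library. The `METADATA` fragment is hand-written for illustration (an assumption about the general shape of the output, not the actual file produced by `make_crate.py`), and `resource_usage_by_action` is a hypothetical helper name.

```python
import json

# Hand-written fragment for illustration only: an assumption about the shape
# of the generated metadata, not real output of make_crate.py.
METADATA = json.loads("""
{
  "@graph": [
    {"@id": "#cd/ca5a2f", "@type": "CreateAction", "name": "splitLetters",
     "resourceUsage": [{"@id": "#cd/ca5a2f-realTime"},
                       {"@id": "#cd/ca5a2f-percentCPU"}]},
    {"@id": "#cd/ca5a2f-realTime", "@type": "PropertyValue",
     "name": "realTime", "value": "5",
     "unitCode": "https://qudt.org/vocab/unit/MilliSEC"},
    {"@id": "#cd/ca5a2f-percentCPU", "@type": "PropertyValue",
     "name": "percentCPU", "value": "66.7"}
  ]
}
""")


def resource_usage_by_action(metadata):
    """Map each action id to a {stat name: value} dict (hypothetical helper)."""
    by_id = {entity["@id"]: entity for entity in metadata["@graph"]}
    usage = {}
    for entity in metadata["@graph"]:
        refs = entity.get("resourceUsage")
        if refs is None:
            continue
        # A single reference may be serialized as an object rather than a list.
        if isinstance(refs, dict):
            refs = [refs]
        usage[entity["@id"]] = {
            by_id[ref["@id"]]["name"]: by_id[ref["@id"]]["value"] for ref in refs
        }
    return usage


usage = resource_usage_by_action(METADATA)
for action_id, stats in usage.items():
    print(action_id, stats)
```

To run this against a real crate, `METADATA` would be loaded from the `ro-crate-metadata.json` file in the output directory instead of the inline string.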
176 changes: 176 additions & 0 deletions dev/resource_usage/nf_tracing/make_crate.py
@@ -0,0 +1,176 @@
# Copyright 2023 CRS4.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import csv
from datetime import datetime, timedelta
from pathlib import Path


from rocrate.rocrate import ROCrate
from rocrate.model.contextentity import ContextEntity


THIS_DIR = Path(__file__).absolute().parent
WF_PATH = THIS_DIR / "tutorial.nf"
TRACE_PATH = THIS_DIR / "trace-20230517-52413675.txt"
CONFIG_PATH = THIS_DIR / "nextflow.config"
WROC_PROFILE_VERSION = "1.0"
PROFILES_BASE = "https://w3id.org/ro/wfrun"
PROFILES_VERSION = "0.1"


EXTRA_TERMS = {
    "resourceUsage": "https://w3id.org/ro/terms/workflow-run#resourceUsage",
}
NF_TRACE_NS = "https://w3id.org/ro/terms/nf-trace"


def add_profiles(crate):
    profiles = []
    for p in "process", "workflow", "provenance":
        id_ = f"{PROFILES_BASE}/{p}/{PROFILES_VERSION}"
        profiles.append(crate.add(ContextEntity(crate, id_, properties={
            "@type": "CreativeWork",
            "name": f"{p.title()} Run Crate",
            "version": PROFILES_VERSION,
        })))
    wroc_profile_id = f"https://w3id.org/workflowhub/workflow-ro-crate/{WROC_PROFILE_VERSION}"
    profiles.append(crate.add(ContextEntity(crate, wroc_profile_id, properties={
        "@type": "CreativeWork",
        "name": "Workflow RO-Crate",
        "version": WROC_PROFILE_VERSION,
    })))
    crate.root_dataset["conformsTo"] = profiles


def add_tasks(crate):
    workflow = crate.mainEntity
    with open(TRACE_PATH) as f:
        reader = csv.DictReader(f, delimiter="\t")
        for record in reader:
            action_id = "#" + record["hash"]
            action_name = record["name"]
            tool_name = action_name.split()[0]
            start = datetime.fromtimestamp(float(record["submit"])/1000)
            end = start + timedelta(milliseconds=int(record["duration"]))
            tool_id = f"{workflow.id}#{tool_name}"
            tool = crate.get(tool_id)
            if not tool:
                tool = crate.add(ContextEntity(crate, tool_id, properties={
                    "@type": "SoftwareApplication",
                    "name": tool_name
                }))
                workflow.append_to("hasPart", tool)
            action = crate.add(ContextEntity(crate, action_id, properties={
                "@type": "CreateAction",
                "name": action_name,
            }))
            action["instrument"] = tool
            action["startTime"] = start.isoformat()
            action["endTime"] = end.isoformat()
            resource_usage = []
            stat = "realTime"
            resource_usage.append(
                crate.add(ContextEntity(crate, f"{action_id}-{stat}", properties={
                    "@type": "PropertyValue",
                    "name": stat,
                    "propertyID": f"{NF_TRACE_NS}#{stat}",
                    "value": str(record["realtime"]),
                    "unitCode": "https://qudt.org/vocab/unit/MilliSEC",
                }))
            )
            stat = "percentCPU"
            resource_usage.append(
                crate.add(ContextEntity(crate, f"{action_id}-{stat}", properties={
                    "@type": "PropertyValue",
                    "name": stat,
                    "propertyID": f"{NF_TRACE_NS}#{stat}",
                    "value": str(record["%cpu"])
                }))
            )
            stat = "peakRSS"
            resource_usage.append(
                crate.add(ContextEntity(crate, f"{action_id}-{stat}", properties={
                    "@type": "PropertyValue",
                    "name": stat,
                    "propertyID": f"{NF_TRACE_NS}#{stat}",
                    "value": str(record["peak_rss"]),
                    "unitCode": "https://qudt.org/vocab/unit/BYTE",
                }))
            )
            stat = "peakVMEM"
            resource_usage.append(
                crate.add(ContextEntity(crate, f"{action_id}-{stat}", properties={
                    "@type": "PropertyValue",
                    "name": stat,
                    "propertyID": f"{NF_TRACE_NS}#{stat}",
                    "value": str(record["peak_vmem"]),
                    "unitCode": "https://qudt.org/vocab/unit/BYTE",
                }))
            )
            stat = "rChar"
            resource_usage.append(
                crate.add(ContextEntity(crate, f"{action_id}-{stat}", properties={
                    "@type": "PropertyValue",
                    "name": stat,
                    "propertyID": f"{NF_TRACE_NS}#{stat}",
                    "value": str(record["rchar"]),
                    "unitCode": "https://qudt.org/vocab/unit/BYTE",
                }))
            )
            stat = "wChar"
            resource_usage.append(
                crate.add(ContextEntity(crate, f"{action_id}-{stat}", properties={
                    "@type": "PropertyValue",
                    "name": stat,
                    "propertyID": f"{NF_TRACE_NS}#{stat}",
                    "value": str(record["wchar"]),
                    "unitCode": "https://qudt.org/vocab/unit/BYTE",
                }))
            )
            action["resourceUsage"] = resource_usage


def main(args):
    crate = ROCrate(gen_preview=False)
    crate.metadata.extra_terms.update(EXTRA_TERMS)
    crate.root_dataset["license"] = "https://creativecommons.org/licenses/by-sa/4.0/"
    add_profiles(crate)
    wf_properties = {
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"],
    }
    workflow = crate.add_workflow(
        WF_PATH, WF_PATH.name, main=True, lang="nextflow",
        gen_cwl=False, properties=wf_properties
    )
    crate.add_file(CONFIG_PATH, properties={
        "description": "Nextflow configuration file"
    })
    crate.add_file(THIS_DIR / "README.md")
    add_tasks(crate)
    action = crate.add(ContextEntity(crate, properties={
        "@type": "CreateAction",
        "name": f"Execution of {WF_PATH.name}",
    }))
    action["instrument"] = workflow
    crate.root_dataset["mentions"] = [action]
    crate.write(args.out_dir)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("out_dir", metavar="OUTPUT_DIRECTORY",
                        help="output directory for the crate")
    main(parser.parse_args())
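
The six `PropertyValue` blocks in `add_tasks` differ only in the trace column, the stat name and the optional unit, so the same mapping could also be expressed as a table. The sketch below is not part of the PR: `STATS` and `resource_usage_properties` are hypothetical names, and it builds plain dictionaries rather than calling `rocrate`, so that it runs without any dependency.

```python
NF_TRACE_NS = "https://w3id.org/ro/terms/nf-trace"
MILLISEC = "https://qudt.org/vocab/unit/MilliSEC"
BYTE = "https://qudt.org/vocab/unit/BYTE"

# (trace column, stat name, unit URL or None), as used in make_crate.py
STATS = [
    ("realtime", "realTime", MILLISEC),
    ("%cpu", "percentCPU", None),
    ("peak_rss", "peakRSS", BYTE),
    ("peak_vmem", "peakVMEM", BYTE),
    ("rchar", "rChar", BYTE),
    ("wchar", "wChar", BYTE),
]


def resource_usage_properties(action_id, record):
    """Build one PropertyValue dict per stat for a trace record (hypothetical helper)."""
    entities = []
    for column, stat, unit in STATS:
        props = {
            "@id": f"{action_id}-{stat}",
            "@type": "PropertyValue",
            "name": stat,
            "propertyID": f"{NF_TRACE_NS}#{stat}",
            "value": str(record[column]),
        }
        # percentCPU carries no unitCode in the script, so the key is omitted.
        if unit is not None:
            props["unitCode"] = unit
        entities.append(props)
    return entities
```

In the script itself, each dictionary (minus `@id`) would be passed as `properties` to `ContextEntity`, with `@id` given as the identifier argument.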
4 changes: 4 additions & 0 deletions dev/resource_usage/nf_tracing/nextflow.config
@@ -0,0 +1,4 @@
trace {
    enabled = true
    raw = true
}
1 change: 1 addition & 0 deletions dev/resource_usage/nf_tracing/requirements.txt
@@ -0,0 +1 @@
rocrate~=0.7
2 changes: 2 additions & 0 deletions dev/resource_usage/nf_tracing/setup.cfg
@@ -0,0 +1,2 @@
[flake8]
max-line-length = 127
4 changes: 4 additions & 0 deletions dev/resource_usage/nf_tracing/trace-20230517-52413675.txt
@@ -0,0 +1,4 @@
task_id hash native_id name status exit submit duration realtime %cpu peak_rss peak_vmem rchar wchar
1 cd/ca5a2f 1273 splitLetters COMPLETED 0 1684334014290 178 5 66.7 0 0 145813 223
3 28/7f2737 1350 convertToUpper (2) COMPLETED 0 1684334014534 186 12 80.0 0 0 155925 199
2 9f/7c259b 1353 convertToUpper (1) COMPLETED 0 1684334014542 184 9 133.3 0 0 155926 200
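
With `raw = true`, the `submit` column holds milliseconds since the Unix epoch and `duration` holds milliseconds, which is how `make_crate.py` derives each task's start and end time. The sketch below reproduces that step for the first row of the trace, assuming tab-separated columns as the script does. `task_interval` is a hypothetical helper, and unlike the script (which calls `datetime.fromtimestamp` without a time zone and therefore yields local, naive times) it uses timezone-aware UTC values.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Header and first data row of the trace, with tab separators restored
# (an assumption consistent with delimiter="\t" in make_crate.py).
TRACE_TEXT = (
    "task_id\thash\tnative_id\tname\tstatus\texit\tsubmit\tduration\t"
    "realtime\t%cpu\tpeak_rss\tpeak_vmem\trchar\twchar\n"
    "1\tcd/ca5a2f\t1273\tsplitLetters\tCOMPLETED\t0\t1684334014290\t178\t"
    "5\t66.7\t0\t0\t145813\t223\n"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def task_interval(record):
    """Return (start, end) as UTC datetimes for one trace record."""
    # Integer millisecond arithmetic avoids float rounding of the timestamp.
    start = EPOCH + timedelta(milliseconds=int(record["submit"]))
    end = start + timedelta(milliseconds=int(record["duration"]))
    return start, end


records = list(csv.DictReader(io.StringIO(TRACE_TEXT), delimiter="\t"))
start, end = task_interval(records[0])
print(records[0]["name"], start.isoformat(), end.isoformat())
```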
42 changes: 42 additions & 0 deletions dev/resource_usage/nf_tracing/tutorial-run-1-crate/README.md
@kinow kinow Nov 23, 2023

This file looks a lot like dev/resource_usage/nf_tracing/README.md (from looking at the GitHub UI). Are they the same?

Collaborator Author

Yes, there's a line in the script that adds it to the crate.

@@ -0,0 +1,42 @@
# Representing resource usage: Nextflow tracing example

The `tutorial.nf` workflow used for this example has been copied verbatim from the [Get started section of the Nextflow documentation](https://github.com/nextflow-io/nextflow/blob/f48a473c070f297bce6f97f6b076e4e92d25e00a/docs/getstarted.md), which is © Copyright 2023, Seqera Labs, S.L. and [distributed under](https://github.com/nextflow-io/nextflow/blob/f48a473c070f297bce6f97f6b076e4e92d25e00a/docs/README.md#license) the [CC BY-SA 4.0 license](https://creativecommons.org/licenses/by-sa/4.0/).

The purpose of this exercise is to try a representation of [resource usage](https://github.com/ResearchObject/workflow-run-crate/issues/10) in a Workflow Run RO-Crate.

The tutorial has been run as follows:

```console
bash-4.2# date
Wed May 17 14:33:13 UTC 2023

bash-4.2# nextflow -v
nextflow version 23.05.0-edge.5861

bash-4.2# nextflow run tutorial.nf
N E X T F L O W ~ version 23.05.0-edge
Launching `tutorial.nf` [lonely_dubinsky] DSL2 - revision: e61bd183fe
executor > local (3)
[cd/ca5a2f] process > splitLetters [100%] 1 of 1 ✔
[9f/7c259b] process > convertToUpper (1) [100%] 2 of 2 ✔
WORLD!
HELLO
```

The run used the following configuration:

```groovy
trace {
    enabled = true
    raw = true
}
```

This produced the `trace-20230517-52413675.txt` trace report.

To generate the RO-Crate:

```
pip install -r requirements.txt
python make_crate.py tutorial-run-1-crate
```
@@ -0,0 +1,4 @@
trace {
    enabled = true
    raw = true
}