ro-crate #2

kmexter · 2024-10-01T13:46:03Z

I went over this file: https://github.com/emo-bon/metaGOflow-data-products-RO-crate-example/blob/main/ro-crate-metadata.json and have the following comments based on a comparison to our ARMS ro-crates
https://github.com/arms-mbon/data_release_001/blob/main/ro-crate-metadata.json and https://github.com/arms-mbon/analysis_release_001/blob/main/ro-crate-metadata.json

add a creator -> that could be EMBRC or could be the metaGOflow team - done
add a publisher -> VLIZ, as we did with ARMS data_release_001 - done
add wasAssociatedWith -> people involved in creating the dataset (e.g. the bioinformatician) - done
add a specific contactPoint -> is the [email protected] really the best email address? - done
the licence should be given as "license" - done
the associated sequence (that will eventually be the ENA run accession number) should get a predicate that says that it is an associatedSequence. We will need to find such a predicate: wasDerivedFrom or similar, combined with a property saying that it is a sequence? @laurianvm ? - under discussion issue 7
the other files listed in the RO-Crate: where they were produced by metaGOflow, they need "wasInfluencedBy", referring to the GitHub repo of that file (or that instance of the file; the version is also needed). Also to discuss with @laurianvm, using the links given above as input - under discussion issue 7
need a downloadURL for each file - done
should have "keywords" for the repo - done
my ARMS RO-Crate has a "label" for each one, but here that is "name" - which is better? @cedricdcc ? - ignore
at repo level, add some dct:relation to link to other repos that should be considered together with this one ... would need to think about what they would be
need to add the material sample ID (or UUID), I think as being a "sample", and will need Laurian's input here on what the predicate and object would be
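A rough way to track this checklist is a small Python check over the manifest. This is a minimal sketch, not part of create-ro-crate.py: it assumes the RO-Crate 1.1 layout in which the metadata descriptor "ro-crate-metadata.json" points at the root dataset via "about", and the property list (REQUIRED_ROOT_PROPS), the toy crate and the example.org URL are hypothetical.

```python
# Hypothetical checklist of root-level properties, taken from the list above.
REQUIRED_ROOT_PROPS = ["creator", "publisher", "contactPoint", "license", "keywords"]


def as_list(value):
    """Normalise a JSON-LD value that may be a single item or a list."""
    return value if isinstance(value, list) else [value]


def find_root_dataset(graph):
    """Return the root dataset, i.e. the entity the metadata descriptor is 'about'."""
    by_id = {entity["@id"]: entity for entity in graph}
    descriptor = by_id["ro-crate-metadata.json"]
    return by_id[descriptor["about"]["@id"]]


def missing_items(crate):
    """List checklist items that the crate does not (yet) satisfy."""
    graph = crate["@graph"]
    root = find_root_dataset(graph)
    problems = [f"root: missing '{p}'" for p in REQUIRED_ROOT_PROPS if p not in root]
    if "licence" in root:
        problems.append("root: use 'license', not 'licence'")
    for entity in graph:
        if "File" in as_list(entity.get("@type", [])) and "downloadURL" not in entity:
            problems.append(f"{entity['@id']}: missing 'downloadURL'")
    return problems


# Toy crate with placeholder values (example.org is not a real data location).
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "creator": {"@id": ":EMO_BON"}, "licence": "CC-BY-4.0"},
        {"@id": "https://example.org/run.xml", "@type": "File"},
    ],
}

for problem in missing_items(crate):
    print(problem)
```

Running it over each generated manifest would show which items are still open before a crate is published.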

If you don't understand all of this @cymon (I have written it in a bit of a hurry), then perhaps the best is that @laurianvm and I make a template and you fill it in?

laurianvm · 2024-10-16T11:58:59Z

(@laurianvm to go through these points and give examples of how to address them, including how to describe the institutes that have run the code, relating to the codemeta template example)

cymon · 2024-10-16T15:19:57Z

I went over this file: https://github.com/emo-bon/metaGOflow-data-products-RO-crate-example/blob/main/ro-crate-metadata.json

OK, that file is an example of the ro-crate-metadata.json file generated by the create-ro-crate.py script; the template that script uses is here

and have the following comments based on a comparison to our ARMS ro-crates https://github.com/arms-mbon/data_release_001/blob/main/ro-crate-metadata.json and https://github.com/arms-mbon/analysis_release_001/blob/main/ro-crate-metadata.json

* add a Creator -> that could be embrc or could be metagoflow team

"creator": {"@id": "https://ror.org/0038zss60"}

I thought that using the ROR for the ID was correct (or at least acceptable); if not it can be changed.

* add a publisher -> vliz, as we did with arms data_release_001

This was already included:
"publisher": {"@id": "https://ror.org/0038zss60"}

Surely the publisher is EMBRC or EMO-BON rather than VLIZ?

* add wasAssociatedWith -> people involved in creating the dataset (e.g. the bioinformatician)

Do individuals have to be identified by name? I think it would be preferable to point to EMO BON and have a web-page where people involved are included. Else we need to keep track of who did what for every ro-crate, or we could just include the same named individuals in all manifests to simplify things.

* add a specific contactPoint -> is the [email protected] really the best email address?

It's "a" contact point - if there are alternatives we could replace it.

* the licence should be given as "license"

Ah, Americans...

* the associated sequence (that will eventually be the ENA run accession number) should get a predicate that says that it is an associatedSequence. We will need to find such a predicate: wasDerivedFrom or similar, combined with a property saying that it is a sequence? @laurianvm ?

* the other files listed in the RO-Crate: where they were produced by metaGOflow, they need "wasInfluencedBy", referring to the GitHub repo of that file (or that instance of the file; the version is also needed). Also to discuss with @laurianvm, using the links given above as input

* need a downloadURL for each file

TODO.

* should have "keywords" for the repo

TODO.

* my ARMS RO-Crate has a "label" for each one, but here that is "name" - which is better? @cedricdcc ?

* at repo level add some dct:relation to link to other repos that should be considered with this one ...would need to think what they would be

* need to add material sample id (or uuid) I think as being a "sample" and will need laurian's input here as what the predicate and object would be

If you don't understand all of this @cymon (I have written it in a bit of a hurry), then perhaps the best is that @laurianvm and I make a template and you fill it in?

cymon · 2024-10-17T11:10:24Z

@kmexter @laurianvm @marc-portier

I have a question regarding ro-crate stanza formatting: so this is a stanza from ARMS data_release_001

    {
        "@id": "./ARMS_ITS_Occurrence.csv",
        "@type": "File",
        "label": "./ARMS_ITS_Occurrence.csv",
        "fileFormat": "csv",
        "wasDerivedFrom": [
            "https://github.com/arms-mbon/data_workspace/tree/main/qualitycontrolled_data/combined",
            "https://github.com/arms-mbon/data_workspace/tree/main/analysis_data/from_pema/processing_batch1"
        ],
        "description": "The Occurrence extension for the ITS data",
        "downloadURL": "https://data.arms-mbon.org/data_release_001/latest/#./ARMS_ITS_Occurrence.csv"
    },

Here the "@id" points to a file at the location "./ARMS_ITS_Occurrence.csv" so I assume that the actual data file is included in the payload of the ro-crate. Yet, it also has a downloadURL that points to another copy of the same file that is in the payload. This is all good.

In the metaGOflow data products ro-crate (where this issue is attached) the entire ro-crate payload will consist of only the ro-crate-metadata.json manifest; no data files will be included. I therefore assumed that the "@id" field would be the URL to the data file, and as such there would be no need for a "downloadURL" field. For example:

{
    "@id": "<the URL to the datafile in github/S3 or where ever>",
    "name": "ENA accession for run raw sequence data",
    "description": "FAKE: Raw sequence data and laboratory sequence generation metadata",
    "encodingFormat": "text/xml"
},

Is the "downloadURL" field redundant in this case, so that it can be left out, or should it be included even when the URL is identical to the "@id"?
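One way to keep both options open while this is decided is to let the script generate web-based File entities with a switch for the duplicated URL. The sketch below is hypothetical (the function name, the `duplicate_download_url` flag and the example.org URL are illustrative only); it follows the ARMS stanza in giving the entity "@type": "File", and uses an absolute URL as "@id" as the example above does.

```python
import json


def web_file_entity(url, name, description, encoding_format, duplicate_download_url=True):
    """Build a File entity for a file that lives outside the crate payload.

    The absolute URL is used as "@id"; duplicate_download_url controls the open
    question of whether the same URL is repeated as "downloadURL".
    """
    entity = {
        "@id": url,
        "@type": "File",
        "name": name,
        "description": description,
        "encodingFormat": encoding_format,
    }
    if duplicate_download_url:
        entity["downloadURL"] = url
    return entity


# Placeholder URL standing in for "the URL to the datafile in github/S3 or where ever".
example = web_file_entity(
    "https://example.org/emo-bon/run.xml",
    "ENA accession for run raw sequence data",
    "FAKE: Raw sequence data and laboratory sequence generation metadata",
    "text/xml",
)
print(json.dumps(example, indent=4))
```

Whichever answer is agreed, only the default of the flag would need to change.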

kmexter · 2024-10-17T12:02:58Z

creator: I would add the full institute name, not only the id
publisher: ditto, and no: the publisher is the one responsible for getting the data shared, so if people have questions about that (e.g. "why is this file not accessible?") they know who to contact. In fact, the email address for that should be that of the open science team, so the text to add is

"publisher": {
"@id": ":VLIZ"
},
where later
{
"@id": ":VLIZ",
"@type": "Organization",
"name": "Flanders Marine Institute",
"url": "https://www.vliz.be/en",
"label": "_:VLIZ",
"email":"[email protected]"
},
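Adding the full institute name, not only the id, amounts to making sure every {"@id": ...} reference on the root dataset has a matching entity (with a "name") in the @graph. A hypothetical helper for spotting references that still lack one, assuming the root dataset has "@id": "./":

```python
def dangling_references(crate):
    """Return (property, @id) pairs on the root dataset that match no entity in @graph."""
    graph = crate["@graph"]
    known_ids = {entity["@id"] for entity in graph}
    root = next(entity for entity in graph if entity["@id"] == "./")
    dangling = []
    for prop, value in root.items():
        for item in value if isinstance(value, list) else [value]:
            # A bare reference is a dict holding only "@id".
            if isinstance(item, dict) and set(item) == {"@id"} and item["@id"] not in known_ids:
                dangling.append((prop, item["@id"]))
    return dangling


crate = {
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "creator": {"@id": "https://ror.org/0038zss60"},  # id only, no entity yet
            "publisher": {"@id": ":VLIZ"},
        },
        {
            "@id": ":VLIZ",
            "@type": "Organization",
            "name": "Flanders Marine Institute",
            "url": "https://www.vliz.be/en",
        },
    ]
}

print(dangling_references(crate))  # [('creator', 'https://ror.org/0038zss60')]
```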

kmexter · 2024-10-17T12:06:21Z

wasAssociatedWith is to acknowledge people for their efforts (this ro-crate is in lieu of a metadata record) so if you want to acknowledge them, add them here by name; otherwise you don't have to bother (is up to you).
contactPoint: we should discuss this in the next opco. If someone emails help@embrc about MGF, I can imagine it will take ages for that request to eventually get to you. At least it should be the EMO BON email address, not the EMBRC one

kmexter · 2024-10-17T12:21:01Z

having said that about publisher above, I now change my mind
We agreed via Tosca that it would be https://www.embrc.eu/emo-bon
@laurianvm can you advise on how this should be written - I mean, this is a project (not an organisation) and it has a parent (EMBRC, being an organisation) and it has an email address ([email protected])

cymon · 2024-10-17T12:54:23Z

having said that about publisher above, I now change my mind We agreed via Tosca that it would be https://www.embrc.eu/emo-bon @laurianvm can you advise on how this should be written - I mean, this is a project (not an organisation) and it has a parent (EMBRC, being an organisation) and it has an email address ([email protected])

Can we have "publisher": ":EMBRC" and "creator": ":EMO BON"?

We already have:
{
"@id": ":EMBRC",
"@type": "Organization",
"name": "European Marine Biological Resource Centre",
"url": "https://ror.org/0038zss60",
"contactPoint": {"@id": "mailto:[email protected]"}
},

We'd need a new "@type" for EMO BON.

kmexter · 2024-10-17T12:59:45Z

Argh. By definition the publisher is not EMBRC, as they are not publishing the data, but one can say that the EMO BON project is publishing the data via its data managers. So the publisher is EMO BON and the creator is EMO BON, but @laurianvm, can we have an owner and funder that is EMBRC?
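A sketch of how that could look, assuming schema.org's "Project" type (a subtype of Organization) is acceptable here, using its "parentOrganization" property for the link to EMBRC and "funder" on the dataset. The "owner" predicate is still an open question and is left out. The local id ":EMO_BON" is hypothetical (it avoids the space in ":EMO BON" until spaces in "@id" values are confirmed to be allowed), and the email is a placeholder.

```python
EMBRC_ID = ":EMBRC"
EMO_BON_ID = ":EMO_BON"  # hypothetical local id, no space


def emo_bon_entities(contact_email):
    """Return the EMO BON project entity and its parent organisation, EMBRC."""
    emo_bon = {
        "@id": EMO_BON_ID,
        "@type": "Project",  # assumption: schema.org Project, a subtype of Organization
        "name": "EMO BON",
        "url": "https://www.embrc.eu/emo-bon",
        "parentOrganization": {"@id": EMBRC_ID},
        "email": contact_email,
    }
    embrc = {
        "@id": EMBRC_ID,
        "@type": "Organization",
        "name": "European Marine Biological Resource Centre",
        "url": "https://ror.org/0038zss60",
    }
    return emo_bon, embrc


def attribute_dataset(root):
    """Point creator and publisher at EMO BON and funder at EMBRC (owner still undecided)."""
    root["creator"] = {"@id": EMO_BON_ID}
    root["publisher"] = {"@id": EMO_BON_ID}
    root["funder"] = {"@id": EMBRC_ID}
    return root


emo_bon, embrc = emo_bon_entities("emo-bon@example.org")  # placeholder address
root = attribute_dataset({"@id": "./", "@type": "Dataset"})
```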

cymon · 2024-10-17T13:25:55Z

wasAssociatedWith is to acknowledge people for their efforts (this ro-crate is in lieu of a metadata record) so if you want to acknowledge them, add them here by name; otherwise you don't have to bother (is up to you).

I think it would be simpler to just acknowledge the EMO BON project, where the persons involved should be detailed. If people feel strongly that each ro-crate should acknowledge the set of individuals involved in the creation of the data, then the various roles to be acknowledged would need to be defined, and which persons were responsible for those roles would need to be recorded for each ro-crate. Doable, but a bit of a faff.

Edit: I'm just going to assume no one feels strongly enough about this unless told otherwise.

* contactPoint: we should discuss this in the next opco. If someone emails help@embrc about mgf, I can imagine it will take ages for that request to eventually get to you. at least it should be the emobon email address, not the embrc one

An EMO BON email address would be better.

kmexter · 2024-10-18T06:31:56Z

use that then - [email protected]

kmexter · 2024-10-18T06:32:42Z

Re: cymon's question above on whether "downloadURL" is redundant (and can be left out) when it is identical to the "@id": this is indeed a question for @laurianvm and @marc-portier

laurianvm · 2024-12-04T13:34:12Z

These are notes Laurian took down as we went over https://github.com/emo-bon/metaGOflow-data-products-RO-crate-example/blob/main/emo-bon-ro-crate-repository/EMOBON_BPNS_So_17-ro-crate/ro-crate-metadata.json

When something specific is required of Cymon, we will tag him

@laurianvm to check if spaces in "@id" values are allowed
@cymon to remove lines 85 until 96 and replace them with an "email" property (contactPoint is a property used on Dataset, not on Organization)
@laurianvm to add predicates for accession numbers once they are created in the ontology, and then @kmexter and @laurianvm to tell Cymon how to add them. This is so we can refer to the sequence that was used in metaGOflow
@laurianvm to include 'relation' property like used in here: https://github.com/emo-bon/observatory-bpns-crate/blob/main/extra-metadata.json#L23 - to refer to related repos, i.e. the logsheets. Then we will tell Cymon what to add to the ro-crate
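The two mechanical fixes in this list could be scripted roughly as follows. This is a hypothetical sketch, not an agreed procedure: it assumes the contactPoint values are "mailto:" references (as in the ":EMBRC" stanza above), and it percent-encodes relative paths such as "./raw data.csv", which is one common way of handling spaces in "@id" values, pending @laurianvm's check. Renaming an "@id" also means updating every reference to it (e.g. in "hasPart"), which the sketch does not do.

```python
from urllib.parse import quote


def encode_relative_id(entity_id):
    """Percent-encode a relative path @id (a space becomes %20); leave URLs and local ids alone."""
    return quote(entity_id) if entity_id.startswith("./") else entity_id


def referenced_ids(entity):
    """Yield every @id that an entity references through its property values."""
    for value in entity.values():
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict) and "@id" in item:
                yield item["@id"]


def contact_point_to_email(graph):
    """Move an Organization's mailto: contactPoint into an "email" property.

    ContactPoint entities that end up unreferenced are dropped; a new list is returned.
    A Dataset's own contactPoint is left untouched.
    """
    moved = set()
    for entity in graph:
        types = entity.get("@type", [])
        if "Organization" not in (types if isinstance(types, list) else [types]):
            continue
        ref = entity.get("contactPoint")
        if isinstance(ref, dict) and ref.get("@id", "").startswith("mailto:"):
            entity["email"] = ref["@id"][len("mailto:"):]
            del entity["contactPoint"]
            moved.add(ref["@id"])
    still_used = {ref for entity in graph for ref in referenced_ids(entity)}
    return [e for e in graph if e["@id"] not in moved - still_used]


graph = [
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "./raw data.csv"}]},
    {
        "@id": ":EMBRC",
        "@type": "Organization",
        "name": "European Marine Biological Resource Centre",
        "contactPoint": {"@id": "mailto:info@example.org"},  # placeholder address
    },
    {"@id": "mailto:info@example.org", "@type": "ContactPoint", "email": "info@example.org"},
    {"@id": "./raw data.csv", "@type": "File"},
]

graph = contact_point_to_email(graph)
print([e["@id"] for e in graph])
print(encode_relative_id("./raw data.csv"))
```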

kmexter assigned cymon, kmexter and laurianvm Oct 1, 2024

kmexter mentioned this issue Oct 22, 2024: RO-Crates: crediting institutes and persons by name #4 (Open)

kmexter mentioned this issue Dec 4, 2024: Dec 2024 issue on the mfg ro-crates and improvements #7 (Open)

emo-bon deleted a comment from laurianvm Dec 4, 2024
