[Feature]: Better reading of RO-Crate metadata files #542

Open
kmexter opened this issue Nov 5, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

kmexter commented Nov 5, 2024

Detailed Description

I recently had a meeting with @huberrob and colleagues, in the context of the FAIR-EASE project, to discuss our use of the FUJI fairness checker on our metadata "records", which are actually ro-crate metadata JSON files. I ran some files (see list below) through the checker, and judging from the report I got back, they are perhaps not read quite properly by FUJI. I mentioned this at the meeting and Robert asked me to provide these ro-crate examples for FUJI to check out.

https://github.com/arms-mbon/data_release_001/blob/main/ro-crate-metadata.json
https://github.com/arms-mbon/analysis_release_001/blob/main/ro-crate-metadata.json
https://github.com/arms-mbon/code_release_001/blob/main/ro-crate-metadata.json

We made quite some effort to add lots of machine-understandable metadata to these rocrates, so I think they should give a good assessment. So we have identifiers, download URLs, licence, links to related entities, etc.
FYI The domain of these metadata is marine biology.

I know that there are 2 ways that I can specify the URL in the tool in order to get the best reporting, i.e.
https://raw.githubusercontent.com/arms-mbon/data_release_001/main/ro-crate-metadata.json vs https://github.com/arms-mbon/data_release_001/blob/main/ro-crate-metadata.json
It would be nice to have a recommendation as to which is the better approach (since the first one gave a better result, I am guessing it is that one)

Context

If you can modify FUJI to work better on ro-crates, we could trust the results better. We would also value any feedback from you on our ro-crates!

Possible Implementation

Contributor

huberrob commented Nov 7, 2024

Just a quick first comment: only the https://raw.githubusercontent.com style URLs point to JSON content. URLs without the 'raw' subdomain deliver HTML content in which the JSON is displayed in a textarea tag, so F-UJI would not find it.
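
For readers who only have the "blob" style URL at hand, the mapping to the raw form is mechanical. A minimal sketch in Python, assuming the usual `https://github.com/<owner>/<repo>/blob/<ref>/<path>` layout and a ref without slashes; `to_raw_url` is a hypothetical helper for illustration, not part of F-UJI:

```python
from urllib.parse import urlsplit


def to_raw_url(blob_url: str) -> str:
    """Rewrite a github.com 'blob' page URL to its raw.githubusercontent.com form.

    Hypothetical helper for illustration. URLs that do not match the
    expected layout are returned unchanged.
    """
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected path layout: owner / repo / "blob" / ref / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return blob_url
    owner, repo, _, ref, *path = segments
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"
```

Applied to the second URL from the original post, this yields the first one, which is the form that serves JSON.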

mpo-vliz commented Nov 7, 2024

> I know that there are 2 ways that I can specify the URL in the tool in order to get the best reporting, i.e.
> https://raw.githubusercontent.com/arms-mbon/data_release_001/main/ro-crate-metadata.json vs https://github.com/arms-mbon/data_release_001/blob/main/ro-crate-metadata.json
> It would be nice to have a recommendation as to which is the better approach (since the first one gave a better result, I am guessing it is that one)

There is a third way, or rather the imho "correct" way.

My own recommendation is not to use either of those refs. We need to stop using GitHub-based URIs in these cases, mostly because they are incidental to where the data currently happen to be hosted, are not at all core to the identity of these data, and are definitely not a dependency we want to be tied to. We made an effort to organize the data.arms-mbon.org domain, so the correct way to publicly refer to that ro-crate is https://data.arms-mbon.org/data_release_001/

From there, possible redirects plus a final HTML-embedded, FAIR-Signposting-conformant link (link; rel=describedby) should lead one to the metadata.json file in question.
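
To illustrate the last step, here is a minimal sketch of how a client could pull `describedby` links out of a landing page's HTML using only the Python standard library. The class and function names are hypothetical, and Signposting links sent as HTTP `Link` headers are not covered here:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DescribedByParser(HTMLParser):
    """Collect href values of <link rel="describedby"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively
        rels = (attr.get("rel") or "").lower().split()
        if "describedby" in rels and attr.get("href"):
            # Resolve relative hrefs against the page URL
            self.links.append(urljoin(self.base_url, attr["href"]))


def find_describedby(html, base_url):
    """Return absolute URLs of all describedby links found in the HTML."""
    parser = DescribedByParser(base_url)
    parser.feed(html)
    return parser.links
```

Given a page at a hypothetical base URL whose head contains `<link rel="describedby" href="./ro-crate-metadata.json">`, the function returns that href resolved against the base URL.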

@cedricdcc

Quick side note: the page that you are describing here, https://data.arms-mbon.org/data_release_001/, is a collection of versions of a rocrate. These don't have LOD yet but will have it in the future. The page that has embedded FAIR Signposting is https://data.arms-mbon.org/data_release_001/latest/, which is the latest version of the data_release_001 rocrate.

Contributor

huberrob commented Nov 8, 2024

True, at least F-UJI would follow the describedby links to locate the metadata.

Contributor

huberrob commented Nov 8, 2024

Maybe you could replace the fileFormat values with correct MIME types?
I am not sure how this is handled in the RO-Crate namespace, but as far as I know fileFormat has been replaced in schema.org by encodingFormat.

mpo-vliz commented Nov 8, 2024

I see the rocrate context makes provision for both encodingFormat and fileFormat

And the spec recommendation is to use encodingFormat to hold the mime types
https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/data-entities.html#encoding-file-paths
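
As a rough illustration of what such a clean-up could look like, here is a minimal Python sketch that moves MIME-like `fileFormat` values to `encodingFormat` in a loaded ro-crate-metadata.json. The helper name and the loose MIME check are assumptions made for this example; values that do not look like MIME types are left alone for manual review:

```python
import copy
import re

# Loose "type/subtype" check such as "text/csv". This is an assumption for
# the sketch, not a full validation of media types.
MIME_RE = re.compile(r"^[\w.+-]+/[\w.+-]+$")


def prefer_encoding_format(crate: dict) -> dict:
    """Return a copy of the crate in which MIME-like fileFormat values
    are moved to encodingFormat (only when encodingFormat is absent)."""
    result = copy.deepcopy(crate)
    for entity in result.get("@graph", []):
        value = entity.get("fileFormat")
        if (
            isinstance(value, str)
            and MIME_RE.match(value)
            and "encodingFormat" not in entity
        ):
            entity["encodingFormat"] = entity.pop("fileFormat")
    return result
```

The input is not modified, so the result can be compared against the original before anything is written back.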

Author

kmexter commented Dec 9, 2024

FYI We are upgrading the ro-crates that I gave as examples; I will ping you when we are done (this will not be until next year).
From the comments on #543, I wonder whether there is also an issue here with the fact that a ro-crate can describe multiple datasets (and so effectively has multiple download URLs). However, if a ro-crate is filled in properly, it is in fact possible to link each metadata element to named data elements. An alternative approach is to consider the entire ro-crate as the JSON-LD metadata record of a single dataset, albeit one with many elements.
Anyway, thoughts? Should I continue feeding this issue, or is this enough?

Contributor

huberrob commented Dec 9, 2024

OK, sounds great. It should be no problem to pack several Datasets into one ro-crate, but F-UJI will evaluate the root (the crate, which is of type CreativeWork), not each individual Dataset. Looking forward to seeing these examples!
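
For orientation, this is a minimal sketch of how the root entity of a crate can be located, following the RO-Crate convention that the metadata descriptor (`ro-crate-metadata.json`) points at the root through `about`. It is not F-UJI's code, whose internals are not shown in this thread; the function name is hypothetical and `about` is assumed to be a single object:

```python
def find_root_entity(crate: dict):
    """Return the root data entity of an RO-Crate, or None if not found."""
    graph = crate.get("@graph", [])
    by_id = {entity.get("@id"): entity for entity in graph}
    # The metadata descriptor describes the crate file itself
    descriptor = by_id.get("ro-crate-metadata.json")
    if descriptor is None:
        return None
    # Its "about" property references the root data entity by @id
    about = descriptor.get("about") or {}
    return by_id.get(about.get("@id"))
```

Any other Dataset entities in the `@graph` are reachable from the root, but a check that only evaluates the returned entity would not see metadata attached to them.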

Author

kmexter commented Dec 9, 2024

so it will fail all tests related to metadata provided for the dataset itself, right?
