
Indicators for R1.1: (meta)data are released with a clear and accessible data usage licence #27

Closed
makxdekkers opened this issue Jun 24, 2019 · 20 comments

Comments

@makxdekkers

[image: proposed indicators and maturity levels for R1.1]

@makxdekkers

Points raised in online meeting 3 on 18 June 2019

  • The licence should be easily located in the metadata.

@keithjeffery

I suggest this is insufficient. A licence is most likely human-readable and not machine-understandable. For autonomic interoperability it is necessary to extract from the licence assertions or rules in logic that can be used to determine whether, at this time and from this place, this software acting on behalf of this user from this organisation in this role can access (and conditionally perform other operations on) the asset, and if so, what is recorded about the access (e.g. citation, accreditation, audit logging, provenance, curation).
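This requirement can be read as a decision function over an access context. Below is a minimal sketch in Python, assuming a hypothetical rule representation; the `AccessContext` fields and the rule dictionary keys are illustrative inventions, not taken from any licence standard:

```python
from dataclasses import dataclass

@dataclass
class AccessContext:
    """Hypothetical context for an access decision (field names are illustrative)."""
    time: str          # when the access is requested
    place: str         # where the request originates
    user_role: str     # role of the user the software acts for
    organisation: str  # the user's organisation
    operation: str     # e.g. "read", "redistribute"

def licence_permits(ctx: AccessContext, rules: list) -> bool:
    """Return True if any extracted licence rule permits the requested operation.

    `rules` is an illustrative list of dicts such as
    {"operation": "read", "roles": ["researcher"], "obligations": ["citation"]}.
    A real implementation would also evaluate time, place, and record the
    obligations (citation, audit logging, ...) that the rule attaches.
    """
    for rule in rules:
        if rule["operation"] == ctx.operation and ctx.user_role in rule["roles"]:
            return True
    return False

rules = [{"operation": "read", "roles": ["researcher"], "obligations": ["citation"]}]
ctx = AccessContext("2019-07-01", "EU", "researcher", "CERN", "read")
print(licence_permits(ctx, rules))  # True
```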

@makxdekkers

@keithjeffery These are indeed important aspects of a licence. However, I think we also need to be realistic. I would say that possibly very few existing licences in use for research data provide this level of detail. There is some of this in ccRel and in ODRL but I don't know how widespread these are. Are there detailed descriptions of permissions and obligations for commonly used licences (e.g. CC-BY) publicly available?

Would including these very detailed requirements in an indicator not make it too difficult for any data to be considered FAIR? I guess we don't want to end up in a situation in which someone says "my data is CC-BY" and the evaluation concludes that, because of that, the data is not FAIR. That would make FAIRness hard to achieve.

Could we enumerate a smaller set of crucial licence information, plus useful, but not mandatory extensions?

@keithjeffery

keithjeffery commented Jul 1, 2019 via email

@makxdekkers

@keithjeffery OK, let's then try to get some more opinions and suggestions from others in the WG.

@micheldumontier

You might be interested in the indicators outlined here: http://reusabledata.org/

@micheldumontier

As a general comment, we expect that maximum FAIRness is achieved when machines can interpret the terms and conditions in a license (see also smart contracts).

@makxdekkers

@micheldumontier Do you think it is possible to require existing licences (e.g. CC-BY) to be fully machine-understandable, e.g. with explicit permissions, prohibitions, obligations etc., the way ODRL does? If so, could the third indicator above be reformulated as:

Machine-understandable licence

  • NO machine-understandable licence
  • Machine-understandable licence with explicit expression of permissions, prohibitions, obligations etc. (e.g. using ODRL policy)
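For illustration, a minimal policy in ODRL's JSON-LD serialisation might look like the sketch below, built with Python's standard `json` module. The policy content (the placeholder asset IRI, a "use" permission carrying an "attribute" duty, loosely CC-BY-like) is an assumed example, not an official machine encoding of CC-BY:

```python
import json

# Illustrative ODRL policy: permission to use an asset, with an attribution duty.
# The policy and target IRIs are placeholders, not real identifiers.
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Set",
    "uid": "http://example.com/policy/1",
    "permission": [{
        "target": "http://example.com/dataset/1",
        "action": "use",
        "duty": [{"action": "attribute"}],
    }],
}

print(json.dumps(policy, indent=2))
```

Because the permissions and duties are explicit, structured terms rather than licence prose, an evaluator can query them directly, which is the difference the proposed indicator tries to capture.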

@micheldumontier

@makxdekkers I think that this is highly desirable, yes.

@markwilkinson

I would make this a tiered metric (as we have done for several of the Maturity Indicators in our project): a "license - weak compliance" metric that asks "can a machine FIND the license, regardless of what its value is?", and then a second, "license - strong compliance" metric that asks "can a machine process the license that it finds?". At least for the moment, being able to FIND the license in most metadata records is already a struggle, because existing standards are not well harmonized around this property. Looking at the LOV registry, I see ~13 distinct predicates that could be interpreted as pointing at some kind of license. It would be nice if that could be pruned down to a small handful. Then we could look at the machine-readability of the values of those predicates (where even "readability" is tricky: a lot of the CC licenses have an RDF representation that reflects the common structural components of a license, but the content is not processable by machines...)
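The "weak compliance" check described here (can a machine locate the licence at all?) could be sketched as a scan over licence-like predicates. The predicate set below is an illustrative subset I am assuming for the example, not the actual ~13 predicates found in LOV:

```python
# Illustrative subset of predicates that commonly point at a licence.
LICENCE_PREDICATES = {
    "http://purl.org/dc/terms/license",
    "http://purl.org/dc/terms/rights",
    "http://schema.org/license",
    "http://creativecommons.org/ns#license",
    "http://www.w3.org/1999/xhtml/vocab#license",
}

def find_licence(triples):
    """Return licence values found in (subject, predicate, object) triples."""
    return [o for (_, p, o) in triples if p in LICENCE_PREDICATES]

record = [
    ("http://example.com/dataset/1", "http://purl.org/dc/terms/title", "Demo"),
    ("http://example.com/dataset/1", "http://purl.org/dc/terms/license",
     "https://creativecommons.org/licenses/by/4.0/"),
]
print(find_licence(record))  # ['https://creativecommons.org/licenses/by/4.0/']
```

Pruning the predicate list down to a small, agreed handful would shrink exactly this lookup set; the "strong compliance" tier would then inspect what `find_licence` returns.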

@makxdekkers

@markwilkinson The indicators in the first comment in this issue already include the presence of a licence as an indicator. The additional one that I proposed after @micheldumontier's comment is about the machine-understandability of the licence in terms of permissions, obligations etc.
Do you agree that those two (presence of a licence and a machine-understandable licence) cover the tiered metric you describe? Or would you want to include a recommendation of the specific metadata element in which the licence information is to be provided?

@markwilkinson

Yes, I want a specific metadata element. Without that, I can't find the "thing" that is supposed to represent the license. (unless we go full-on OWL, and then I can check the rdf:type of every value of every metadata element to figure out which one is of type "License" ;-) )

@makxdekkers

@markwilkinson But how can a specific metadata element be mandated? Where licence information is provided depends on the community standard, doesn't it? Community standards usually have a clear place to provide a link to a licence (DC/DCAT has dct:license, schema.org has schema:license, DataCite metadata schema has a Rights element etc.). Requiring the use of a specific metadata element might then be in conflict with the community standard.
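To illustrate how the same licence link lands in different designated elements, here are two of the community standards mentioned above rendered as JSON-LD fragments with Python's standard `json` module. The element names (dct:license, schema.org's license) are as cited in the comment; the dataset identifier is a placeholder:

```python
import json

licence_uri = "https://creativecommons.org/licenses/by/4.0/"

# DCAT / Dublin Core: the designated element is dct:license.
dcat_record = {
    "@context": {"dct": "http://purl.org/dc/terms/"},
    "@id": "http://example.com/dataset/1",
    "dct:license": {"@id": licence_uri},
}

# schema.org: the designated element is schema:license.
schema_record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "license": licence_uri,
}

for record in (dcat_record, schema_record):
    print(json.dumps(record, indent=2))
```

Each community standard already has its own "appropriate element", so an indicator can require that the element designated by the chosen standard is used, without mandating one element across all standards.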

@markwilkinson

Oh! No, I meant exactly that - that there should BE such an element, formally designated as such by that community, rather than just a randomly coined predicate.

@makxdekkers

@markwilkinson That's a relief!
So an indicator could be:

R1.1-05 Provision of licence information in the appropriate element in the metadata standard used

  • Licence information NOT provided in the appropriate element
  • Licence information provided in the appropriate element

On the other hand, isn't that a quality issue?

In a way, if metadata does not use the appropriate metadata element, it would also fail on R1.3 as it would not (correctly) follow the relevant community standard.

@keithjeffery

@makxdekkers -
This R1.1-05 is fine as one criterion. Once we have the licence 'in the appropriate element' (so it can be queried accurately), the next step is to ensure the element is machine-understandable (formal syntax, declared semantics), i.e. machine-readable ==> machine-understandable.
best
Keith

@makxdekkers

makxdekkers commented Jul 11, 2019

@keithjeffery Are you proposing an additional indicator? E.g.:

R1.1-06 Provision of machine-understandable licence information

@keithjeffery

keithjeffery commented Jul 11, 2019 via email

@bahimc

bahimc commented Aug 2, 2019

Please find below the current version of the indicator(s) and their respective maturity levels for this FAIR principle. The indicators and maturity levels will be presented, as they stand, to the next working group meeting for approval. In the meantime, any comments are still welcome.

The editorial team will now concentrate on weighing and prioritising these indicators. More information soon.

[image: indicators and maturity levels for R1.1, as of August 2019]

@bahimc

bahimc commented Oct 7, 2019

Dear contributors,

Below you can find the indicators and their maturity levels in their current state as a result of the above discussions and workshops.

[image: indicators and maturity levels for R1.1, as of October 2019]

Please note that this thread is going to be closed within a short period of time. The current state of the indicators, as of early October 2019, is now frozen, with the exception of the indicators for the principles concerned with the ‘richness’ of metadata (F2 and R1).

The current indicators will be used for the further steps of this WG, which are prioritisation and scoring. Later on, they will be used in a testing phase in which owners of evaluation approaches will be invited to compare their approaches (questionnaires, tools) against the indicators. The editorial team, in consultation with the Working Group, will define the best approach to test the indicators and evaluate their soundness. As such, the current set of indicators can be seen as an ‘alpha version’. In the first half of 2020, the indicators may be revised and improved based on the results of the testing.

If you have any further comments or suggestions regarding this discussion, please share them with us. In addition, we invite you to have a look at the following two sets of issues.

Prioritisation

• Indicators prioritisation for Findability
• Indicators prioritisation for Accessibility
• Indicators prioritisation for Interoperability
• Indicators prioritisation for Reusability

Scoring

• Indicators for FAIRness | Scoring

We thank you for your valuable input!

@bahimc closed this as completed on Oct 21, 2019