Indicators for FAIRness | Scoring #34

Open
bahimc opened this issue Sep 23, 2019 · 35 comments

bahimc commented Sep 23, 2019

As presented to you during our last workshop, the editorial team has explored the concept of assessing the implementation level of the FAIR data principles.

This concept relies on the core criteria, i.e. the indicators and their maturity levels, that we have been developing since June. Whether or not the (meta)data satisfies the core criteria makes it possible to evaluate a digital object and answer the question "How can the FAIRness of this data be improved?".

  • Level 0: The resource did not comply with all the mandatory indicators
  • Level 1: The resource complied with all the mandatory indicators and with fewer than half of the recommended indicators
  • Level 2: The resource complied with all the mandatory indicators and with at least half of the recommended indicators
  • Level 3: The resource complied with all the mandatory and recommended indicators and with fewer than half of the optional indicators
  • Level 4: The resource complied with all the mandatory and recommended indicators and with at least half of the optional indicators
  • Level 5: The resource complied with all the mandatory, recommended and optional indicators
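
For concreteness, here is a minimal sketch (in Python) of how these levels could be derived from per-indicator pass/fail results. The function name and the lists-of-booleans input are illustrative assumptions, not part of the proposed model.

```python
def maturity_level(mandatory, recommended, optional):
    """Map lists of per-indicator pass/fail results to the levels 0-5 described above.

    A sketch only: an empty category counts as fully met here (assumption).
    """
    def share(results):
        return sum(results) / len(results) if results else 1.0

    if not all(mandatory):
        return 0                                      # Level 0: a mandatory indicator is not met
    if not all(recommended):
        return 1 if share(recommended) < 0.5 else 2   # Level 1 or 2, depending on the share of recommended met
    if not all(optional):
        return 3 if share(optional) < 0.5 else 4      # Level 3 or 4, depending on the share of optional met
    return 5                                          # Level 5: every indicator is met

print(maturity_level([True, True], [True, False, False], [False]))  # -> 1
```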

As mentioned many times, the FAIR principles are aspirational; FAIR is a journey. It is difficult to measure, particularly over time, how FAIR a digital resource is. Rather, the result of such an evaluation should be a set of improvement areas. It is important to stress that, as put forward by the charter, it must be possible to compare the results of different evaluation approaches (questionnaires or automated tools).

The aim of this evaluation is NOT to pass judgement but to objectively score a resource in order to identify improvements. We will nonetheless avoid discussions about the visualisation of scores, as this is the responsibility of the owners of the methodologies.

With the aim of developing the best possible evaluation/scoring mechanism, we encourage you to share any feedback below.

rwwh commented Oct 3, 2019

This needs to be "future proof".
It is important to consider how any score should be interpreted over time. The currently identified indicators are surely not a definitive list: new ones may appear in the future. Priorities may change as FAIR-supporting technology progresses. And even if the indicators stay fixed, a new community standard developed in the future could change the score for an existing data resource.

I think there are two options, both with their disadvantages:

  • Do all datasets need to be re-scored regularly? Do we accept that a Level 3 resource can sink to Level 2 over time? This requires regular re-evaluation of every resource, which can only be scalable if the process is fully automated.

  • Or will every score be a "snapshot"? That puts a burden on the one using the score: if it was evaluated 10 years ago, maybe a "level 3" is no longer as FAIR as current standards would demand.

rwwh commented Oct 3, 2019

The disadvantage of any "step/star/level" system like this is that once a level has just been obtained, there is very little incentive to make any further effort unless there is a prospect of reaching the next level.

Another potential weakness is that compliance with "recommended" indicators does not count as long as even one mandatory indicator is missing. This conflicts a bit with my usual story that people have to evaluate, for each FAIR principle, whether the effort weighs up against the benefits. I would find it a real pity if a relatively simple FAIRification effort is not undertaken because "it won't give me a better score anyway".

A weird alternative proposal: give the score as a 3-number tuple, indicating the percentage of mandatory, recommended, and optional indicators met, maximized to 99.99.99.
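
A minimal sketch of how such a tuple could be computed; rounding to whole percentages and treating an empty category as fully met are my assumptions, while the cap at 99 follows the proposal above.

```python
def score_tuple(mandatory, recommended, optional):
    """Percentage of mandatory, recommended and optional indicators met, each capped at 99."""
    def pct(results):
        if not results:
            return 99          # assumption: an empty category counts as fully met
        return min(99, round(100 * sum(results) / len(results)))
    return (pct(mandatory), pct(recommended), pct(optional))

# 2 of 2 mandatory, 1 of 2 recommended, 1 of 4 optional -> printed as "99.50.25"
print(".".join(str(v) for v in score_tuple([True, True], [True, False], [True, False, False, False])))
```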

rwwh commented Oct 3, 2019

The proposed system does not separately identify F, A, I, and R compliance.

@makxdekkers

@rwwh

It is important to consider how any score should be interpreted over time. The currently identified indicators are surely not a definitive list: new ones may appear in the future. Priorities may change as FAIR-supporting technology progresses. And even if the indicators stay fixed, a new community standard developed in the future could change the score for an existing data resource.

I think it is indeed important to consider that indicators may change over time. A 'score' needs to be related to the set of indicators at the time of scoring, also, as you note, because there could be changes in the environment (standards, technologies) that need to be taken into account.

@makxdekkers

@rwwh

The disadvantage of any "step/star/level" system like this is that once a level has just been obtained, there is very little incentive to make any further effort unless there is a prospect of reaching the next level.

It seems to me that this is not a characteristic of the scoring, but more a policy issue. If people do not want to take extra steps, they might not be interested in FAIR in any case. I would think that the scoring helps people to understand where they can improve -- whether or not they want to improve is another matter. It would be important for people to understand that FAIRness is not a goal, so it's not about getting a higher score, but a means to an end, namely enabling reuse.

@makxdekkers

@rwwh

A weird alternative proposal: give the score as a 3-number tuple, indicating the percentage of mandatory, recommended, and optional indicators met, maximized to 99.99.99.

Yes, that could be a good alternative.

@makxdekkers

The proposed system does not separately identify F, A, I, and R compliance.

Are you suggesting that we create scores like

F: 50.75.20
A: 100.33.75
I: 75.20.33
R: 80.100.25

Does that take away some of your concerns about the proposed scoring above being too crude?

rwwh commented Oct 13, 2019

It crossed my mind, but 12 scores may be a bit much. It is also no longer a visual summary, but more like a table that takes deliberate attention to read.

@makxdekkers

@rwwh The 12 scores were based on your earlier proposal for three values, and then separately for the four areas F, A, I and R. How would you suggest doing it differently?

rwwh commented Oct 14, 2019

It is a hard problem... One way to do it is to "fold" in the two directions: give a single triple, and try to give an F, A, I, R profile separately. That still makes 7 numbers.

@makxdekkers

@rwwh
Are you suggesting to have

  1. an overall FAIRness score in three numbers, e.g. 100, 40, 75 (mandatory, recommended, optional) plus
  2. an average score for the areas, e.g. if for F you have 100, 60, 40, the score for F would be either
    (a) 67, the average of the scores, or
    (b) level 3 as in the table above?

It has the advantage that there are fewer numbers, but it also makes it less clear where improvements could be made.
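
To make the combined option concrete, here is a sketch of the 1 + 2(b) output: an overall triple plus a level per FAIR area. It reuses the maturity_level and score_tuple helpers sketched earlier in this thread, and the indicator record layout is a hypothetical assumption, not the working group's data model.

```python
# Hypothetical record layout: each indicator result carries its FAIR area and priority.
indicators = [
    {"area": "F", "priority": "mandatory",   "passed": True},
    {"area": "F", "priority": "recommended", "passed": False},
    {"area": "A", "priority": "mandatory",   "passed": True},
    {"area": "I", "priority": "optional",    "passed": True},
    {"area": "R", "priority": "recommended", "passed": True},
]

def folded_score(indicators):
    """Option 1 + 2(b): an overall (mandatory, recommended, optional) triple plus a level per FAIR area.

    Relies on the maturity_level and score_tuple sketches above; empty categories count as fully met there.
    """
    def passed(subset, priority):
        return [i["passed"] for i in subset if i["priority"] == priority]

    overall = score_tuple(passed(indicators, "mandatory"),
                          passed(indicators, "recommended"),
                          passed(indicators, "optional"))
    per_area = {}
    for area in "FAIR":
        subset = [i for i in indicators if i["area"] == area]
        per_area[area] = maturity_level(passed(subset, "mandatory"),
                                        passed(subset, "recommended"),
                                        passed(subset, "optional"))
    return overall, per_area
```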

@keithjeffery

@makxdekkers @rwwh
Apologies for not commenting earlier. I took time to think about the priorities because it is easy for such indicator values to have unintended consequences.
Generally I agree with Rob and like the 3 percentages formula.
While separate scores for F, A, I, R are an advantage, the complexity is increased.
I think we agreed earlier that there is a sort of progression: R is not possible without I, I is not possible without A, etc. In this case, could we find a composite final indicator where the 'contribution' of F, A, I to R is somehow factored in?
For example, Mandatory R would involve mandatory F, A, I. Recommended R would involve mandatory F, mandatory A, recommended I
and so on.
This would (a) simplify things in presentation (but perhaps not in evaluation/scoring); (b) provide encouragement to progress
Just my 2 cents worth
Keith

@makxdekkers

@keithjeffery Can you explain what you mean by "Mandatory R"? We're trying to attach mandatory, recommended and optional to the indicators, not to the FAIR areas.

rwwh commented Oct 16, 2019

@keithjeffery Although I certainly agree that there is a progression, the principles and thereby the indicators are quite orthogonal, and even if F indicators have not been met, an additional R indicator can be important.
@makxdekkers yes, 2(b) would have my preference over (a).

@keithjeffery

@makxdekkers apologies for my lazy 'shorthand'. I meant to indicate that sufficient indicators ('sufficient' to be defined) of the kind mandatory or recommended are considered achieved within each of the FAIR groups of principles.
@rwwh Can you give an example of an acceptable level of rich metadata for an R indicator that is less (i.e. fewer attributes, less formal syntax, less defined semantics) than that required for an F indicator? From my (admittedly insufficient) experience, metadata acceptable for F is a subset of that for R (or A, I).

rwwh commented Oct 16, 2019

@keithjeffery I consider the metadata for Findability to be (mostly?) disjoint from the metadata for Reusability. Findability requires a good classification of what exactly IS in the data. This is no longer necessary once someone has decided to reuse the data; at that point they only need the R-metadata (i.e. how it was obtained, what the license conditions are, ...). See also: http://www.hooft.net/en/people/rob/events/193-tell-me-what-it-is-not-how-you-use-it

@keithjeffery

@rwwh I see your point, but my experience in environmental science is different. For example, common metadata attributes used for F are spatial and temporal coordinates (e.g. restriction of area and date range for earthquakes or volcanic events). These are required again in R (for map overlays, for example) together with rights/permissions and much more (which presumably are also needed first in A). As users become more sophisticated in their use of Discovery (F), they use quite detailed metadata to 'cut down' to the real digital assets they need (contextualisation, A), anticipating what they will be doing (e.g. producing maps, executing simulations with complex parameters) under I (to massage the digital assets into a form where they can be used together) and R (where they are used together).
It would be good to consider this in several different domains to see if there is a common pattern (progression) of required 'richness' of metadata through F, A, I, R.
@makxdekkers apologies that this is slightly off topic, but I believe it is interesting!

@makxdekkers

@keithjeffery Indeed, interesting and not altogether off-topic. Please note that in the indicators for F2 and R1, we do make it explicit that the metadata for F2 is about discovery and the metadata for R1 is about reuse. Now, what that means in practice is open for discussion, and I hope that the joint metadata meeting in Helsinki can shed some light on this.

@sjskhalsa

@keithjeffery - from the perspective of a researcher there is a large overlap of metadata supporting F and R. I may start a search based on topic/measurable, space and time, but would then refine based on metadata informing fitness for use (e.g. percent cloud cover if I was looking for imagery) and, of course, accessibility (what is the cost?). I suspect many use cases would need to be analyzed to determine whether a common pattern of progression through levels of metadata richness could be discerned.

rwwh commented Oct 16, 2019

@keithjeffery Is this difference between geo and life sciences caused by the fact that volcanology data can only ever be used for volcanology? As an "outsider" I would say seismology data would be useful in more than one subfield of geosciences, and then I could imagine that something like the frequency filter characteristics could be a piece of "findability" metadata.

With another risk of going off-topic: reusability keeps surprising me. I have been convinced that a researcher is the worst person in the world to judge the reusability of their own data, because they are biased towards their own view. To try and convey some of my surprise: I heard once about researchers re-using interviews that were recorded for the study of dying languages for the study of room acoustics .... this would be helped by carefully crafted Findability metadata absolutely irrelevant for the reuse in the original science field ..... and is certainly not helped by a careful transcription of the conversation.

I am going to sleep another night on this. Maybe you are right that Findability metadata is always a subset of Reusability metadata. I would just place that subset exclusively under F, thereby creating a clear orthogonality between the letters.

@keithjeffery

@sjskhalsa I think we are agreeing! However, I suspect we also agree that the analysis of multiple use cases would be costly.

@rwwh Actually volcanology data is highly reusable: inorganic chemists, gas chemists and physicists, atmospheric physicists, meteorologists, through to geothermal energy industry, civilian authorities engaged with anthropogenic hazard and, of course, air traffic control (remember Iceland a few years back and, in the last few days, Etna).

The key thing about your re-use example is that the richer the metadata in F, A, I, the easier it is for the re-using researcher to assess whether re-use is possible. I have a similar example from my PhD days (sixties): I developed software fitting a sinusoidal wave to geoscience data, but the software was used much more by some guys working in optics. The metadata associated with my software was apparently sufficient for them to conclude it could be re-used for their purpose.

I continue to believe that for activity under R it is necessary to have available (or have used in a previous step in the workflow) rich metadata covering not only F but also A (e.g. rights, licences) and I (convertors available, appropriate formats).

@makxdekkers

@rwwh

You proposed:

One way to do it is to "fold" in the two directions: give a single triple, and try to give an F, A, I, R profile separately.

Based on that I suggested:

  1. an overall FAIRness score in three numbers, e.g. 100, 40, 75 (mandatory, recommended, optional)

I think you and @keithjeffery agreed with that approach.

Wouldn't this create the risk that someone looks at the overall score and decides, if the score for mandatory is not 100, "oh, this resource is not FAIR"? In earlier discussions, people thought it would not be a good idea to have an overall score because it is too crude in comparison to scores per principle or per area.

rwwh commented Oct 18, 2019

@makxdekkers I meant to suggest a combination of your approach (1) and your approach 2(b), for a total of 7 "numbers".
We will not be able to completely eradicate people saying "this resource is FAIR" or "that resource is not FAIR", but I've said often enough that I don't want anyone to use a binary assignment. ;-)

@makxdekkers

@rwwh Yes, I understood that you proposed the combination of 1 and 2b. I was just noting that the triple under 1 could be misinterpreted as "the" FAIR evaluation result if people were searching for an easy conclusion, and then would not look at the per-area assessments.
Maybe that can be avoided if evaluators never quote the triple without the per-area results.

@markwilkinson

In my opinion, an overall FAIRness score is not a useful measure, for many of the reasons mentioned above, but in addition...

In the paper on the FAIR Evaluator, we suggested approaching FAIRness testing from the perspective of contracts between a data provider and a data consumer: promises that both humans and machine users can rely on as they attempt increasingly complex data transactions with a provider. From this perspective, a "FAIR score" is a totally meaningless artefact, since it tells me nothing about the behaviors of that provider that I can code my agent to expect.

@makxdekkers

@markwilkinson
Yes, in the perspective you depict, I agree that a score doesn't tell you much.
However, there are potential users of the indicators, for example funding agencies, who may want to verify that the data produced by projects they fund complies with a certain level of FAIRness. Such a "FAIRness score" should be accompanied by the more detailed observations on the indicators, so an evaluator can see which indicators have been satisfied and which have not.

rduerr commented Feb 13, 2020

@makxdekkers Uff...! The example you just gave leads me to think about the current practices of publication metrics which are indeed used in just such ways, much to science's detriment in my opinion. Do you really want to create such an environment?

@markwilkinson

Indeed.... in fact, I think we should be DISCOURAGING the idea of "FAIR Scores"! (as we do in the Evaluator manuscript)

rwwh commented Feb 14, 2020

The best way to use maturity indicators for me is as a checklist in order to become more mature.... to perform the cost/benefit analysis I mentioned in the call yesterday: do what makes sense, stop before diminishing returns hit. The result is strongly dependent on the kind of data and the environment.
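
A minimal sketch of that checklist idea, turning indicator results into a prioritised to-do list rather than a score; the record layout and identifiers are placeholders, not the official indicator IDs.

```python
# Illustrative indicator results; identifiers are placeholders, not the official RDA indicator IDs.
results = [
    {"id": "F-1", "priority": "mandatory",   "passed": True},
    {"id": "F-2", "priority": "recommended", "passed": False},
    {"id": "A-1", "priority": "mandatory",   "passed": True},
    {"id": "R-1", "priority": "optional",    "passed": False},
]

def improvement_checklist(results):
    """Unmet indicators, mandatory first: a to-do list that can be triaged by cost versus benefit."""
    order = {"mandatory": 0, "recommended": 1, "optional": 2}
    unmet = [r for r in results if not r["passed"]]
    return sorted(unmet, key=lambda r: order[r["priority"]])

for item in improvement_checklist(results):
    print(f"{item['id']} ({item['priority']}): not yet met")
```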

rduerr commented Feb 14, 2020

@rwwh I tend to agree with that, but I suspect that the inventors of the publication metrics felt exactly the same way. You would have to come up with mechanisms to prevent similar occurrences (and I am not convinced there are any).

@makxdekkers

@markwilkinson @rwwh @rduerr
We'll abandon the idea of an overall score. There is consensus in the working group that it is more important to show how well (meta)data meets the requirements in the principles in order to help people improve FAIRness.

ghost commented Jun 23, 2020

I hope you will forgive me for reviving this thread. It just occurred to me: might it be helpful to visualize "improvement opportunities"?

I.e. "a lot of indicator metrics not fulfilled yet" -> "big improvement opportunity" -> "big arrow" (just for example an "arrow", could be anything)?

What do you think? Best, Robert

bahimc commented Jun 24, 2020

@robertgiessmann thanks for your comment. Regarding the improvement opportunities, radar charts have been proposed to visualise the efforts needed to increase FAIRness. The evaluation method prototype can be accessed here.
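
For illustration, a minimal matplotlib sketch of such a radar chart, assuming one percentage of indicators met per FAIR area; the numbers below are made-up example values, not output of the prototype.

```python
import math
import matplotlib.pyplot as plt

areas = ["F", "A", "I", "R"]
scores = [80, 60, 40, 70]  # hypothetical share of indicators met per area, in percent

# Repeat the first point so the polygon closes.
angles = [2 * math.pi * i / len(areas) for i in range(len(areas))] + [0.0]
values = scores + scores[:1]

ax = plt.subplot(projection="polar")
ax.plot(angles, values, marker="o")
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(areas)
ax.set_ylim(0, 100)
ax.set_title("Share of indicators met per FAIR area")
plt.savefig("fair_radar.png")
```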

ghost commented Jun 24, 2020

Hello there! (sorry, @RDA-FAIR, not sure whom I'm speaking to behind this account),
I think that a phrasing like "efforts needed to increase FAIRness" might instil that "guilt feeling" in people; I hoped that "improvement opportunity" (similar to "low-hanging fruit" or "quick wins") might sound more attractive. FAIRification should be considered something "nice", something "you gain from", no? Of course, all these words are obvious Newspeak, but we might give it a try? Best, Robert

cbahim commented Jun 24, 2020

@robertgiessmann, this is Christophe Bahim, a member of the editorial team supporting the FAIR data maturity model. Indeed, as spelled out in the proposed recommendation, the evaluation method was designed not as a value judgment but rather as guidance, with all communities remaining involved. The wording used reflects that. If you think of concrete improvements, please feel free to share them. Otherwise, improving the evaluation method is on the agenda of the FAIR data maturity model maintenance group. Don't hesitate to bring this up during one of our next webinars.
