request for comment: rate dataset descriptions based on presence of properties #683

coret · 2023-02-21T17:01:32Z

Rating

To stimulate dataset providers to improve their dataset descriptions in terms of completeness of properties, a rating system is proposed.

The dataset descriptions are rated with 1 to 5 stars, depending on the content (presence of properties) of the dataset description:

A dataset description that has the required license, title and publisher gets a ☆ rating
If it also has a description and a distribution, it gets a ☆☆ rating
If it also has a creator and a landingPage, it gets a ☆☆☆ rating
If it also has at least one date (created, modified/updated or issued/published), it gets a ☆☆☆☆ rating
If it also has at least one of language, source, keyword, spatial or temporal, it gets a ☆☆☆☆☆ rating
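The tiers are cumulative, so the rating can be read as "the highest tier for which this tier and every lower tier is satisfied". A minimal Python sketch of that reading, assuming a dataset description is represented as a set of property names with the prefixes used in the SPARQL queries of this proposal. The `TIERS` layout and the `star_rating` function are illustrative names, not part of the registry code:

```python
# Hypothetical tier table, lowest tier first. Each entry is
# (properties that must all be present, properties of which one must be present).
TIERS = [
    ({"dct:license", "dct:title", "dct:publisher"}, set()),
    ({"dct:description", "dcat:distribution"}, set()),
    ({"dct:creator", "dcat:landingPage"}, set()),
    (set(), {"dct:created", "dct:modified", "dct:issued"}),
    (set(), {"dct:language", "dct:source", "dcat:keyword",
             "dct:spatial", "dct:temporal"}),
]


def star_rating(properties):
    """Return the number of stars (0 to 5) for a set of property names."""
    stars = 0
    for must_all, one_of in TIERS:
        if not must_all <= properties:
            break  # a required property is missing: stop climbing
        if one_of and not (one_of & properties):
            break  # none of the alternatives is present
        stars += 1
    return stars


minimal = {"dct:license", "dct:title", "dct:publisher"}
print("☆" * star_rating(minimal))
print("☆" * star_rating(minimal | {"dct:description", "dcat:distribution"}))
```

Under this reading a description that has a creator and landingPage but no distribution stays at ☆, because the ☆☆ tier is not met.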

This method does not (yet):

promote multi-language content
evaluate the quality of the content (e.g. is the description understandable, does the contentURL of the distribution exist, is it linked data?); the method only evaluates quantity
evaluate all schema:Dataset properties defined in Requirements for datasets, or any of the schema:DataDownload (distribution) properties

The rating for each of the dataset properties (both schema.org and DCAT):

Schema.org	DCAT	☆	☆☆	☆☆☆	☆☆☆☆	☆☆☆☆☆
schema:license	dct:license	must
schema:name	dct:title	must
schema:publisher	dct:publisher	must
schema:description	dct:description		must
schema:distribution	dcat:distribution		must
schema:creator	dct:creator			must
schema:mainEntityOfPage	dcat:landingPage			must
schema:dateCreated	dct:created				one-of
schema:dateModified	dct:modified				one-of
schema:datePublished	dct:issued				one-of
schema:inLanguage	dct:language					one-of
schema:isBasedOnUrl	dct:source					one-of
schema:keywords	dcat:keyword					one-of
schema:spatialCoverage	dct:spatial					one-of
schema:temporalCoverage	dct:temporal					one-of
schema:citation	dct:isReferencedBy
schema:genre	dct:type
schema:version	dct:hasVersion
schema:includedInDataCatalog	dct:isPartOf
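Because the table pairs every schema.org property with a DCAT or Dublin Core equivalent, one rating routine could cover both vocabularies by normalising names first. A small sketch of that idea, assuming property names are handled as prefixed strings and using the `dcat:` prefix for distribution, landingPage and keyword as the SPARQL queries in this proposal do. `SCHEMA_TO_DCAT` and `normalise` are hypothetical names:

```python
# Mapping copied from the table; the dcat: prefix follows the SPARQL queries.
SCHEMA_TO_DCAT = {
    "schema:license": "dct:license",
    "schema:name": "dct:title",
    "schema:publisher": "dct:publisher",
    "schema:description": "dct:description",
    "schema:distribution": "dcat:distribution",
    "schema:creator": "dct:creator",
    "schema:mainEntityOfPage": "dcat:landingPage",
    "schema:dateCreated": "dct:created",
    "schema:dateModified": "dct:modified",
    "schema:datePublished": "dct:issued",
    "schema:inLanguage": "dct:language",
    "schema:isBasedOnUrl": "dct:source",
    "schema:keywords": "dcat:keyword",
    "schema:spatialCoverage": "dct:spatial",
    "schema:temporalCoverage": "dct:temporal",
    "schema:citation": "dct:isReferencedBy",
    "schema:genre": "dct:type",
    "schema:version": "dct:hasVersion",
    "schema:includedInDataCatalog": "dct:isPartOf",
}


def normalise(properties):
    """Map schema.org names to their equivalents; leave other names untouched."""
    return {SCHEMA_TO_DCAT.get(name, name) for name in properties}


print(sorted(normalise({"schema:name", "schema:keywords", "dct:license"})))
```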

Construction

The rating of a dataset description is stored in a separate graph https://data.netwerkdigitaalerfgoed.nl/registry/description_ratings with the property schema:contentRating.

The graph is constructed using 5 SPARQL INSERT queries, executed from the highest rating to the lowest:

☆☆☆☆☆ rating

PREFIX schema: <http://schema.org/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
INSERT {
    GRAPH <https://data.netwerkdigitaalerfgoed.nl/registry/description_ratings> {
        ?dataset schema:contentRating "☆☆☆☆☆"
    }
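} WHERE {
    ?dataset a dcat:Dataset ;
             dct:description ?o1 ;
             dcat:distribution ?o2 ;
             dct:creator ?o3 ;
             dcat:landingPage ?o4 .
    # At least one date property (the ☆☆☆☆ requirement).
    { ?dataset dct:created ?date . }
    UNION
    { ?dataset dct:modified ?date . }
    UNION
    { ?dataset dct:issued ?date . }
    # At least one of language, source, keyword, spatial or temporal.
    { ?dataset dct:language ?extra . }
    UNION
    { ?dataset dct:source ?extra . }
    UNION
    { ?dataset dcat:keyword ?extra . }
    UNION
    { ?dataset dct:spatial ?extra . }
    UNION
    { ?dataset dct:temporal ?extra . }
}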

☆☆☆☆ rating

PREFIX schema: <http://schema.org/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
INSERT {
    GRAPH <https://data.netwerkdigitaalerfgoed.nl/registry/description_ratings> {
        ?dataset schema:contentRating "☆☆☆☆"
    }
} WHERE {
    ?dataset a dcat:Dataset ;
             dct:description ?o1 ;
             dcat:distribution ?o2 ;
             dct:creator ?o3 ;
             dcat:landingPage ?o4 .
    { ?dataset dct:created ?o5 . }
    UNION
    { ?dataset dct:modified ?o6 . }
    UNION
    { ?dataset dct:issued ?o7 . }
    FILTER NOT EXISTS { ?dataset dct:language ?o8 . }
    FILTER NOT EXISTS { ?dataset dct:source ?o9 . }
    FILTER NOT EXISTS { ?dataset dcat:keyword ?o10 . }
    FILTER NOT EXISTS { ?dataset dct:spatial ?o12 . }
    FILTER NOT EXISTS { ?dataset dct:temporal ?o13 . }
    FILTER NOT EXISTS { ?dataset schema:contentRating ?o11 . }
}

☆☆☆ rating

PREFIX schema: <http://schema.org/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
INSERT {
    GRAPH <https://data.netwerkdigitaalerfgoed.nl/registry/description_ratings> {
        ?dataset schema:contentRating "☆☆☆"
    }
} WHERE {
    ?dataset a dcat:Dataset ;
             dct:description ?o1 ;
             dcat:distribution ?o2 ;
             dct:creator ?o3 ;
             dcat:landingPage ?o4 .
    FILTER NOT EXISTS { ?dataset dct:created ?o5 . }
    FILTER NOT EXISTS { ?dataset dct:modified ?o6 . }
    FILTER NOT EXISTS { ?dataset dct:issued ?o7 . }
    FILTER NOT EXISTS { ?dataset schema:contentRating ?o8 . }
}

☆☆ rating

PREFIX schema: <http://schema.org/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
INSERT {
    GRAPH <https://data.netwerkdigitaalerfgoed.nl/registry/description_ratings> {
        ?dataset schema:contentRating "☆☆"
    }
} WHERE {
    ?dataset a dcat:Dataset ;
             dct:description ?o1 ;
             dcat:distribution ?o2 .
    FILTER NOT EXISTS {
        ?dataset dct:creator ?o3 .
        ?dataset dcat:landingPage ?o4 .
    }
    FILTER NOT EXISTS { ?dataset schema:contentRating ?o5 . }
}

☆ rating

PREFIX schema: <http://schema.org/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
INSERT {
    GRAPH <https://data.netwerkdigitaalerfgoed.nl/registry/description_ratings> {
        ?dataset schema:contentRating "☆"
    }
} WHERE {
    ?dataset a dcat:Dataset ;
             dct:license ?o1 ;
             dct:title ?o2 ;
             dct:publisher ?o3 .
    FILTER NOT EXISTS {
        ?dataset dct:description ?o4 .
        ?dataset dcat:distribution ?o5 .
    }
    FILTER NOT EXISTS { ?dataset schema:contentRating ?o6 . }
}

TODO

check for completeness (do all datasets have a rating?)
determine how/when to calculate ratings (e.g. remove the graph and execute the insert queries above)
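Both TODO items can be reasoned about without a triple store. The sketch below is a plain Python simulation over in-memory data, not code that talks to a SPARQL endpoint: it starts from an empty set of ratings (the equivalent of removing the graph), applies the tiers from highest to lowest while skipping datasets that already have a rating (the role of the `FILTER NOT EXISTS { ?dataset schema:contentRating ... }` clauses), and then reports which datasets ended up without a rating. The `recalculate` function, the two toy tiers and the example.org IRIs are all hypothetical:

```python
def recalculate(descriptions, tiers):
    """Assign each dataset the first matching tier; return (ratings, unrated).

    `descriptions` maps a dataset IRI to the set of property names it has.
    `tiers` is a list of (label, predicate) pairs ordered from highest to
    lowest, mirroring the order in which the INSERT queries are meant to run.
    """
    ratings = {}  # starting empty plays the role of removing the graph first
    for label, matches in tiers:
        for dataset, properties in descriptions.items():
            # Skip datasets that a higher tier has already rated.
            if dataset not in ratings and matches(properties):
                ratings[dataset] = label
    unrated = sorted(set(descriptions) - set(ratings))
    return ratings, unrated


# Only two toy tiers, to keep the example short.
tiers = [
    ("☆☆", lambda p: {"dct:description", "dcat:distribution"} <= p),
    ("☆", lambda p: {"dct:license", "dct:title", "dct:publisher"} <= p),
]
descriptions = {
    "https://example.org/a": {"dct:license", "dct:title", "dct:publisher",
                              "dct:description", "dcat:distribution"},
    "https://example.org/b": {"dct:license", "dct:title", "dct:publisher"},
    "https://example.org/c": {"dct:title"},
}
ratings, unrated = recalculate(descriptions, tiers)
print(ratings)
print(unrated)
```

Running the tiers in the opposite order would give every dataset the lowest matching rating, which is why the execution order of the insert queries matters.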

Selection

The rating can be used to sort results and to display the stars in the demonstrator, with a query like:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?dataset ?title ?publisherName ?rating WHERE {
    ?dataset a dcat:Dataset ;
             dct:title ?title ;
             dct:publisher ?publisher .
    ?publisher foaf:name ?publisherName .
    OPTIONAL {
        ?dataset schema:contentRating ?rating
    }
    FILTER(LANG(?title) = "" || LANGMATCHES(LANG(?title), "nl"))
    FILTER(LANG(?publisherName) = "" || LANGMATCHES(LANG(?publisherName), "nl")) 
    FILTER CONTAINS(LCASE(?title),"archief") .
} ORDER BY DESC(?rating) ?title

The following aggregation query shows the number of datasets with a specific number of stars:

PREFIX schema: <http://schema.org/>
SELECT ?rating (COUNT(*) AS ?datasets_with_rating) WHERE { 
	?dataset schema:contentRating ?rating .
} GROUP BY ?rating

Output on 21-2-2023:

rating	datasets_with_rating
"☆"	"279"^^xsd:integer
"☆☆"	"376"^^xsd:integer
"☆☆☆"	"11"^^xsd:integer
"☆☆☆☆"	"9"^^xsd:integer
"☆☆☆☆☆"	"111"^^xsd:integer
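To see how the ratings are distributed, the counts can be turned into shares. A short Python sketch using the numbers from the output of 21-2-2023, assuming each dataset has exactly one rating (the aggregation query counts rating triples, so a dataset with two ratings would be counted twice):

```python
# Counts copied from the query output of 21-2-2023.
counts = {"☆": 279, "☆☆": 376, "☆☆☆": 11, "☆☆☆☆": 9, "☆☆☆☆☆": 111}

total = sum(counts.values())
# Share of each rating as a percentage, rounded to one decimal.
shares = {rating: round(100 * n / total, 1) for rating, n in counts.items()}

for rating, share in shares.items():
    print(f"{rating:<5} {counts[rating]:>4} {share:>5}%")
print(f"total {total}")
```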

coret · 2023-04-05T10:10:54Z

Suggestion for the demonstrator: make the stars a link to an explanation page, so the user can read why the dataset got that specific number of stars and what can be done to acquire more stars. The easiest option is to make 5 static pages; somewhat harder is to make a page specific to each dataset description.

ddeboer · 2023-06-28T08:29:32Z

In our meeting on 28 June 2023, we decided:

to call this rating completeness (volledigheid) instead of quality (kwaliteit)
to visualise this as a completion bar instead of stars, to nudge publishers to complete their dataset description; @eddeheerna will ask a UX designer to come up with a good solution
to show ‘dataset description completeness’ next to ‘dataset quality’ (which we cannot rate as of yet).
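The meeting notes do not say how the completion bar would be calculated. Purely as an assumption, one option is to count how many of the rated criteria are met, treating each "one-of" group from the rating table as a single criterion. The sketch below illustrates that idea in Python; `CRITERIA`, `completeness` and `render_bar` are hypothetical names, and the text bar is only a stand-in for whatever the UX design turns out to be:

```python
# Hypothetical criteria: each entry is a group of alternatives; a criterion
# counts as met when at least one property of the group is present.
CRITERIA = [
    {"dct:license"}, {"dct:title"}, {"dct:publisher"},
    {"dct:description"}, {"dcat:distribution"},
    {"dct:creator"}, {"dcat:landingPage"},
    {"dct:created", "dct:modified", "dct:issued"},
    {"dct:language", "dct:source", "dcat:keyword", "dct:spatial", "dct:temporal"},
]


def completeness(properties):
    """Return the share of criteria met, as a float between 0 and 1."""
    met = sum(1 for group in CRITERIA if group & properties)
    return met / len(CRITERIA)


def render_bar(fraction, width=18):
    """Render a plain-text bar of `width` cells followed by a percentage."""
    filled = round(fraction * width)
    return f"[{'#' * filled}{'-' * (width - filled)}] {fraction:.0%}"


print(render_bar(completeness({"dct:license", "dct:title", "dct:publisher"})))
```

Unlike the star tiers, this count does not stop at the first missing tier, so a publisher would see progress for every criterion that is added.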

ddeboer mentioned this issue Jun 28, 2023

Automate rating during crawling #762

coret mentioned this issue Jun 28, 2023

Describe completeness score netwerk-digitaal-erfgoed/registry-demo#12
