From 4fa0c83ef29fb5d439f7684e2468ed2ebdc8aadf Mon Sep 17 00:00:00 2001 From: Franck Michel Date: Thu, 5 Dec 2024 14:11:58 +0100 Subject: [PATCH] Publish last version --- spec/docs/20241205/index.html | 1187 +++++++++++++++++++++++++++++++++ spec/docs/index.html | 268 +++++--- 2 files changed, 1348 insertions(+), 107 deletions(-) create mode 100644 spec/docs/20241205/index.html diff --git a/spec/docs/20241205/index.html b/spec/docs/20241205/index.html new file mode 100644 index 0000000..132a15b --- /dev/null +++ b/spec/docs/20241205/index.html @@ -0,0 +1,1187 @@ + + + + + + +RML-CC: Collections and Containers in RML + + + + + + + + + + + + + + + + + +
+ +

RML-CC: Collections and Containers in RML

+

+ Draft Community Group Report + +

+
+ +
Latest published version:
+ none +
+
Latest editor's draft:
https://w3id.org/rml/cc/spec
+ + + + +
Editors:
+ Christophe Debruyne + + + + (Montefiore Institute, University of Liège) +
+ Franck Michel + + + + (Université Côte d'Azur, CNRS, Inria) +
+ +
Authors:
+ Christophe Debruyne + + + + (Montefiore Institute, University of Liège) +
+ Franck Michel + + + + (Université Côte d'Azur, CNRS, Inria) +
+ +
Website
+ https://github.com/kg-construct/rml-cc/ +
+
+ + +
+
+

Abstract

This document describes the [RML] vocabulary and approach to generating RDF containers and collections [RDF11-Concepts].

+
+ +

Status of This Document

+ This specification was published by the + Knowledge Graph Construction Community Group. It is not a W3C Standard nor is it + on the W3C Standards Track. + + Please note that under the + W3C Community Contributor License Agreement (CLA) + there is a limited opt-out and other conditions apply. + + Learn more about + W3C Community and Business Groups. +

+

1. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

+ The key words MAY, MUST, and SHOULD in this document + are to be interpreted as described in + BCP 14 + [RFC2119] [RFC8174] + when, and only when, they appear in all capitals, as shown here. +

+ +

2. Overview

This section is non-normative.

+

The RDF Mapping Language (RML) [RML] is a language for expressing mappings between heterogeneous data and RDF. In RML, rules can be expressed to iterate over a data source and refer to specific data within an iteration. Using these iterators and references, RML rules define how to express data in the data source in RDF. RML is based on and extends R2RML [R2RML]. R2RML is defined to express customized mappings only from relational databases to RDF datasets.

+

This document describes RML-CC: +an extension of RML that enables the generation of RDF collections and containers with RML.

+

2.1 Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

+

The key words MUST and SHOULD in this document are to be interpreted as +described in BCP 14 [RFC2119] [RFC8174] when, +and only when, they appear in all capitals, as shown here.

+

2.2 Document conventions

We assume readers have basic familiarity with RDF and RML.

+

In this document, examples assume http://example.org/ as the base IRI provided to the RML engine and +the following namespace prefix bindings unless otherwise stated:

+ + + + + + + + + + + + + + + + + + + + + + + +
Prefix | Namespace
rml:   | http://w3id.org/rml/
xsd:   | http://www.w3.org/2001/XMLSchema#
ex:    | http://example.org/
:      | http://example.org/
+

The examples are contained in color-coded boxes. We use the Turtle syntax [Turtle] to write RDF.

+
# This box contains an example input
+ +
# This box contains an example mapping
+ +
# This box contains the example output
+ +

3. Definitions

3.1 Iterations

In this document, the term "iteration" refers to the iterations that stem from the logical source when processing the input documents. +An iteration results from the input data (documents, records returned by a query to a database, etc.) to which the logical source may apply an optional rml:iterator.

+

In the running example, the data source consists of a single document. The iterator extracts each of the three sub-documents within the array, so the logical source yields three iterations.

+

3.2 Multi-valued term map

A multi-valued term map is a term map that, during a single iteration, may yield multiple RDF terms or multiple collections or containers in the case of a gather map.

+

3.3 Named collection or container

A named collection or container is a collection or container whose head node is assigned either an IRI or a blank node identifier.

+

3.4 Well-formed vs. ill-formed collections and containers

There is an important difference between valid RDF and well-formed containers and collections. The following RDF is valid, though the collection is ill-formed since the first cons-pair has two rdf:first and two rdf:rest properties:

+
ex:illformedList 
+  rdf:first 1 ; rdf:rest (2 3) ;
+  rdf:first 4 ; rdf:rest (5 6) .
+ +

Similarly, an ill-formed container would use the same rdf:_n property more than once, e.g.:

+
ex:illformedContainer rdf:_1 1 ; rdf:_2 2 ; rdf:_3 3 ; rdf:_1 4 .
+ +

An RML collection and container validator (RMLCCV) is a system that checks for the well-formedness of collections and containers. The RMLCCV MUST report on any ill-formed collections and containers that are raised in the RDF generation process. An RML processor may include an RMLCCV, but this is not required.
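As a rough illustration of what an RMLCCV might check, the Python sketch below flags nodes that carry a duplicate rdf:first, rdf:rest, or rdf:_n property. It only covers the two situations shown above. The function names and the encoding of triples as plain tuples are assumptions made for this example and are not defined by this specification.

```python
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def is_membership_property(predicate):
    """True for rdf:_1, rdf:_2, ... (container membership properties)."""
    if not predicate.startswith(RDF + "_"):
        return False
    return predicate[len(RDF) + 1:].isdigit()


def ill_formed_nodes(triples):
    """Return the subjects that use rdf:first, rdf:rest or a given rdf:_n more than once."""
    counts = Counter(
        (s, p)
        for s, p, _ in triples
        if p in (RDF + "first", RDF + "rest") or is_membership_property(p)
    )
    return {s for (s, _), n in counts.items() if n > 1}


# The ill-formed container from the example above (hypothetical tuple encoding).
container = [
    ("ex:illformedContainer", RDF + "_1", 1),
    ("ex:illformedContainer", RDF + "_2", 2),
    ("ex:illformedContainer", RDF + "_3", 3),
    ("ex:illformedContainer", RDF + "_1", 4),
]
print(ill_formed_nodes(container))  # the set contains ex:illformedContainer
```

A complete validator would also need to check other conditions, such as list cells that lack an rdf:rest; those checks are left out of this sketch.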

+
+ +

4. Presentation and Example (Informative)

This section gives a brief overview of the RML mapping language. +It also provides simple examples of the generation of RDF collections and containers from JSON documents.

+

Below we present the three main constructs for generating collections and containers. Other predicates, and their use in examples, are explained later in this document.

+

An rml:GatherMap is a term map that generates a collection (rdf:List) or container (rdf:Bag, rdf:Seq, rdf:Alt). +A gather map has a list of term maps that inform the RML processor which RDF terms have to be generated as members of the list or container. +The rml:gather predicate is used to link an instance of rml:GatherMap with a list of term maps. The generation of a collection or container depends on the rml:gatherAs predicate, which may take any of the following values: rdf:List, rdf:Bag, rdf:Seq, and rdf:Alt.

+

The figure below illustrates the GatherMap and its relationships with other entities of the RML model.

+
+ Graphical overview of RML's vocabulary to generate RDF collections and containers. +
Figure 1 Graphical overview of RML's vocabulary to generate RDF collections and containers.
+
+ + +

4.1 Running example

In this section, the data source consists of a JSON file, data.json, containing the following JSON array:

+
[ 
+  { "id": "a",  "values": [ "1" , "2" , "3" ] },
+  { "id": "b",  "values": [ "4" , "5" , "6" ] },
+  { "id": "c",  "values": [ "7" , "8" , "9" ] } 
+]
+ +

The associated RML mapping starts as follows:

+
@prefix rml: <http://w3id.org/rml/>.
+@prefix ex:  <http://example.com/ns/>.
+@base        <http://example.com/ns/>.
+
+<#TM> a rml:TriplesMap;
+  rml:logicalSource [
+    rml:referenceFormulation rml:JSONPath;
+    rml:source [ 
+        a rml:RelativePathSource;
+        rml:root rml:MappingDirectory;
+        rml:path "data.json"
+    ] ;
+    rml:iterator "$.*" ;
+  ];
+
+  rml:subjectMap [
+    rml:template "{id}" ;
+  ] ;
+.
+ +

Note that the rml:iterator in the logical source will yield three iterations, each one providing one of the three sub-documents of the JSON array:

+
{ "id": "a",  "values": [ "1" , "2" , "3" ] }
+{ "id": "b",  "values": [ "4" , "5" , "6" ] }
+{ "id": "c",  "values": [ "7" , "8" , "9" ] }
+ + +

4.2 A simple example

Given the JSON document and the RML mapping completed with the following predicate object map:

+
rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+      rml:gather ( [ rml:reference "values.*" ; ] ) ;
+      rml:gatherAs rdf:List ;
+  ] ;
+] ;
+ +

In this example, each iteration yields a new (unique) blank node that is the head of the collection being produced. +The following output will be produced:

+
:a ex:with ("1" "2" "3") .
+:b ex:with ("4" "5" "6") .
+:c ex:with ("7" "8" "9") .
+ + +

4.3 Collections and containers identified with an IRI or blank node ID

+

In the previous example, the gather map does not contain any rml:template, rml:constant or rml:reference property. +By contrast, the example below identifies the collection with a rml:template property. The IRI generated by the template will be assigned to the head node of the collection. We refer to this as a named collection.

+

The following mapping:

+
rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+      rml:template "list{id}" ;
+      rml:gather ( [ rml:reference "values.*" ; ] ) ;
+      rml:gatherAs rdf:List ;
+  ] ;
+] ;
+ +

will yield the following output:

+
:a ex:with :lista . :lista rdf:first "1" ; rdf:rest ("2" "3") .
+:b ex:with :listb . :listb rdf:first "4" ; rdf:rest ("5" "6") .
+:c ex:with :listc . :listc rdf:first "7" ; rdf:rest ("8" "9") .
+ +

This is similar to the previous example, yet in this case the head node of each produced collection is assigned an IRI :lista, :listb and :listc.
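To make the role of the head node concrete, the following Python sketch expands a non-empty list into its rdf:first / rdf:rest triples, with the first cell named by the caller. The function name, the string encoding of terms and the blank node labelling scheme are assumptions for illustration only.

```python
def list_triples(head, members):
    """Expand a non-empty list into rdf:first / rdf:rest triples.

    `head` names the first cell; the following cells get blank node labels
    that are unique only within this call.
    """
    triples = []
    node = head
    for i, member in enumerate(members):
        triples.append((node, "rdf:first", member))
        is_last = i + 1 == len(members)
        next_node = "rdf:nil" if is_last else f"_:b{i + 1}"
        triples.append((node, "rdf:rest", next_node))
        node = next_node
    return triples


for triple in list_triples(":lista", ['"1"', '"2"', '"3"']):
    print(*triple, ".")
```

Only the first cell carries the IRI produced by the template; the remaining cells stay anonymous, which is what the Turtle shorthand `rdf:rest ("2" "3")` expresses.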

+
+ +

5. Vocabulary definition

This section introduces the classes, properties, and constants of the RML Containers and Collections specification.

+

5.1 Classes

+

5.1.1 rml:GatherMap

Gather maps are term maps that use rml:gather and rml:gatherAs to generate collections and containers from a list of term maps.

+
    +
  • A rml:GatherMap MUST have exactly one rml:gather property.
  • +
  • A rml:GatherMap MUST have exactly one rml:gatherAs property.
  • +
  • A rml:GatherMap MAY have zero or exactly one rml:strategy property.
  • +
+

5.1.2 rml:Strategy

A strategy is a plan or set of actions designed to achieve a specific goal or outcome. Instances of rml:Strategy represent ways to perform an action such as combining two collections and containers. See constants for examples.

+

5.2 Properties

+

5.2.1 rml:gather

+

The rml:gather property informs the RML processor where the terms of a collection or container come from. This property relates a gather map with a non-empty list of term maps. +That list of term maps may contain other gather maps, thus generating nested containers and/or collections.

+
    +
  • The domain of rml:gather is rml:GatherMap.
  • +
  • The range of rml:gather is a non-empty list (rdf:List) of rml:TermMap instances. In particular, this list may include instances of rml:GatherMap thus allowing for nested gather maps.
  • +
+

5.2.2 rml:strategy

+

Declaring an rml:strategy in a gather map informs the processor about how to create collections and containers when faced with multi-valued term maps. +This specification defines rml:append and rml:cartesianProduct as instances of rml:Strategy.

+

In the rml:append strategy, the sets of RDF terms generated by each term map of the gather map are simply appended to the collection (respectively container) being constructed. Thus, only one collection (respectively container) is generated.

+

Conversely, in the rml:cartesianProduct strategy, the gather map generates collections (respectively containers) each containing one RDF term generated by each term map of the gather map. In other words, it carries out a cartesian product between the terms generated by each term map, thus constructing as many collections (respectively containers) as the product of the number of RDF terms from each term map.

+

A gather map does not need to specify a strategy; the default strategy is rml:append.
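The difference between the two strategies can be sketched in a few lines of Python. This is not how any particular RML processor is implemented; the function name `gather` and the representation of term maps as lists of already generated terms are assumptions for the example.

```python
from itertools import product


def gather(term_sets, strategy="append"):
    """Group the terms of each term map into one or more member lists.

    `term_sets` holds one list of terms per term map, in declaration order.
    """
    if strategy == "append":
        # A single collection: all terms, in term map order.
        return [[term for terms in term_sets for term in terms]]
    if strategy == "cartesianProduct":
        # One collection per combination, one term taken from each term map.
        return [list(combo) for combo in product(*term_sets)]
    raise ValueError(f"unknown strategy: {strategy}")


a, b = ["1", "2", "3"], ["4", "5"]
print(gather([a, b]))                      # one list of five terms
print(gather([a, b], "cartesianProduct"))  # 3 * 2 = 6 lists of two terms
```

With `itertools.product`, the terms of the first term map vary slowest, so the combinations come out as ("1" "4"), ("1" "5"), ("2" "4"), and so on.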

+

5.2.3 rml:gatherAs

+

The property rml:gatherAs relates a gather map with the desired result type: a type of container or collection.

+
    +
  • The domain of rml:gatherAs is rml:GatherMap.
  • +
  • The range of rml:gatherAs is one of the following: rdf:Seq, rdf:Bag, rdf:Alt, rdf:List.
  • +
+

5.2.4 rml:allowEmptyListAndContainer

+

This predicate is to be used alongside rml:gather and rml:gatherAs. It specifies the behavior of a gather map in case the rml:gather does not yield any element.

+

The range of rml:allowEmptyListAndContainer is xsd:boolean. +When true, the gather map will generate rdf:nil for an RDF collection, or a resource with no members for an RDF container. +When false, the gather map will not generate a collection or container.

+
    +
  • The domain of rml:allowEmptyListAndContainer is rml:GatherMap.
  • +
  • The range of rml:allowEmptyListAndContainer is xsd:boolean.
  • +
+

Property rml:allowEmptyListAndContainer is optional; it takes the value false by default.
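The decision described above can be summarized with the following Python sketch. The function name and the dictionary used to describe a generated collection or container are hypothetical; only the three outcomes (nothing, rdf:nil, or a container without members) come from the text.

```python
def gather_or_skip(members, gather_as, allow_empty=False):
    """Decide what a gather map produces for one set of gathered members."""
    if members:
        return {"type": gather_as, "members": list(members)}
    if not allow_empty:
        return None  # default: no collection or container is generated
    if gather_as == "rdf:List":
        return "rdf:nil"  # the empty RDF collection
    return {"type": gather_as, "members": []}  # container resource with no members


print(gather_or_skip([], "rdf:List"))                    # None
print(gather_or_skip([], "rdf:List", allow_empty=True))  # rdf:nil
print(gather_or_skip([], "rdf:Bag", allow_empty=True))
```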

+

5.3 Constants

5.3.1 rml:append

+

rml:append is an instance of class rml:Strategy. +Used as the object of property rml:strategy, it informs the processor that the sets of RDF terms generated by each term map of the gather map are to be appended within the collection or container. The order is that in which the term maps are declared in the gather map. Example:

+

For the input document:

+
{ 
+  "a": [ "1" , "2" , "3" ],
+  "b": [ "4" , "5" ] 
+}
+ +

The following term map:

+
rml:objectMap [
+    rml:gather ( [ rml:reference "a.*" ] [ rml:reference "b.*" ]) ;
+    rml:gatherAs rdf:List ;
+    rml:strategy rml:append;   # this is the default strategy
+] ;
+ +

would generate a list by appending the terms produced by the two term maps in the gather map:

+
("1" "2" "3" "4" "5" )
+ + +

5.3.2 rml:cartesianProduct

+

rml:cartesianProduct is an instance of class rml:Strategy. +Used as the object of property rml:strategy, it informs the processor that the RDF terms generated by each term map of the gather map are to be grouped (in the constructed collection or container) by doing a cartesian product of these terms. +Therefore, this constructs as many collections or containers as the product of the number of terms from each term map. Example:

+

For the input document:

+
{ 
+  "a": [ "1" , "2" , "3" ],
+  "b": [ "4" , "5" ] 
+}
+ +

The following term map:

+
rml:objectMap [
+    rml:gather ( [ rml:reference "a.*" ] [ rml:reference "b.*" ]) ;
+    rml:gatherAs rdf:List ;
+    rml:strategy rml:cartesianProduct;
+] ;
+ +

would generate 3*2 = 6 lists by grouping the terms produced by the two term maps in the gather map:

+
("1" "4") ("1" "5") 
+("2" "4") ("2" "5")
+("3" "4") ("3" "5")
+ +

6. Considerations

6.1 Using a rml:GatherMap in various types of term map

+

Although most examples demonstrate the use of a gather map in the context of an object map, a gather map is a regular term map. +As such, it can be used in other types of term maps such as a subject or predicate map.

+

Term maps generate RDF terms (IRI, blank node, literal) to be used as the terms of RDF triples. +If such a term map generates a collection or container by means of a gather map, the term retained to form an RDF triple is the head node of the collection or container. +In the case of an RDF list, this is the node that is the subject of the first rdf:first predicate.

+

The examples section demonstrates how a gather map can be used within a subject map.

+

6.2 Named collection or container: assigning an IRI or blank node identifier to a collection and container

+

If a gather map does not contain any rml:template, rml:constant or rml:reference property, then the head node of each generated collection or container is a new blank node.

+

Conversely, if a gather map contains either a rml:template, rml:constant or rml:reference property, then the gather map yields named collections or containers whose head node is identified as instructed by the rml:template, rml:constant or rml:reference property.

+

The following mapping:

+
rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+      rml:template "seq{id}" ;
+      rml:gather ( [ rml:reference "values.*" ; ] ) ;
+      rml:gatherAs rdf:Seq ;
+  ] ;
+] ;
+ +

will yield the following output:

+
:a ex:with :seqa . :seqa rdf:_1 "1" ; rdf:_2 "2" ; rdf:_3 "3" .
+:b ex:with :seqb . :seqb rdf:_1 "4" ; rdf:_2 "5" ; rdf:_3 "6" .
+:c ex:with :seqc . :seqc rdf:_1 "7" ; rdf:_2 "8" ; rdf:_3 "9" .
+ + +

6.3 Generating well-formed named collections or containers

+

When generating a named collection or container, the same IRI or blank node identifier may be generated several times, either across multiple iterations or because the gather map is multi-valued, as exemplified with the rml:cartesianProduct strategy.

+

In this situation, to avoid generating ill-formed collections or containers, the processor MUST concatenate (i.e. append) the new collection or container to the previous one. +In other words, when a gather map creates a named collection or container, the processor must first check whether a named collection or container with the same head node IRI or blank node identifier already exists, and if so, it must append the terms to the existing one.

+

Below we exemplify two such situations.

+

6.3.1 Named collections or containers generated across multiple iterations

Here we reuse the running example yet with a slight variation: there are two JSON objects with the value "a" for "id".

+
[ 
+  { "id": "a",  "values": [ "1" , "2" , "3" ] },
+  { "id": "b",  "values": [ "4" , "5" , "6" ] },
+  { "id": "a",  "values": [ "7" , "8" , "9" ] } 
+]
+ +

Let's consider the following mapping:

+
rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+      rml:gather ( [ rml:reference "values.*" ; ] ) ;
+      rml:gatherAs rdf:List ;
+  ] ;
+] ;
+ +

The gather map has no rml:template, rml:constant nor rml:reference property. The expected output consists of three lists, two related to :a and one to :b:

+
:a ex:with ("1" "2" "3"), ("7" "8" "9")  .
+:b ex:with ("4" "5" "6") .
+ +

Now, when an rml:template, rml:constant or rml:reference is provided, +the two collections related to id "a" cannot be generated separately since they would share the same head node IRI or blank node identifier, thus generating an ill-formed collection. Therefore, the processor must concatenate the two collections related to id "a". +With the following predicate object map:

+
rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+      rml:template "list{id}" ;
+      rml:gather ( [ rml:reference "values.*" ; ] ) ;
+      rml:gatherAs rdf:List ;
+  ] ;
+] ;
+ +

The processor must generate the following output:

+
:a ex:with :lista . :lista rdf:first "1" ; rdf:rest ("2" "3" "7" "8" "9") .
+:b ex:with :listb . :listb rdf:first "4" ; rdf:rest ("5" "6") .
+ +

It is assumed that a processor will concatenate the collections or containers while respecting the order of the iterations as provided by the logical source.
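One way to picture this concatenation is a lookup keyed by head node, filled in iteration order. The Python sketch below assumes the iterations are already available as dictionaries and that the head node is built as "list" followed by the id; both are simplifications for illustration, not requirements of this specification.

```python
def concatenate_by_head(iterations):
    """Append the members of each iteration to the list sharing its head node."""
    lists = {}
    for item in iterations:  # order of the logical source is preserved
        head = "list" + item["id"]
        lists.setdefault(head, []).extend(item["values"])
    return lists


iterations = [
    {"id": "a", "values": ["1", "2", "3"]},
    {"id": "b", "values": ["4", "5", "6"]},
    {"id": "a", "values": ["7", "8", "9"]},
]
print(concatenate_by_head(iterations))
```

The first and third iterations share the head node `lista`, so their members end up in a single list of six terms.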

+

6.3.2 Named collections or containers generated by a multi-valued gather map

Let's consider the following input document:

+
{ 
+  "id": "myid",
+  "a": [ "1" , "2" , "3" ],
+  "b": [ "4" , "5" ] 
+}
+ +

and the following mapping:

+
rml:subjectMap [ rml:template "{id}" ] ;
+
+rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+    rml:gather ( [ rml:reference "a.*" ] [ rml:reference "b.*" ]) ;
+    rml:gatherAs rdf:List ;
+    rml:strategy rml:cartesianProduct ;
+  ] ;
+] ;
+ +

The gather map has no rml:template, rml:constant nor rml:reference property. +As already illustrated, the rml:cartesianProduct strategy will generate multiple collections, yielding the output:

+
:myid ex:with ("1" "4"), ("1" "5"), ("2" "4"), ("2" "5"), ("3" "4"), ("3" "5") .
+ + +

Now, when an rml:template, rml:constant or rml:reference is provided, to avoid generating ill-formed lists that would share the same head node IRI, the processor must concatenate the lists.

+

If we add an rml:template in the object map:

+
rml:objectMap [
+  rml:template "list{id}" ;
+  rml:gather ( [ rml:reference "a.*" ] [ rml:reference "b.*" ]) ;
+  rml:gatherAs rdf:List ;
+  rml:strategy rml:cartesianProduct ;
+] ;
+ +

The processor must now generate the following output:

+
:myid ex:with :listmyid . :listmyid rdf:first "1" ; rdf:rest ("4" "1" "5" "2" "4" "2" "5" "3" "4" "3" "5") .
+ + + +

6.3.3 Named collections or containers generated across multiple iterations and with a multi-valued term map

+

An even trickier situation combines the two previous sections, involving both multiple iterations and multi-valued gather maps.

+

Let's consider the following document and mapping:

+
[ 
+  { "id": "a",  "values1": [ "1" ],       "values2": [ "a" , "b" ] },
+  { "id": "b",  "values1": [ "3" , "4" ], "values2": [ "c" , "d" ] },
+  { "id": "a",  "values1": [ "5" , "6" ], "values2": [ "e" ] } 
+]
+ +
rml:logicalSource [
+  ...
+  rml:iterator "$.*" ;
+];
+
+rml:subjectMap [ rml:template "{id}" ] ;
+
+rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+      rml:template "list{id}" ;
+      rml:gather ( [ rml:reference "values1.*" ; ] [ rml:reference "values2.*" ; ] ) ;
+      rml:gatherAs rdf:List ;
+  ] ;
+] ;
+ +

For each document, the values of values1 and values2 are appended in the same list, as per the default rml:append strategy. +Furthermore, the lists generated for id "a" must be concatenated since they share the same head node IRI, as explained in the multiple iterations case. +The expected output is:

+
:a ex:with :lista .
+:lista rdf:first "1" ; rdf:rest ("a" "b" "5" "6" "e") .
+:b ex:with :listb .
+:listb rdf:first "3" ; rdf:rest ("4" "c" "d") .
+ +

Now let's change the default strategy to rml:cartesianProduct:

+
rml:objectMap [
+    rml:template "list{id}" ;
+    rml:gather ( [ rml:reference "values1.*" ; ] [ rml:reference "values2.*" ; ] ) ;
+    rml:gatherAs rdf:List ;
+    rml:strategy rml:cartesianProduct ;
+] ;
+ +

Each iteration will now yield multiple lists by combining the values of values1 and values2.

+

For the document with id "b", there are ("3" "c") ("3" "d") ("4" "c") ("4" "d"). +But since the template generates the same IRI for all of them, they must be concatenated into a single list: ("3" "c" "3" "d" "4" "c" "4" "d"), as explained in the multi-valued gather map case.

+

Similarly, for the documents with id "a", the result is: ("1" "a" "1" "b") and ("5" "e" "6" "e"). +But again, these lists must be concatenated since they share the same head node IRI, as explained in the multiple iterations case.

+

Therefore, the processor must now generate the following output:

+
:a ex:with :lista .
+:lista rdf:first "1" ; rdf:rest ("a" "1" "b" "5" "e" "6" "e") .
+:b ex:with :listb .
+:listb rdf:first "3" ; rdf:rest ("c" "3" "d" "4" "c" "4" "d") .
+ +
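The two expected outputs of this section can be reproduced with a small Python sketch that first applies the strategy within each iteration and then concatenates all lists sharing a head node. The function name `named_lists` and the in-memory representation of the documents are assumptions for illustration.

```python
from itertools import product

docs = [
    {"id": "a", "values1": ["1"], "values2": ["a", "b"]},
    {"id": "b", "values1": ["3", "4"], "values2": ["c", "d"]},
    {"id": "a", "values1": ["5", "6"], "values2": ["e"]},
]


def named_lists(docs, strategy="append"):
    """Return the members of each named list, keyed by head node."""
    lists = {}
    for doc in docs:
        term_sets = [doc["values1"], doc["values2"]]
        if strategy == "cartesianProduct":
            generated = [list(combo) for combo in product(*term_sets)]
        else:
            generated = [[t for terms in term_sets for t in terms]]
        # All lists with the same head node are concatenated, in order.
        target = lists.setdefault("list" + doc["id"], [])
        for members in generated:
            target.extend(members)
    return lists


print(named_lists(docs))
print(named_lists(docs, "cartesianProduct"))
```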

7. Examples (Informative)

In this section, we present additional examples and describe the expected output.

+

7.1 Dealing with empty collections and containers

+

By default, rml:allowEmptyListAndContainer is false. +Thus, processing the following JSON document with the predicate object map provided in the running example would not yield any result for the document with "id": "d".

+
[ 
+  { "id": "a",  "values": [ "1" , "2" , "3" ] },
+  { "id": "b",  "values": [ "4" , "5" , "6" ] },
+  { "id": "c",  "values": [ "7" , "8" , "9" ] },
+  { "id": "d",  "values": [] } 
+]
+ +

However, when we override the value for this property and set it to true:

+
rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+      rml:allowEmptyListAndContainer true ;
+      rml:gather ( [ rml:reference "values.*" ; ] ) ;
+      rml:gatherAs rdf:List ;
+  ] ;
+] ;
+ +

the predicate object map will generate:

+
:a ex:with ("1" "2" "3") .
+:b ex:with ("4" "5" "6") .
+:c ex:with ("7" "8" "9") .
+:d ex:with () .
+ +

There is one special case when dealing with empty collections. Since rdf:nil is reserved for the empty list, an RML processor MUST replace each IRI or blank node that is an empty list with rdf:nil. +In other words, when the following predicate object map is used:

+
rml:predicateObjectMap [
+  rml:predicate ex:with ;
+  rml:objectMap [
+      rml:template "list{id}" ;
+      rml:allowEmptyListAndContainer true ;
+      rml:gather ( [ rml:reference "values.*" ; ] ) ;
+      rml:gatherAs rdf:List ;
+  ] ;
+] ;
+ +

then the document with "id": "d" entails an empty list, that is a list whose head node is rdf:nil and therefore has no IRI. +We expect the following output, where the object generated for :d is the empty list (rdf:nil):

+
:a ex:with :lista .
+:lista rdf:first "1" ; rdf:rest ("2" "3") .
+:b ex:with :listb .
+:listb rdf:first "4" ; rdf:rest ("5" "6") .
+:c ex:with :listc .
+:listc rdf:first "7" ; rdf:rest ("8" "9") .
+:d ex:with () .
+ + +

7.2 Relational data example

In this section, we use the following relational database and document for our example.

+

Table BOOK:

+ + + + + + + + + + + + + + + +
ID | TITLE
1  | Frankenstein
2  | The Long Earth
+

Table AUTHOR:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ID | TITLE | FNAME   | LNAME     | BOOKID
1  |       | Mary    | Shelley   | 1
2  | Sir   | Terry   | Pratchett | 2
3  |       | Stephen | Baxter    | 2
4  |       |         |           |
+

The following mapping will relate instances of authors to names. The names of authors are, for the sake of the example, represented as bags containing a title, first name, and last name.

+
<#AuthorTM>
+    rml:logicalTable [ rml:tableName "AUTHOR" ; ] ;
+    rml:subjectMap [ rml:template "/person{ID}" ; ] ;
+    rml:predicateObjectMap [
+        rml:predicate ex:name ;
+        rml:objectMap [
+            rml:reference "ID" ; rml:termType rml:BlankNode ;
+            rml:gather ( 
+                [ rml:reference "TITLE" ]  [ rml:reference "FNAME" ]  [ rml:reference "LNAME" ] 
+            ) ;
+            rml:gatherAs rdf:Bag ;
+        ] ;
+    ] ;
+.
+ +

In this example we generate, for each row in table AUTHOR, a blank node of type rdf:Bag. Each such bag "gathers" values from different term maps. The execution of this mapping will produce the following result:

+
:person1 ex:name [ a rdf:Bag; rdf:_1 "Mary"; rdf:_2 "Shelley" ] . 
+:person2 ex:name [ a rdf:Bag; rdf:_1 "Sir"; rdf:_2 "Terry"; rdf:_3 "Pratchett" ] . 
+:person3 ex:name [ a rdf:Bag; rdf:_1 "Stephen"; rdf:_2 "Baxter" ] .
+ +

While not shown in this example, different term maps make it possible to collect terms of different types: resources, literals, typed or language-tagged literals, etc. The fourth record in the table did not generate a bag, since none of the term maps in the gather map yielded a value. +By default, empty lists and containers are withheld. They can be kept by setting rml:allowEmptyListAndContainer to true.
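The numbering of the bag members can be sketched as follows in Python. The sketch assumes that a missing column value is a NULL (modeled here as None) that yields no term, and it uses a hypothetical `bag_triples` helper; neither is prescribed by this specification.

```python
def bag_triples(node, values):
    """Build the triples of an rdf:Bag, numbering only the values that exist."""
    members = [v for v in values if v is not None]
    if not members:
        return []  # default behavior: an empty container is withheld
    triples = [(node, "rdf:type", "rdf:Bag")]
    triples += [(node, f"rdf:_{i}", v) for i, v in enumerate(members, start=1)]
    return triples


# (TITLE, FNAME, LNAME) for the first and fourth rows of table AUTHOR.
print(bag_triples("_:1", [None, "Mary", "Shelley"]))
print(bag_triples("_:4", [None, None, None]))
```

Because members are numbered after the missing values are dropped, the first author gets rdf:_1 "Mary" and rdf:_2 "Shelley" rather than a gap at rdf:_1.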

+

7.3 Using referencing object map

+

Continuing with the relational data example, here we relate books to authors with a rml:parentTriplesMap. The authors of a book are represented as a list.

+
<#BookTM>
+    rml:logicalTable [ rml:tableName "BOOK" ; ] ;
+    rml:subjectMap [ rml:template "/book{ID}" ; ] ;
+    rml:predicateObjectMap [
+        rml:predicate ex:writtenBy ;
+        rml:objectMap [
+            rml:reference "ID" ; rml:termType rml:BlankNode ;
+            rml:gather ( 
+                [ 
+                    rml:parentTriplesMap <#AuthorTM>;
+                    rml:joinCondition [ rml:child "ID" ; rml:parent "BOOKID" ; ] ;
+                ] 
+            ) ;
+            rml:gatherAs rdf:List;
+        ] ;
+    ] ;
+.
+ +

Intuitively, we will join each record (or iteration) with data from the parent triples map. The join may yield one or more results, which are then gathered into a list. The execution of this mapping will produce the following RDF:

+
:book1 ex:writtenBy ( :person1 ) . 
+:book2 ex:writtenBy ( :person2 :person3 ) .
+ +

In RML, it is assumed that each term map is multi-valued. That is, each term map may return one or more values. The default behavior is to append the values in the order of the term maps appearing in the gather map.

+

7.4 Using a gather map in a subject map

Here we exemplify the use of a gather map in a subject map. Continuing with the JSON file from the running example, the following mapping generates an RDF sequence whose head node is used to state provenance information on that sequence:

+
<#TM> a rml:TriplesMap;
+  rml:logicalSource [
+    rml:source [ 
+        a rml:RelativePathSource;
+        rml:root rml:MappingDirectory;
+        rml:path "data.json"
+    ] ;
+    rml:iterator "$.*" ;
+  ];
+
+  rml:subjectMap [
+    rml:template "seq{id}" ;
+    rml:gather ( [ rml:reference "values.*" ; ] ) ;
+    rml:gatherAs rdf:Seq ;  
+  ] ;
+  
+  rml:predicateObjectMap [
+    rml:predicate prov:wasDerivedFrom ;
+    rml:object <data.json> ;
+  ] .
+ +

The expected result is:

+
:seqa rdf:_1 "1" ; rdf:_2 "2" ; rdf:_3 "3" .
+:seqa prov:wasDerivedFrom <data.json> .
+
+:seqb rdf:_1 "4" ; rdf:_2 "5" ; rdf:_3 "6" .
+:seqb prov:wasDerivedFrom <data.json> .
+
+:seqc rdf:_1 "7" ; rdf:_2 "8" ; rdf:_3 "9" .
+:seqc prov:wasDerivedFrom <data.json> .
+ + + +

A. References

A.1 Normative references

+ +
[R2RML]
+ R2RML: RDB to RDF Mapping Language. W3C. 27 September 2012. W3C Recommendation. URL: https://www.w3.org/TR/r2rml/ +
[RFC2119]
+ Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119 +
[RFC8174]
+ Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174 +
[RML]
+ RDF Mapping Language. https://rml.io. 06 October 2020. Unofficial draft. URL: https://rml.io/specs/rml/ +
[Turtle]
+ RDF 1.1 Turtle. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/ +
+

A.2 Informative references

+ +
[RDF11-Concepts]
+ RDF 1.1 Concepts and Abstract Syntax. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf11-concepts/ +
+
\ No newline at end of file diff --git a/spec/docs/index.html b/spec/docs/index.html index 212eaa7..666d2df 100644 --- a/spec/docs/index.html +++ b/spec/docs/index.html @@ -1,16 +1,16 @@ - + - + -
+

RML-CC: Collections and Containers in RML

Draft Community Group Report - +

@@ -273,22 +277,66 @@

RML-CC: Collections and Containers in RML

Editors:
- Christophe Debruyne (Montefiore Institute, University of Liège) + Christophe Debruyne + + + + (Montefiore Institute, University of Liège)
- Franck Michel (Université Côte d'Azur, CNRS, Inria) + Franck Michel + + + + (Université Côte d'Azur, CNRS, Inria)
- +
Authors:
+ Christophe Debruyne + + + + (Montefiore Institute, University of Liège) +
+ Franck Michel + + + + (Université Côte d'Azur, CNRS, Inria) +
Website
- https://w3id.org/rml/cc/spec + https://github.com/kg-construct/rml-cc/

RML-CC: Collections and Containers in RML

Learn more about W3C Community and Business Groups. -

+

1. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, and SHOULD in this document are to be interpreted as described in @@ -332,7 +380,7 @@

RML-CC: Collections and Containers in RML

described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.2 Document conventions

We assume readers have basic familiarity with RDF and RML.

-

In this document, examples assume +

In this document, examples assume http://example.org/ as the base IRI provided to the RML engine and the following namespace prefix bindings unless otherwise stated:

@@ -403,14 +451,17 @@

RML-CC: Collections and Containers in RML

The associated RML mapping starts as follows:

@prefix rml: <http://w3id.org/rml/>.
-@prefix ql:  <http://semweb.mmlab.be/ns/ql#>.
-@prefix ex:  <http://example.com/ns>.
-@base        <http://example.com/ns>.
+@prefix ex:  <http://example.com/ns/>.
+@base        <http://example.com/ns/>.
 
 <#TM> a rml:TriplesMap;
   rml:logicalSource [
-    rml:source "data.json" ;
-    rml:referenceFormulation ql:JSONPath ;
+    rml:referenceFormulation rml:JSONPath;
+    rml:source [ 
+        a rml:RelativePathSource;
+        rml:root rml:MappingDirectory;
+        rml:path "data.json"
+    ] ;
     rml:iterator "$.*" ;
   ];
 
@@ -448,18 +499,18 @@ 

RML-CC: Collections and Containers in RML

rml:predicateObjectMap [
   rml:predicate ex:with ;
   rml:objectMap [
-      rml:template "list/{id}" ;
+      rml:template "list{id}" ;
       rml:gather ( [ rml:reference "values.*" ; ] ) ;
       rml:gatherAs rdf:List ;
   ] ;
 ] ;

will yield the following output:

-
:a ex:with :list/a . :list/a rdf:first "1" ; rdf:rest ("2" "3") .
-:b ex:with :list/b . :list/b rdf:first "4" ; rdf:rest ("5" "6") .
-:c ex:with :list/c . :list/c rdf:first "7" ; rdf:rest ("8" "9") .
+
:a ex:with :lista . :lista rdf:first "1" ; rdf:rest ("2" "3") .
+:b ex:with :listb . :listb rdf:first "4" ; rdf:rest ("5" "6") .
+:c ex:with :listc . :listc rdf:first "7" ; rdf:rest ("8" "9") .
-

This is similar to the previous example, yet in this case the head node of each produced collection is assigned an IRI :list/a, :list/b and :list/c.

+

This is similar to the previous example, yet in this case the head node of each produced collection is assigned an IRI :lista, :listb and :listc.
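To make the shape of a named collection concrete, the following minimal Python sketch expands the Turtle shorthand used in the output above into explicit rdf:first and rdf:rest triples. The function name `expand_list` and the blank node labels are hypothetical and not part of this specification; the head IRI assumes the template "list{id}" is resolved against the base IRI http://example.org/.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_list(head, items):
    """Return (subject, predicate, object) triples for an RDF collection.

    The first cell is identified by `head`; every later cell gets a fresh
    blank node label (hypothetical labels, for illustration only).
    """
    if not items:
        # An empty collection is rdf:nil, so it cannot carry a head IRI.
        raise ValueError("an empty list has no head node other than rdf:nil")
    labels = count(1)
    triples = []
    node = head
    for position, item in enumerate(items):
        triples.append((node, RDF + "first", item))
        is_last = position == len(items) - 1
        next_node = RDF + "nil" if is_last else f"_:b{next(labels)}"
        triples.append((node, RDF + "rest", next_node))
        node = next_node
    return triples


# Assumed head IRI for the iteration with id "a".
triples = expand_list("http://example.org/lista", ['"1"', '"2"', '"3"'])
for triple in triples:
    print(triple)
```

Only the first cell carries the IRI generated by the template; the remaining cells stay anonymous, which is why two lists sharing one head IRI would be ill-formed.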

5. Vocabulary definition

This section introduces the classes, properties, and constants of the RML Containers and Collections specification.

@@ -470,6 +521,7 @@

RML-CC: Collections and Containers in RML

  • A rml:GatherMap MUST have exactly one rml:gatherAs property.
  • A rml:GatherMap MAY have zero or exactly one rml:strategy property.
  • +

    5.1.2 rml:Strategy

    A strategy is a plan or set of actions designed to achieve a specific goal or outcome. Instances of rml:Strategy represent ways to perform an action such as combining two collections and containers. See constants for examples.

    5.2 Properties

    5.2.1 rml:gather

    The rml:gather informs the RML processor where the terms of a collection or container come from. This property relates a gather map with a non-empty list of term maps. @@ -478,7 +530,7 @@

    RML-CC: Collections and Containers in RML

  • The domain of rml:gather is rml:GatherMap.
  • The range of rml:gather is a non-empty list (rdf:List) of rml:TermMap instances. In particular, this list may include instances of rml:GatherMap thus allowing for nested gather maps.
  • -

    5.2.2 rml:strategy

    +

    5.2.2 rml:strategy

    Declaring an rml:strategy in a gather map informs the processor about how to create collections and containers when faced with multi-valued term maps. This specification defines rml:append and rml:cartesianProduct as instances of rml:Strategy.

    In the rml:append strategy, the sets of RDF terms generated by each term map of the gather map are simply appended to the collection (respectively container) being constructed. Thus, only one collection (respectively container) is generated.

    @@ -500,8 +552,7 @@

    RML-CC: Collections and Containers in RML

  • The range of rml:allowEmptyListAndContainer is xsd:boolean.
  • Property rml:allowEmptyListAndContainer is optional; it takes the value false by default.

    -

    5.3 Constants

    -

    5.3.1 rml:append

    +

    5.3 Constants

    5.3.1 rml:append

    rml:append is an instance of class rml:Strategy. Used as the object of property rml:strategy, it informs the processor that the sets of RDF terms generated by each term map of the gather map are to be appended within the collection or container. The order is that in which the term maps are declared in the gather map. Example:

    For the input document:

    @@ -557,16 +608,16 @@

    RML-CC: Collections and Containers in RML

    rml:predicateObjectMap [
       rml:predicate ex:with ;
       rml:objectMap [
    -      rml:template "seq/{id}" ;
    +      rml:template "seq{id}" ;
           rml:gather ( [ rml:reference "values.*" ; ] ) ;
           rml:gatherAs rdf:Seq ;
       ] ;
     ] ;

    will yield the following output:

    -
    :a ex:with :seq/a . :seq/a rdf:_1 "1" ; rdf:_2 "2" , rdf:_3 "3" .
    -:b ex:with :seq/b . :seq/b rdf:_1 "4" ; rdf:_2 "5" , rdf:_3 "6" .
    -:c ex:with :seq/c . :seq/c rdf:_1 "7" ; rdf:_2 "8" , rdf:_3 "9"  .
    +
    :a ex:with :seqa . :seqa rdf:_1 "1" ; rdf:_2 "2" ; rdf:_3 "3" .
    +:b ex:with :seqb . :seqb rdf:_1 "4" ; rdf:_2 "5" ; rdf:_3 "6" .
    +:c ex:with :seqc . :seqc rdf:_1 "7" ; rdf:_2 "8" ; rdf:_3 "9" .

    6.3 Generating well-formed named collections or containers

    @@ -600,15 +651,15 @@

    RML-CC: Collections and Containers in RML

    rml:predicateObjectMap [
       rml:predicate ex:with ;
       rml:objectMap [
    -      rml:template "list/{id}" ;
    +      rml:template "list{id}" ;
           rml:gather ( [ rml:reference "values.*" ; ] ) ;
           rml:gatherAs rdf:List ;
       ] ;
     ] ;

    The processor must generate the following output:

    -
    :a ex:with :list/a . :list/a rdf:first "1" ; rdf:rest ("2" "3" "7" "8" "9") .
    -:b ex:with :list/b . :list/b rdf:first "4" ; rdf:rest ("5" "6") .
    +
    :a ex:with :lista . :lista rdf:first "1" ; rdf:rest ("2" "3" "7" "8" "9") .
    +:b ex:with :listb . :listb rdf:first "4" ; rdf:rest ("5" "6") .

    It is assumed that a processor will concatenate the collections or containers while respecting the order of the iterations as provided by the logical source.
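The concatenation rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `concatenate_by_head` is hypothetical, and the iteration data below is made up to be consistent with the output shown above rather than copied from the input document.

```python
def concatenate_by_head(iterations):
    """Merge (head, terms) pairs into one list of terms per head node.

    Iterations are processed in the order given, so terms from an earlier
    iteration always precede terms from a later one with the same head.
    """
    merged = {}
    for head, terms in iterations:
        merged.setdefault(head, []).extend(terms)
    return merged


# Hypothetical iterations: two of them produce the same head node "lista".
iterations = [
    ("lista", ["1", "2", "3"]),
    ("listb", ["4", "5", "6"]),
    ("lista", ["7", "8", "9"]),
]
lists = concatenate_by_head(iterations)
print(lists)
```

A plain dictionary is enough here because Python dictionaries preserve insertion order, which mirrors the requirement to respect the order of the iterations.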

    6.3.2 Named collections or containers generated by a multi-valued gather map

    Let's consider the following input document:

    @@ -632,20 +683,20 @@

    RML-CC: Collections and Containers in RML

    The gather map has no rml:template, rml:constant nor rml:reference property. As already illustrated, the rml:cartesianProduct strategy will generate multiple collections, yielding the output:

    -
    :a ex:with ("1" "4"), ("1" "5"), ("2" "4"), ("2" "5"), ("3" "4"), ("3" "5") .
    +
    [] ex:with ("1" "4"), ("1" "5"), ("2" "4"), ("2" "5"), ("3" "4"), ("3" "5") .

    Now, when an rml:template, rml:constant or rml:reference is provided, to avoid generating ill-formed lists that would share the same head node IRI, the processor must concatenate the lists.

    If we add an rml:template in the object map:

    rml:objectMap [
    -  rml:template "list/{id}" ;
    +  rml:template "list{id}" ;
       rml:gather ( [ rml:reference "a.*" ] [ rml:reference "b.*" ]) ;
       rml:gatherAs rdf:List ;
       rml:strategy rml:cartesianProduct ;
     ] ;

    The processor must now generate the following output:

    -
    :a ex:with ("1" "4" "1" "5" "2" "4" "2" "5" "3" "4" "3" "5" ).
    +
    :myid ex:with ("1" "4" "1" "5" "2" "4" "2" "5" "3" "4" "3" "5" ).
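The two outputs above can be reproduced with a short Python sketch, offered as an illustration rather than a reference implementation. The helper names are hypothetical, and the term sets for "a.*" and "b.*" are assumed values chosen to match the outputs shown.

```python
from itertools import product


def cartesian_lists(*term_sets):
    """Build one list per combination, as the rml:cartesianProduct strategy does."""
    return [list(combination) for combination in product(*term_sets)]


def concatenate(lists):
    """Flatten several lists into one, keeping their order."""
    return [term for single_list in lists for term in single_list]


# Assumed values of the references "a.*" and "b.*".
a = ["1", "2", "3"]
b = ["4", "5"]

separate = cartesian_lists(a, b)  # without a head IRI: six independent lists
merged = concatenate(separate)    # with a shared head IRI: a single list
print(separate)
print(merged)
```

Without a template, each combination can live in its own blank-node list; with a template, all combinations would share one head node, so they are flattened into a single well-formed list.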
    @@ -668,7 +719,7 @@

    RML-CC: Collections and Containers in RML

    rml:predicateObjectMap [
       rml:predicate ex:with ;
       rml:objectMap [
    -      rml:template "list/{id}" ;
    +      rml:template "list{id}" ;
           rml:gather ( [ rml:reference "values1.*" ; ] [ rml:reference "values2.*" ; ] ) ;
           rml:gatherAs rdf:List ;
       ] ;
    @@ -677,14 +728,14 @@

    RML-CC: Collections and Containers in RML

    For each document, the values of values1 and values2 are appended in the same list, as per the default rml:append strategy. Furthermore, the lists generated for id "a" must be concatenated since they share the same head node IRI, as explained in the multiple iterations case. The expected output is:

    -
    :a ex:with :list/a .
    -:list/a rdf:first "1" ; rdf:rest ("a" "b" "5" "6" "e") .
    -:b ex:with :list/b .
    -:list/b rdf:first "3" ; rdf:rest ("4" "c" "d") .
    +
    :a ex:with :lista .
    +:lista rdf:first "1" ; rdf:rest ("a" "b" "5" "6" "e") .
    +:b ex:with :listb .
    +:listb rdf:first "3" ; rdf:rest ("4" "c" "d") .

    Now let's change the default strategy to rml:cartesianProduct:

    rml:objectMap [
    -    rml:template "list/{id}" ;
    +    rml:template "list{id}" ;
         rml:gather ( [ rml:reference "values1.*" ; ] [ rml:reference "values2.*" ; ] ) ;
         rml:gatherAs rdf:List ;
         rml:strategy rml:cartesianProduct ;
    @@ -696,10 +747,10 @@ 

    RML-CC: Collections and Containers in RML

    Similarly, for the documents with id "a", the result is: ("1" "a" "1" "b") and ("5" "e" "6" "e"). But again, these lists must be concatenated since they share the same head node IRI, as explained in the multiple iterations case.

    Therefore, the processor must now generate the following output:

    -
    :a ex:with :list/a .
    -:list/a rdf:first "1" ; rdf:rest ("a" "1" "b" "5" "e" "6" "e") .
    -:b ex:with :list/b .
    -:list/b rdf:first "3" ; rdf:rest ("c" "3" "d" "4" "c" "4" "d") .
    +
    :a ex:with :lista .
    +:lista rdf:first "1" ; rdf:rest ("a" "1" "b" "5" "e" "6" "e") .
    +:b ex:with :listb .
    +:listb rdf:first "3" ; rdf:rest ("c" "3" "d" "4" "c" "4" "d") .

    7. Examples (Informative)

    In this section, we present additional examples and describe the expected output.

    7.1 Dealing with empty collections and containers

    @@ -733,7 +784,7 @@

    RML-CC: Collections and Containers in RML

    rml:predicateObjectMap [
       rml:predicate ex:with ;
       rml:objectMap [
    -      rml:template "list/{id}" ;
    +      rml:template "list{id}" ;
           rml:allowEmptyListAndContainer true ;
           rml:gather ( [ rml:reference "values.*" ; ] ) ;
           rml:gatherAs rdf:List ;
    @@ -742,12 +793,12 @@ 

    RML-CC: Collections and Containers in RML

    then the document with "id": "d" entails an empty list, that is a list whose head node is rdf:nil and therefore has no IRI. We expect the following output:

    -
    :a ex:with :list/a .
    -:list/a rdf:first "1" ; rdf:rest ("2" "3") .
    -:b ex:with :list/b .
    -:list/b rdf:first "4" ; rdf:rest ("5" "6") .
    -:c ex:with :list/c .
    -:list/c rdf:first "7" ; rdf:rest ("8" "9") .
    +
    :a ex:with :lista .
    +:lista rdf:first "1" ; rdf:rest ("2" "3") .
    +:b ex:with :listb .
    +:listb rdf:first "4" ; rdf:rest ("5" "6") .
    +:c ex:with :listc .
    +:listc rdf:first "7" ; rdf:rest ("8" "9") .
     :d ex:with () .
    @@ -812,7 +863,7 @@

    RML-CC: Collections and Containers in RML

    The following mapping will relate instances of authors to names. The names of authors are, for the sake of the example, represented as bags containing a title, first name, and last name.

    <#AuthorTM>
         rml:logicalTable [ rml:tableName "AUTHOR" ; ] ;
    -    rml:subjectMap [ rml:template "/person/{ID}" ; ] ;
    +    rml:subjectMap [ rml:template "/person{ID}" ; ] ;
         rml:predicateObjectMap [
             rml:predicate ex:name ;
             rml:objectMap [
    @@ -826,9 +877,9 @@ 

    RML-CC: Collections and Containers in RML

    .

    In this example we generate, for each row in table AUTHOR, a blank node of type rdf:Bag. Each such bag "gathers" values from different term maps. The execution of this mapping will produce the following result:

    -
    :person/1 ex:name [ a rdf:Bag; rdf:_1 "Mary"; rdf:_2 "Shelley" ] . 
    -:person/2 ex:name [ a rdf:Bag; rdf:_1 "Sir"; rdf:_2 "Terry"; rdf:_3 "Pratchett" ] . 
    -:person/3 ex:name [ a rdf:Bag; rdf:_1 "Stephen"; rdf:_2 "Baxter" ] .
    +
    :person1 ex:name [ a rdf:Bag; rdf:_1 "Mary"; rdf:_2 "Shelley" ] . 
    +:person2 ex:name [ a rdf:Bag; rdf:_1 "Sir"; rdf:_2 "Terry"; rdf:_3 "Pratchett" ] . 
    +:person3 ex:name [ a rdf:Bag; rdf:_1 "Stephen"; rdf:_2 "Baxter" ] .

    While not shown in this example, different term maps make it possible to collect terms of different types: resources, literals, typed or language-tagged literals, etc. The fourth record in the table did not generate a bag, since none of the term maps in the gather map yielded a value. By default, empty lists and containers are withheld. They can be kept by setting rml:allowEmptyListAndContainer to true.
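The default handling of empty results can be summarised with a small Python sketch. It covers the rdf:List case only, where an empty list is rdf:nil; the function name `gather_list` and its return convention (None for "nothing is generated") are assumptions made for illustration.

```python
RDF_NIL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"


def gather_list(terms, allow_empty=False):
    """Decide what a gather map producing an rdf:List yields.

    Returns the terms when there are any, rdf:nil when the list is empty
    and empty lists are allowed, and None when the empty list is withheld.
    """
    if terms:
        return list(terms)
    return RDF_NIL if allow_empty else None


withheld = gather_list([])                    # default: nothing is generated
kept = gather_list([], allow_empty=True)      # empty list kept as rdf:nil
regular = gather_list(["1", "2", "3"])        # non-empty list is unaffected
print(withheld, kept, regular)
```

The `allow_empty` parameter plays the role of rml:allowEmptyListAndContainer, which defaults to false.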

    @@ -836,7 +887,7 @@

    RML-CC: Collections and Containers in RML

    Continuing with the relational data example, here we relate books to authors with a rml:parentTriplesMap. The authors of a book are represented as a list.

    <#BookTM>
         rml:logicalTable [ rml:tableName "BOOK" ; ] ;
    -    rml:subjectMap [ rml:template "/book/{ID}" ; ] ;
    +    rml:subjectMap [ rml:template "/book{ID}" ; ] ;
         rml:predicateObjectMap [
             rml:predicate ex:writtenBy ;
             rml:objectMap [
    @@ -853,20 +904,23 @@ 

    RML-CC: Collections and Containers in RML

    .

    Intuitively, we will join each record (or iteration) with data from the parent triples map. The join may yield one or more results, which are then gathered into a list. The execution of this mapping will produce the following RDF:

    -
    :book/1 ex:writtenby ( :person/1 ) . 
    -:book/2 ex:writtenby ( :person/2 :person/3 ) .
    +
    :book1 ex:writtenBy ( :person1 ) . 
    +:book2 ex:writtenBy ( :person2 :person3 ) .

    In RML, it is assumed that each term map is multi-valued. That is, each term map may return one or more values. The default behavior is to append the values in the order of the term maps appearing in the gather map.

    7.4 Using a gather map in a subject map

    Here we exemplify the use of a gather map in a subject map. Continuing with the JSON file from the running example, the following mapping generates an RDF sequence whose head node is used to state provenance information on that sequence:

    <#TM> a rml:TriplesMap;
       rml:logicalSource [
    -    rml:source "data.json" ;
    -    rml:referenceFormulation ql:JSONPath ;
    +    rml:source [ 
    +        a rml:RelativePathSource;
    +        rml:root rml:MappingDirectory;
    +        rml:path "data.json"
    +    ] ;
         rml:iterator "$.*" ;
       ];
     
       rml:subjectMap [
    -    rml:template "seq/{id}" ;
    +    rml:template "seq{id}" ;
         rml:gather ( [ rml:reference "values.*" ; ] ) ;
         rml:gatherAs rdf:Seq ;  
       ] ;
    @@ -877,14 +931,14 @@ 

    RML-CC: Collections and Containers in RML

    ] .

    The expected result is:

    -
    :seq/a rdf:_1 "1" ; rdf:_2 "2" ; rdf:_3 "3" .
    -:seq/a prov:wasDerivedFrom <data.json> .
    +
    :seqa rdf:_1 "1" ; rdf:_2 "2" ; rdf:_3 "3" .
    +:seqa prov:wasDerivedFrom <data.json> .
     
    -:seq/b rdf:_1 "4" ; rdf:_2 "5" ; rdf:_3 "6" .
    -:seq/b prov:wasDerivedFrom <data.json> .
    +:seqb rdf:_1 "4" ; rdf:_2 "5" ; rdf:_3 "6" .
    +:seqb prov:wasDerivedFrom <data.json> .
     
    -:seq/c rdf:_1 "7" ; rdf:_2 "8" ; rdf:_3 "9" .
    -:seq/c prov:wasDerivedFrom <data.json> .
    +:seqc rdf:_1 "7" ; rdf:_2 "8" ; rdf:_3 "9" .
    +:seqc prov:wasDerivedFrom <data.json> .