introduce iterable and describe natural mappings #144

Merged
merged 12 commits into from
Jan 18, 2025
150 changes: 150 additions & 0 deletions spec/docs/datatypeConversion.md
@@ -0,0 +1,150 @@
# Datatype conversions

For each [=reference formulation=] there may be a set of defined <dfn data-lt="natural mapping">natural RDF mappings</dfn> that are applied to the [=expression evaluation results=] on the [=data source=]. These [=natural mappings=] are defined in the [[RML-IO-Registry]] and are used to convert the values of the [=expression evaluation result=] to the appropriate [=natural RDF literal=] corresponding with the [=reference formulation=].

## Natural mapping of source values

The <dfn>natural RDF literal</dfn> is a [=literal=] that is the result of applying a [=natural mapping=] on a value of a [=data source=], which produces a [=literal=] that is the most appropriate representation of the value in RDF. The [=natural RDF literal=] has a [=natural RDF lexical form=].

The <dfn>natural RDF lexical form</dfn> produces only the [=lexical form=] of the [=literal=] and recommends that implementations SHOULD apply the [=XSD canonical mapping=], making it a [=canonical RDF lexical form=]. It is used in RML when non-string [=expression evaluation results=] are used in a string context, for example when a timestamp is used in a [=template-valued term map=] with [=term type=] [=IRI=].

The <dfn>canonical RDF lexical form</dfn> produces only the [=lexical form=] of the [=literal=] and requires that the [=XSD canonical mapping=] MUST be applied.

<dfn>Cast to string</dfn> is an implementation-dependent function that maps values from [=expression evaluation results=] to equivalent Unicode strings. The specifics of [=cast to string=] per [=reference formulation=] are defined in the [[RML-IO-Registry]].

Additionally, the [=natural mapping=] determines the [=natural RDF datatype=] of the [=literal=].

The <dfn>natural RDF datatype</dfn> is the [=datatype=] corresponding to the [=natural RDF literal=] that is the result of the [=natural mapping=]. The [=natural RDF datatype=] is an [=IRI=] that represents the [=datatype=] of the value in RDF.

## Datatype-override mapping of source values

The <dfn>datatype-override RDF literal</dfn> corresponding to an [=expression evaluation result=] value `v` and a [=datatype IRI=] `dt`, is a [=literal=] whose [=lexical form=] is the [=natural RDF lexical form=] corresponding to `v`, and whose [=datatype IRI=] is `dt`. If the [=literal=] is [=ill-typed=], then a [=data error=] is raised.

A [=literal=] is <dfn data-lt="ill-typed literal">ill-typed</dfn> in RML if its [=datatype IRI=] denotes a [=validatable RDF datatype=] and its [=lexical form=] is not in the [=lexical space=] of the [=RDF datatype=] identified by its [=datatype IRI=].

The set of <dfn>validatable RDF datatypes</dfn> includes all [=datatypes=] in the RDF datatype column of [[[#table-lexical-forms]]], as defined in [[XMLSCHEMA11-2]]. This set MAY include implementation-defined additional RDF datatypes.

For example, `"X"^^xsd:boolean` is [=ill-typed=] because `xsd:boolean` is a validatable [=RDF datatype=] in RML, and `"X"` is not in the [=lexical space=] of `xsd:boolean` [[XMLSCHEMA11-2]].
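
The ill-typed check and the datatype-override rule above can be sketched in Python. This is only an illustration under stated assumptions: the names `LEXICAL_SPACE`, `DataError`, `is_ill_typed` and `datatype_override_literal` are hypothetical, and the table covers just three validatable datatypes, with patterns that follow the lexical spaces given in [[XMLSCHEMA11-2]].

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical, partial table of validatable RDF datatypes:
# datatype IRI -> pattern describing its lexical space.
LEXICAL_SPACE = {
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}


class DataError(ValueError):
    """Hypothetical error type for an ill-typed datatype-override literal."""


def is_ill_typed(lexical_form: str, datatype_iri: str) -> bool:
    """True if the datatype is validatable and the lexical form is outside its lexical space."""
    pattern = LEXICAL_SPACE.get(datatype_iri)
    if pattern is None:
        # Not a validatable datatype in this sketch, so never ill-typed.
        return False
    return pattern.fullmatch(lexical_form) is None


def datatype_override_literal(natural_lexical_form: str, datatype_iri: str) -> tuple[str, str]:
    """Pair the natural RDF lexical form with the overriding datatype IRI."""
    if is_ill_typed(natural_lexical_form, datatype_iri):
        raise DataError(f"{natural_lexical_form!r} is not in the lexical space of <{datatype_iri}>")
    return (natural_lexical_form, datatype_iri)
```

With this sketch, `is_ill_typed("X", XSD + "boolean")` is true, while a literal with an unknown datatype IRI is accepted unchanged because its datatype is not validatable.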

<section class="informative">
<h2>Summary of XSD Lexical Forms</h2>

The [=natural mappings=] make reference to various [=XSD datatypes=] and require that values from [=expression evaluation results=] be converted to strings that are appropriate as [=lexical forms=] for these [=datatypes=]. This subsection gives examples of these [=lexical forms=] in order to aid implementers of the mappings. This subsection is non-normative; the normative definitions of the [=lexical spaces=] as well as the [=canonical mappings=] are found in [[XMLSCHEMA11-2]].

A general approach that may be used for implementing the natural mappings is as follows:

1. Identify the source datatype of the value of the [=expression evaluation result=] on the [=data source=].
1. Look up its corresponding [=natural RDF datatype=] for the [=reference formulation=] in the [[RML-IO-Registry]].
1. Apply [=cast to string=] to the value.
1. Ensure that the resulting string is in the [=lexical space=] of the target [=RDF datatype=]; that is, it must be in a form such as those listed in either column of [[[#table-lexical-forms]]] below. This may require some transformations of the string, in particular for `xsd:hexBinary`, `xsd:dateTime` and `xsd:boolean`.
1. If the goal is to obtain a [=canonical RDF lexical form=], then further string transformations may be required to obtain a form such as those listed in the Canonical lexical forms column of [[[#table-lexical-forms]]] below.
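
Steps 1 to 4 of the approach above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the `NATURAL_DATATYPE` table is a hypothetical stand-in for a lookup in the [[RML-IO-Registry]], Python types stand in for source datatypes, and `cast_to_string` and `natural_lexical_form` are assumed names.

```python
from datetime import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical stand-in for a registry lookup: source type -> natural RDF datatype.
NATURAL_DATATYPE = {
    bool: XSD + "boolean",
    int: XSD + "integer",
    bytes: XSD + "hexBinary",
    datetime: XSD + "dateTime",
}


def cast_to_string(value) -> str:
    """Implementation-dependent cast: hex digits for bytes, Python's str() otherwise."""
    if isinstance(value, bytes):
        return value.hex()
    return str(value)


def natural_lexical_form(value) -> tuple[str, str]:
    """Return (lexical form, datatype IRI) for a source value."""
    # Steps 1 and 2: identify the source type and look up its natural RDF datatype.
    datatype = NATURAL_DATATYPE[type(value)]
    # Step 3: apply cast to string.
    text = cast_to_string(value)
    # Step 4: transform the string so that it lies in the target lexical space.
    if datatype == XSD + "boolean":
        text = text.lower()               # "True" becomes "true"
    elif datatype == XSD + "dateTime":
        text = text.replace(" ", "T", 1)  # "2011-08-23 22:17:00" gains the "T" separator
    return text, datatype
```

The result is in the lexical space of the target datatype but is not necessarily canonical; for example, the hexadecimal digits produced for `bytes` are lowercase.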

<table class="numbered" id="table-lexical-forms">
<caption>Table of canonical and non-canonical lexical forms for some XSD datatypes</caption>
<tbody>
<tr>
<th>RDF datatype</th>
<th>Non-canonical lexical forms</th>
<th>Canonical lexical forms</th>
<th>Comments</th>
</tr>
<tr>
<td><code><a href="https://www.w3.org/TR/xmlschema11-2/#hexBinary">xsd:hexBinary</a></code></td>
<td><code>5232524d4c</code></td>
<td><code>5232524D4C</code></td>
<td>Convert from SQL by applying <a href="https://www.w3.org/TR/xmlschema11-2/#hexBinary"><code>xsd:hexBinary</code> lexical mapping</a>.</td>
</tr>
<tr>
<td rowspan="4"><code><a href="https://www.w3.org/TR/xmlschema11-2/#decimal">xsd:decimal</a></code></td>
<td><code>.224</code></td>
<td><code>0.224</code></td>
<td rowspan="4"></td>
</tr>
<tr>
<td><code>+001</code></td>
<td><code>1</code></td>
</tr>
<tr>
<td><code>42.0</code></td>
<td><code>42</code></td>
</tr>
<tr>
<td><code>-5.9000</code></td>
<td><code>-5.9</code></td>
</tr>
<tr>
<td rowspan="3"><code><a href="https://www.w3.org/TR/xmlschema11-2/#integer">xsd:integer</a></code></td>
<td><code>-05</code></td>
<td><code>-5</code></td>
<td rowspan="3"></td>
</tr>
<tr>
<td><code>+333</code></td>
<td><code>333</code></td>
</tr>
<tr>
<td><code>00</code></td>
<td><code>0</code></td>
</tr>
<tr>
<td rowspan="5"><code><a href="https://www.w3.org/TR/xmlschema11-2/#double">xsd:double</a></code></td>
<td><code>-5.90</code></td>
<td><code>-5.9E0</code></td>
<td rowspan="5">Also supports <code>INF</code>, <code>-INF</code>, <code>NaN</code> and <code>-0.0E0</code>,<br>but these do not appear in standard SQL.</td>
</tr>
<tr>
<td><code>+0.00014770215000</code></td>
<td><code>1.4770215E-4</code></td>
</tr>
<tr>
<td><code>+01E+3</code></td>
<td><code>1.0E3</code></td>
</tr>
<tr>
<td><code>100.0</code></td>
<td><code>1.0E2</code></td>
</tr>
<tr>
<td><code>0</code></td>
<td><code>0.0E0</code></td>
</tr>
<tr>
<td rowspan="2"><code><a href="https://www.w3.org/TR/xmlschema11-2/#boolean">xsd:boolean</a></code></td>
<td><code>1</code></td>
<td><code>true</code></td>
<td rowspan="2">Must be lowercase.</td>
</tr>
<tr>
<td><code>0</code></td>
<td><code>false</code></td>
</tr>
<tr>
<td><code><a href="https://www.w3.org/TR/xmlschema11-2/#date">xsd:date</a></code></td>
<td></td>
<td><code>2011-08-23</code></td>
<td>Dates in SQL don't have timezone offsets.<br>They are optional in XSD.</td>
</tr>
<tr>
<td rowspan="3"><code><a href="https://www.w3.org/TR/xmlschema11-2/#time">xsd:time</a></code></td>
<td><code>22:17:34.885+00:00</code></td>
<td><code>22:17:34.885Z</code></td>
<td rowspan="3">May or may not have timezone offset.</td>
</tr>
<tr>
<td><code>22:17:34.000</code></td>
<td><code>22:17:34</code></td>
</tr>
<tr>
<td><code>22:17:34.1+01:00</code></td>
<td><code>22:17:34.1+01:00</code></td>
</tr>
<tr>
<td><code><a href="https://www.w3.org/TR/xmlschema11-2/#dateTime">xsd:dateTime</a></code></td>
<td><code>2011-08-23T22:17:00.000+00:00</code></td>
<td><code>2011-08-23T22:17:00Z</code></td>
<td>May or may not have timezone offset.<br>Convert from SQL by replacing space with "<code>T</code>".</td>
</tr>
</tbody>
</table>
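
Some rows of the table can be reproduced with plain string handling. The following Python sketch covers only `xsd:hexBinary`, `xsd:boolean`, `xsd:integer` and `xsd:decimal`; the function names are hypothetical, and each function assumes that its input is already in the lexical space of the datatype. The `xsd:double`, `xsd:time` and `xsd:dateTime` rows need more care and are not covered here.

```python
def canonical_hex_binary(lexical: str) -> str:
    """Canonical xsd:hexBinary uses uppercase hexadecimal digits."""
    return lexical.upper()


def canonical_boolean(lexical: str) -> str:
    """Canonical xsd:boolean is the lowercase word form."""
    return {"1": "true", "0": "false", "true": "true", "false": "false"}[lexical]


def canonical_integer(lexical: str) -> str:
    """Drop a leading plus sign and leading zeros."""
    return str(int(lexical))


def canonical_decimal(lexical: str) -> str:
    """Drop redundant sign and zeros; an integral value is written without a decimal point."""
    sign = "-" if lexical.startswith("-") else ""
    digits = lexical.lstrip("+-")
    integer_part, _, fraction = digits.partition(".")
    integer_part = integer_part.lstrip("0") or "0"
    fraction = fraction.rstrip("0")
    if integer_part == "0" and not fraction:
        sign = ""  # zero is written without a sign
    return sign + integer_part + ("." + fraction if fraction else "")
```

For instance, `canonical_decimal(".224")` gives `0.224` and `canonical_integer("-05")` gives `-5`, matching the corresponding table rows.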

</section>
11 changes: 5 additions & 6 deletions spec/docs/expressions.md
@@ -1,6 +1,6 @@
# Generating values with expressions

<dfn>Expressions</dfn> are mapping constructs that can be evaluated on a [=logical iteration=], according to the specified reference formulation, to generate values during the mapping process.
<dfn>Expressions</dfn> are mapping constructs that can be evaluated on a [=logical iteration=], according to the specified [=reference formulation=], to generate values during the mapping process.

## Expression map (`rml:ExpressionMap`)

@@ -10,17 +10,17 @@ An <dfn>expression map</dfn> (`rml:ExpressionMap`) is an abstract class, that is
* 0 or 1 `rml:template`, or
* another property, or properties, defined by a subclass of `rml:ExpressionMap`.

Each of these properties specifies an [=expression=] which, upon evaluation, results in an ordered list of values.
Each of these properties specifies an [=expression=] which, upon evaluation, results in an ordered list of values, called the <dfn>expression evaluation result</dfn>.

The <dfn>reference expression set</dfn> of an [=expression map=] is the set of expressions which are evaluated on a [=logical iteration=].

### Constant expression (`rml:constant`)

A <dfn>constant-valued expression map</dfn> is an [=expression map=] that always generates the same value. A constant-valued expression map is represented by a resource that has exactly one `rml:constant` property, the value of which is called a <dfn>constant expression</dfn>.
A <dfn>constant-valued expression map</dfn> is an [=expression map=] that always generates the same [=expression evaluation result=]. A constant-valued expression map is represented by a resource that has exactly one `rml:constant` property, the value of which is called a <dfn>constant expression</dfn>.

The <dfn>constant value</dfn> is a singleton list containing the [=constant expression=].

The [=reference expressions=] of a [constant-valued expression map=] is an empty list.
The [=reference expressions=] of a [=constant-valued expression map=] is an empty list.

### Reference (`rml:reference`)
A <dfn>reference-valued expression map</dfn> is an [=expression map=] that is represented by a resource that has exactly one `rml:reference` property, the value of which is called a <dfn>reference expression</dfn>.
@@ -29,8 +29,7 @@ The [=reference expression=] MUST be a valid [=expression=] according to the def

The [=reference expression set=] of a [=reference-valued expression map=] is the singleton set containing the [=reference expression=].

The <dfn>reference value</dfn> is an ordered list of values obtained by evaluating the [=reference expression=] against a given [=logical iteration=].
For each value in the ordered list, an expression is created.
The <dfn>reference value</dfn> is the [=expression evaluation result=] obtained by evaluating the [=reference expression=] against a given [=logical iteration=].

### Template (`rml:template`)
A <dfn>template-valued expression map</dfn> is an [=expression map=] that is represented by a resource that has exactly one `rml:template` property, the value of which is called a <dfn>template expression</dfn>. The [=template expression=] MUST be a valid [=string template=].
6 changes: 5 additions & 1 deletion spec/docs/index.html
@@ -106,9 +106,13 @@

<section id="joins" data-include="joinconditions.md" data-include-format="markdown"></section>

<section id="datatype-conversion" data-include="datatypeConversion.md" data-include-format="markdown"></section>

<section id="definitions" data-include="definitions.md" data-include-format="markdown"></section>

<section id="rdfTerminology" class="appendix, informative" data-include="rdfTerminology.md" data-include-format="markdown"></section>
<section id="rdfTerminology" class="informative" data-include="rdfTerminology.md" data-include-format="markdown"></section>

<section id="xsdTerminology" class="informative" data-include="xsdTerminology.md" data-include-format="markdown"></section>

</body>

12 changes: 7 additions & 5 deletions spec/docs/logicalSource.md
@@ -1,15 +1,17 @@
# Defining Logical Sources
# Defining Logical Iterables and Logical Sources

A <dfn>logical source</dfn> is an abstract construct to describe data access and iteration for a [=data source=] such that it can be mapped to [=RDF triples=].
A <dfn>logical iterable</dfn> is an abstract construct to describe data access and iteration for a [=data source=].

A [=logical source=] (`rml:LogicalSource`) MUST have:
A [=logical iterable=] (`rml:LogicalIterable`) MUST have:
* exactly one `rml:referenceFormulation` property, whose value is a <dfn>reference formulation</dfn> which defines how the underlying [=data source=] is to be accessed, and which [=expressions=] can be evaluated on [=logical iterations=],
* zero or one `rml:iterator` property, whose value is a <dfn data-lt="iterator">logical iterator</dfn> that defines a sequence of [=logical iterations=] on the [=data source=]. If no [=iterator=] is provided, a <dfn class="lint-ignore">default iterator</dfn> MUST be associated with the [=reference formulation=].

A <dfn data-lt="iteration">logical iteration</dfn> is an item in the sequence produced by the [=logical source=], on which [=expressions=] can be evaluated.
A <dfn data-lt="iteration">logical iteration</dfn> is an item in the sequence produced by the [=logical iterable=], on which [=expressions=] can be evaluated.

A <dfn>data source</dfn> is an abstract concept that represents a source of data that can be accessed via a [=logical source=]. A [=data source=] can be a file, a database, a web service, or any other source of data.
A <dfn>data source</dfn> is an abstract concept that represents a source of data that can be accessed via a [=logical iterable=]. A [=data source=] can be a file, a database, a web service, or any other source of data, depending on the type of [=logical iterable=].

<aside class="note">
There can be many different types of [=reference formulation=]. The known types, and the details of how a reference formulation is handled and implemented for each data format, are specified in [[RML-IO-Registry]].
</aside>

A <dfn>logical source</dfn> (`rml:LogicalSource`) is a subclass of [=logical iterable=] that can be associated with a [=triples map=] such that a [=data source=] can be mapped to [=RDF triples=].
2 changes: 1 addition & 1 deletion spec/docs/mapping.md
@@ -6,7 +6,7 @@ All [=RDF triples=] generated from one [=logical iteration=] in the [=logical so

A [=triples map=] is represented by a [=resource=] that references the following other [=resources=]:

* It MUST have zero or one [=logical source=] (`rml:logicalSource`) property.
* It MUST have zero or one [=logical source=] (`rml:logicalSource`) property whose value MUST be a [=logical source=] (`rml:LogicalSource`).
* It MUST have exactly one [=subject map=] (`rml:SubjectMap`) that specifies how to generate a subject for each [=iteration=] of the [=logical source=].
It may be specified in two ways:
1. using the subject map `rml:subjectMap` property, whose value MUST be the [=subject map=], or
7 changes: 3 additions & 4 deletions spec/docs/rdfTerminology.md
@@ -1,21 +1,20 @@
# RDF Terminology

This appendix lists some terms normatively defined in other specifications.

The following terms are defined in [[RDF11-CONCEPTS]] and usedin RML:
This section lists some terms normatively defined in [[RDF11-CONCEPTS]] and used in RML:

- <dfn><a data-cite="RDF11-CONCEPTS#dfn-rdf-dataset">RDF dataset</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-rdf-graph">RDF graph</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-rdf-triple">RDF triple</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-iri">IRI</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-blank-node">blank node</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-blank-node-identifier">blank node identifier</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-datatype">datatype</a></dfn>
- <dfn data-lt="RDF datatype"><a data-cite="RDF11-CONCEPTS#dfn-datatype">datatype</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-datatype-iri">datatype IRI</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-default-graph">default graph</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-language-tag">language tag</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-language-tagged-string">language-tagged string</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-lexical-form">lexical form</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-lexical-space">lexical space</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-literal">literal</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-named-graph">named graph</a></dfn>
- <dfn><a data-cite="RDF11-CONCEPTS#dfn-object">object</a></dfn>