Relation between fields and collections #3

Open
thomas-delva opened this issue Nov 16, 2021 · 2 comments
Labels: enhancement (New feature or request), working group?

@thomas-delva

Fields and collections both deal with multivalues in the source data, so they should be aligned.

Franck and Christophe currently define the gather map as a way to generate collections. We should see how it works when fields are used instead of references: https://docs.google.com/presentation/d/1QYSyuzvN4xO3mC6FTja2RLsZS2JCZLqmt53DXE4KyxM/

The fields paper proposed a group-by approach to generating collections, in which field values are grouped by equal values of other fields. We should probably see how this compares with the gathering approach:
[image]

thomas-delva added the enhancement (New feature or request) label Nov 16, 2021
@andimou

andimou commented Nov 16, 2021

@thomas-delva could you add the same example as the one used for collection but with fields?

Then it will be easier to compare. In any case, I think we don't need to end up with a single solution, but we do want both to offer the same coverage.

@thomas-delva

Below is a fields version of the examples in the gathermap slides for easier comparison. rml:gatherBy is used to "un-flatten" the multivalues after fields flatten them.
There are five distinct examples in the slides and I'll cover them in the same order below: "simple example", "relational databases", "nested iteration over source", "generating nested collections", "multiple gather maps".

Simple example

Data:

{ "a": "1",
  "b": [ "1", "2", "3" ] }

Logical source + fields:

<LS> a rml:LogicalSource ;
  rml:iterator "$" ;
  rml:field [
    rml:name "a_field" ;
    rml:reference "$.a" ] ;
  rml:field [
    rml:name "b_field" ;
    rml:reference "$.b.*" ] .

Intermediate representation:

a_field  b_field
1        1
1        2
1        3
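
To make the flattening step concrete, here is a minimal Python sketch of how the intermediate representation above could be produced. It assumes that sibling fields are combined as a Cartesian product per iteration; the `flatten` helper and the lambda stand-ins for the JSONPath references are hypothetical and not part of any RML implementation.

```python
from itertools import product

def flatten(record, fields):
    """Expand one iteration into rows: the Cartesian product of each field's values."""
    names = list(fields)
    value_lists = [fields[name](record) for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

record = {"a": "1", "b": ["1", "2", "3"]}

# Hypothetical stand-ins for the references "$.a" and "$.b.*".
fields = {
    "a_field": lambda r: [r["a"]],
    "b_field": lambda r: list(r["b"]),
}

rows = flatten(record, fields)
for row in rows:
    print(row["a_field"], row["b_field"])
```

Each printed line corresponds to one row of the table above.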

Object map:

... rr:objectMap [
  rml:gather ( [ rml:reference "b_field" ] ) ;
  rml:gatherAs rdf:List ;
  rml:gatherBy "it"  # can be implicit: one level higher than b_field
                     # "it" refers to the iterator, i.e., the "field" one level above b_field
  ] .

Output:

... ( "1" "2" "3" )
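
As a rough illustration of what gathering by the iterator does to the flattened rows, the sketch below groups the `b_field` values per iteration. The extra `"it"` key on each row is an assumption made for this sketch (an index of the iteration the row came from), and `gather_by_iteration` is a hypothetical helper name.

```python
def gather_by_iteration(rows, reference):
    """Collect the values of `reference` per iteration, preserving row order."""
    lists = {}
    for row in rows:
        lists.setdefault(row["it"], []).append(row[reference])
    return list(lists.values())

# Flattened rows of the simple example; "it" is an assumed iteration index.
rows = [
    {"it": 0, "a_field": "1", "b_field": "1"},
    {"it": 0, "a_field": "1", "b_field": "2"},
    {"it": 0, "a_field": "1", "b_field": "3"},
]

result = gather_by_iteration(rows, "b_field")
print(result)
```

There is one list per iteration, so the single JSON document yields a single list.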

Relational databases

Input == intermediate representation:

ID  TITLE           BOOKID  SALUTATION  FNAME    LNAME
1   Frankenstein    1       NULL        Mary     Shelley
2   The Long Earth  2       Sir         Terry    Pratchett
3   The Long Earth  2       NULL        Stephen  Baxter

Logical source + fields:

<LS> a rml:LogicalSource ;
  rml:field [
    rml:name "bookid_field" ;
    rml:reference "BOOKID" ] ;
  rml:field [
    rml:name "id_field" ;
    rml:reference "ID" ] .

Triples map:

<TM> a rr:TriplesMap ;
  rml:logicalSource <LS> ;
  rr:subjectMap [ rr:template "http://ex.com/book{bookid_field}" ] ;
  rr:predicateObjectMap [
    rr:predicate :writtenBy ;
    rr:objectMap [
      rml:gather ( [ rr:template "http://ex.com/author{id_field}" ] ) ;
      rml:gatherAs rdf:List ;
      rml:gatherBy "bookid_field"
    ] ] .

Output:

:book1 :writtenBy ( :author1 ) .
:book2 :writtenBy ( :author2 :author3 ) .
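
The relational case is essentially a group-by over the rows. The following Python sketch shows that reading under the assumption that rows are grouped by equal `bookid_field` values and that list order follows row order; `gather_by` is a hypothetical helper, and the printed triples are only a string rendering, not real RDF generation.

```python
def gather_by(rows, key_field, value_field):
    """Group the values of `value_field` by `key_field`, preserving row order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_field], []).append(row[value_field])
    return groups

# Only the two declared fields are kept from the table above.
rows = [
    {"id_field": "1", "bookid_field": "1"},
    {"id_field": "2", "bookid_field": "2"},
    {"id_field": "3", "bookid_field": "2"},
]

authors_by_book = gather_by(rows, "bookid_field", "id_field")
lines = [
    f":book{book} :writtenBy ( {' '.join(':author' + a for a in authors)} ) ."
    for book, authors in authors_by_book.items()
]
print("\n".join(lines))
```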

Nested iteration

Here, fields can be declared once and then used to generate collections from different iteration levels (compare the two predicate-object maps).

Data:

{ "id": "id",
  "a": [ [ "1", "2", "3" ],
         [ "4", "5", "6" ] ] }

Logical source + fields:

<LS> a rml:LogicalSource ;
  rml:iterator "$" ;
  rml:field [
    rml:name "id_field" ;
    rml:reference "$.id" ] ;
  rml:field [
    rml:name "a_outer_field" ;
    rml:reference "$.a.*" ;
    rml:field [
      rml:name "a_inner_field" ;
      rml:reference "$.*" ] ] .

Intermediate representation:

id_field  a_outer_field      a_inner_field
id        [ "1", "2", "3" ]  1
id        [ "1", "2", "3" ]  2
id        [ "1", "2", "3" ]  3
id        [ "4", "5", "6" ]  4
id        [ "4", "5", "6" ]  5
id        [ "4", "5", "6" ]  6

Triples map:

<TM> a rr:TriplesMap ;
  rml:logicalSource <LS> ;
  rr:subjectMap [ rr:template "http://ex.com/{id_field}" ] ;
  rr:predicateObjectMap [
    rr:predicate :a_values_grouped ;
    rr:objectMap [
      rml:gather ( [ rml:reference "a_inner_field" ] ) ;
      rml:gatherAs rdf:List ;
      rml:gatherBy "a_outer_field"  # can be implicit; one level higher than a_inner_field
    ] ] ;
  rr:predicateObjectMap [
    rr:predicate :a_values_all ;
    rr:objectMap [
      rml:gather ( [ rml:reference "a_inner_field" ] ) ;
      rml:gatherAs rdf:List ;
      rml:gatherBy "it"  # "it" refers to the iterator
    ] ] .

Output:

:id :a_values_grouped ( "1" "2" "3" ), ( "4" "5" "6" ) ;
    :a_values_all ( "1" "2" "3" "4" "5" "6" ) .
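
The difference between the two predicate-object maps comes down to the grouping key. A minimal Python sketch, assuming each row carries a hypothetical `"it"` iteration index and that the outer array values are represented as tuples so they can serve as dictionary keys:

```python
def gather(rows, reference, by):
    """Group the values of `reference` by the value of `by`, preserving row order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row[reference])
    return list(groups.values())

outer_values = [("1", "2", "3"), ("4", "5", "6")]

# One row per inner value; "it" is an assumed iteration index.
rows = [
    {"it": 0, "id_field": "id", "a_outer_field": outer, "a_inner_field": inner}
    for outer in outer_values
    for inner in outer
]

grouped = gather(rows, "a_inner_field", by="a_outer_field")  # :a_values_grouped
all_values = gather(rows, "a_inner_field", by="it")          # :a_values_all
print(grouped)
print(all_values)
```

Note that grouping by the value of `a_outer_field` would merge two outer arrays with identical contents; an actual implementation would probably need to group by position rather than by value.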

Nested gather maps

Data (same as previous):

{ "id": "id",
  "a": [ [ "1", "2", "3" ],
         [ "4", "5", "6" ] ] }

Logical source + fields (same as previous):

<LS> a rml:LogicalSource ;
  rml:iterator "$" ;
  rml:field [
    rml:name "id_field" ;
    rml:reference "$.id" ] ;
  rml:field [
    rml:name "a_outer_field" ;
    rml:reference "$.a.*" ;
    rml:field [
      rml:name "a_inner_field" ;
      rml:reference "$.*" ] ] .

Intermediate representation (same as previous):

id_field  a_outer_field      a_inner_field
id        [ "1", "2", "3" ]  1
id        [ "1", "2", "3" ]  2
id        [ "1", "2", "3" ]  3
id        [ "4", "5", "6" ]  4
id        [ "4", "5", "6" ]  5
id        [ "4", "5", "6" ]  6

Object map:

... rr:objectMap [
  rr:termType rr:BlankNode ;
  rml:gather ([
    rr:termType rr:BlankNode ;
    rml:gather ( [ rml:reference "a_inner_field" ] ) ;
    rml:gatherAs rdf:List ;
    rml:gatherBy "a_outer_field" # can be implicit: one level higher than a_inner_field
  ]) ;
  rml:gatherAs rdf:List ;
  rml:gatherBy "it" # can be implicit: one level higher than a_outer_field
                    # "it" refers to the iterator, i.e., the "field" one level above a_outer_field
  ] ;

Output:

( ( "1" "2" "3" ) ( "4" "5" "6" ) )
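
One way to read the nested gather map is as two levels of grouping: first by the iterator, then by `a_outer_field`. The Python sketch below follows that reading. The `"it"` index, the tuple representation of the outer values, and the helper names `nested_gather` and `to_turtle` are all assumptions of this sketch.

```python
def nested_gather(rows, reference, inner_by, outer_by):
    """Group `reference` by `inner_by` inside each `outer_by` group, preserving row order."""
    outer = {}
    for row in rows:
        inner = outer.setdefault(row[outer_by], {})
        inner.setdefault(row[inner_by], []).append(row[reference])
    return [list(inner.values()) for inner in outer.values()]

def to_turtle(value):
    """Render nested Python lists as a Turtle collection of plain string literals."""
    if isinstance(value, list):
        return "( " + " ".join(to_turtle(v) for v in value) + " )"
    return f'"{value}"'

outer_values = [("1", "2", "3"), ("4", "5", "6")]
rows = [
    {"it": 0, "a_outer_field": outer, "a_inner_field": inner}
    for outer in outer_values
    for inner in outer
]

collections = nested_gather(rows, "a_inner_field",
                            inner_by="a_outer_field", outer_by="it")
print(to_turtle(collections[0]))
```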

Multiple term maps in gather map

Data:

{ "a": "1", 
  "b": [ "1", "2", "3" ],
  "c": [ "4", "5", "6" ] }

Logical source + fields:

<LS> a rml:LogicalSource ;
  rml:iterator "$" ;
  rml:field [
    rml:name "a_field" ;
    rml:reference "$.a" ] ;
  rml:field [
    rml:name "b_field" ;
    rml:reference "$.b.*" ] ;
  rml:field [
    rml:name "c_field" ;
    rml:reference "$.c.*" ] .

Intermediate representation:

a_field  b_field  c_field
1        1        4
1        1        5
1        1        6
1        2        4
1        2        5
1        2        6
1        3        4
1        3        5
1        3        6

Object map:

... rr:objectMap [
  rml:gather ( [ rml:reference "b_field" ] [ rml:reference "c_field" ] ) ;
  rml:gatherAs rdf:List ;
  rml:gatherBy "it" ;  # can be implicit: one level higher than b_field
                       # "it" refers to the iterator, i.e., the "field" one level above b_field
  rml:strategy rml:Append ;  # default strategy
  ] .

Output:

... ( "1" "2" "3" "4" "5" "6" )
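
Because the intermediate representation is a Cartesian product, each `b_field` and `c_field` value appears three times in the rows, so getting from those rows to the output above requires some way of undoing the repetition. The Python sketch below assumes that each term map contributes its distinct values in first-occurrence order and that the append strategy concatenates the per-term-map lists. That deduplication rule is an assumption of this sketch and not something the thread specifies; `distinct` and `gather_append` are hypothetical helpers.

```python
from itertools import product

def distinct(values):
    """Drop repeated values while keeping first-occurrence order."""
    return list(dict.fromkeys(values))

def gather_append(rows, references):
    """Concatenate the distinct values of each reference, one reference after another."""
    result = []
    for reference in references:
        result.extend(distinct(row[reference] for row in rows))
    return result

# Flattened rows: Cartesian product of the b and c values within one iteration.
rows = [
    {"a_field": "1", "b_field": b, "c_field": c}
    for b, c in product(["1", "2", "3"], ["4", "5", "6"])
]

result = gather_append(rows, ["b_field", "c_field"])
print(result)
```

A value-based deduplication like this would also collapse genuine duplicates in the source (for example `"b": [ "1", "1" ]`), which is one point where gathering over fields may need more information than the flattened rows carry.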
