introduce provided source (#91) #95

Closed
27 changes: 27 additions & 0 deletions spec/section/source-vocabulary.md
@@ -456,3 +456,30 @@ specified by a string Literal. The file's absolute path is `/root/file.xml`.
rml:iterator "$";
.
</pre>

In some cases, it is useful to describe access to a source that will be programmatically
provided to an RML processor at runtime.
This could, for example, be a byte stream or an already deserialized JSON or XML node.

For these cases, `rml:ProvidedSource` can be used.
It is a subclass of `rml:Source` and MUST have:

* exactly one `rml:sourceIdentifier` property, whose value is a string literal, which
uniquely identifies the source.

| Property | Domain | Range |
| ------------------------| --------------------- | ---------- |
| `rml:sourceIdentifier` | `rml:ProvidedSource` | `Literal` |

The following example illustrates a provided source with its unique identifier `myByteStream`:
Collaborator

Don't we need a reference formulation? How does the processor know how to deal with that kind of ProvidedSource?
So an iterator + reference formulation?

I guess the actual handling is then defined in rml-io-registry, once we have an iterator + reference formulation.

Collaborator Author

Ah good point. The idea is that the reference formulation can be any reference formulation supported by the RML processor. It is up to the mapping author to choose the correct one that will match the provided source at runtime. So this addition has no impact on that. I agree that this is not very clear from the text and the example. I will clarify.

Collaborator

Yeah, any formulation is fine, but we need to make clear somehow that you will probably need one, and that if none is provided we fall back to the default of a Logical Source.

Collaborator Author

@pmaria pmaria Dec 5, 2024

I'm not sure I follow. What is the "default [reference formulation] of a Logical Source"?

What I mean is:
Say you have processed some data in some runtime process. The idea is to be able to provide that data to an RML processor during that runtime. The type of data that is provided should match one of the reference formulations supported by the RML processor. The reference formulation for this specific source is determined, as usual, by the rml:referenceFormulation of the Logical Source. All we need to know is which data matches which provided source in the mapping. That's why we need an rml:sourceIdentifier. This identifier will also be provided to the RML processor programmatically, together with the source data.

So say you process that data to some structure on which the engine can execute JSONPath queries. Then a mapping could look like:

<> rml:logicalSource [
  rml:source [
    a rml:ProvidedSource ;
    rml:sourceIdentifier "my-json-source" ;
  ] ;
  rml:iterator "$" ;
  rml:referenceFormulation rml:JSONPath ;
] ;
...

Collaborator

Yes! That looks like what I have in mind!

> I'm not sure I follow. What is the "default [reference formulation] of a Logical Source"?

I was wrong, that's old RML. RML-Core now requires a reference formulation:

> exactly one rml:referenceFormulation property
https://kg-construct.github.io/rml-core/spec/docs/#logicalSource

Old RML had a default 'tabular' for this to be compatible with R2RML.


<pre class="ex-source">
&lt;#ProvidedSource&gt; a rml:LogicalSource;
Collaborator

Maybe rename it?!

Collaborator

The '#ProvidedSource' name I mean.

Collaborator

BindingSource is suggested.

rml:source [ a rml:ProvidedSource;
rml:sourceIdentifier "myByteStream";
];
Collaborator

Maybe add reference formulation here

.
</pre>

An RML processor MAY offer a mechanism to programmatically provide a data source corresponding to
the `rml:ProvidedSource` description.
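
As an illustration only (not part of the proposed spec text), such a mechanism could be as simple as a lookup table keyed by the `rml:sourceIdentifier` value. The sketch below is a minimal Python example under stated assumptions: the `ProvidedSourceRegistry` class and its `provide` and `resolve` methods are hypothetical names, not an API of any existing RML processor, and the identifier `my-json-source` is taken from the mapping in the review discussion.

```python
import json
from typing import Any, Dict


class ProvidedSourceRegistry:
    """Hypothetical registry mapping rml:sourceIdentifier values to runtime data."""

    def __init__(self) -> None:
        self._sources: Dict[str, Any] = {}

    def provide(self, source_identifier: str, data: Any) -> None:
        # The identifier uniquely identifies the source, so a second
        # registration under the same identifier is rejected.
        if source_identifier in self._sources:
            raise ValueError(f"source already provided: {source_identifier!r}")
        self._sources[source_identifier] = data

    def resolve(self, source_identifier: str) -> Any:
        # Called when a mapping references an rml:ProvidedSource
        # with this identifier.
        if source_identifier not in self._sources:
            raise LookupError(f"no data provided for source: {source_identifier!r}")
        return self._sources[source_identifier]


# The calling application provides already deserialized JSON at runtime.
registry = ProvidedSourceRegistry()
registry.provide("my-json-source", json.loads('{"name": "Alice"}'))

# A processor would then resolve the identifier found in the mapping and
# evaluate the Logical Source's iterator and reference formulation on it.
record = registry.resolve("my-json-source")
```

The registry deliberately knows nothing about reference formulations: in this sketch, choosing one that matches the provided data remains the mapping author's responsibility, as discussed in the review.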