Definition of language- and data-type maps. #145

Open
chrdebru opened this issue Nov 21, 2024 · 3 comments

Comments

@chrdebru
Collaborator

Given the following XML for example:

<tasks>
  <task>
    <task_id>1</task_id>
    <description lang="en">Design Mockups</description>
    <description lang="fr">Concevoir des maquettes</description>
  </task>
  <task>
    <task_id>2</task_id>
    <description lang="en">Develop Frontend</description>
    <description lang="fr">Développer le frontend</description>
  </task>
  <task>
    <task_id>3</task_id>
    <description lang="en">Develop Backend</description>
    <description lang="fr">Développer le backend</description>
  </task>
</tasks>

Iterating over /tasks/task, I want to generate English and French labels from tasks.

  rml:predicateObjectMap [
      rml:predicate ex:description ;
      rml:objectMap [ rml:reference "description[@lang='en']" ; rml:language "en" ] ;
      rml:objectMap [ rml:reference "description[@lang='fr']" ; rml:language "fr" ] ;
  ] ;

This allows me to do that, but why "hard-code" the languages? The problem is that

  rml:predicateObjectMap [
      rml:predicate ex:description ;
      rml:objectMap [ rml:reference "description" ; rml:languageMap [ rml:reference "description/@lang" ] ] ;
  ] ;

leads to a Cartesian product of labels and languages: each task yields four literals (two descriptions × two language tags) instead of the two intended ones. This respects the definition of language maps (and, by extension, datatype maps): "Given the list of values resulting from a language-taggable term map T, and the list of values resulting from its language map L, the resulting terms are generated by the n-ary Cartesian product combination of T × L, where the values in T are the lexical forms, and the values in L are the non-empty language tags."
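To make the consequence of that definition concrete, here is a minimal Python sketch that simulates the T × L combination for the first task. It is not an RML engine: the helper name `cartesian_literals` is hypothetical, and the two list comprehensions merely stand in for evaluating the references `description` and `description/@lang` against one iteration.

```python
import itertools
import xml.etree.ElementTree as ET

XML = """<tasks>
  <task>
    <task_id>1</task_id>
    <description lang="en">Design Mockups</description>
    <description lang="fr">Concevoir des maquettes</description>
  </task>
</tasks>"""


def cartesian_literals(task):
    """Combine every lexical form with every language tag (T x L)."""
    # T: stand-in for the values of the reference "description"
    lexical_forms = [d.text for d in task.findall("description")]
    # L: stand-in for the values of the reference "description/@lang"
    language_tags = [d.get("lang") for d in task.findall("description")]
    return list(itertools.product(lexical_forms, language_tags))


task = ET.fromstring(XML).find("task")
literals = cartesian_literals(task)
for lexical_form, tag in literals:
    print(f'"{lexical_form}"@{tag}')
```

For this one task the sketch produces four pairs, including the unwanted "Design Mockups" tagged `fr` and "Concevoir des maquettes" tagged `en`.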

Is this something we want? It seems to contradict a perfectly conceivable use case. If not, there is (IMHO) something wrong with the specification, and we likely need some iteration manipulation (as @frmichel once suggested). If so, then the spec should give a concrete example, with maybe a note or two.

@chrdebru
Collaborator Author

Unless I have missed something, I went through the spec and the example above respects the definition. @andimou, what is your opinion on this?

@pmaria
Collaborator

pmaria commented Dec 2, 2024

This is indeed a nice example of the issues (which already existed with templates as well) that arise when working with hierarchical data and trying to combine data elements while respecting the hierarchical context of the data source.

See also the description of this problem in the RML-LV spec https://github.com/kg-construct/rml-lv/blob/main/spec/section/problem.md#nested-data-structures

So, one way to solve this problem would be to use logical views.

I am open to discussing other ways to solve this in core.

@chrdebru
Collaborator Author

chrdebru commented Dec 2, 2024

Well, to me, LV solves the problem by "eliminating" the inconveniences of hierarchical documents and multi-valued expression maps: it creates the logical equivalent of rows, in which the "scope" of multi-valued expression maps is well defined. My question can be rephrased as follows: do we accept the proposed definition and its implications (in corner cases), and explicitly acknowledge and document them? In other words, we do not solve it here, since it can be handled elsewhere.
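The "logical equivalent of rows" idea can be illustrated with a short Python sketch. This is only an assumption-laden illustration of the flattening, not the RML-LV vocabulary or any engine's API; the function name `description_rows` and the row keys are hypothetical. Each `description` element becomes one row that carries its own language tag, so every field in a row is single-valued and no cross product can arise.

```python
import xml.etree.ElementTree as ET

XML = """<tasks>
  <task>
    <task_id>1</task_id>
    <description lang="en">Design Mockups</description>
    <description lang="fr">Concevoir des maquettes</description>
  </task>
  <task>
    <task_id>2</task_id>
    <description lang="en">Develop Frontend</description>
    <description lang="fr">Développer le frontend</description>
  </task>
</tasks>"""


def description_rows(root):
    """Flatten the hierarchy: one row per <description> element."""
    rows = []
    for task in root.findall("task"):
        task_id = task.findtext("task_id")
        for d in task.findall("description"):
            # Text and language tag come from the same element,
            # so they stay paired within the row.
            rows.append(
                {"task_id": task_id, "description": d.text, "lang": d.get("lang")}
            )
    return rows


rows = description_rows(ET.fromstring(XML))
for row in rows:
    print(f'task {row["task_id"]}: "{row["description"]}"@{row["lang"]}')
```

With two tasks this gives four rows, one correctly tagged literal per description, rather than the eight pairs that the per-task Cartesian product would generate.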

2 participants