Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out what to do about the top-level object in a Phyx file #78

Closed
gaurav opened this issue Feb 22, 2021 · 7 comments · Fixed by #87
Closed

Figure out what to do about the top-level object in a Phyx file #78

gaurav opened this issue Feb 22, 2021 · 7 comments · Fixed by #87

Comments

@gaurav
Copy link
Member

gaurav commented Feb 22, 2021

A Phyx file is a JSON-LD file with a top-level object that includes both phylogenies and phyloreferences. This top-level object isn't intended to be used as an RDF object -- we use some dummy RDF properties (http://vocab.phyloref.org/phyloref/testcase.owl#has_phyloreference, http://vocab.phyloref.org/phyloref/testcase.owl#has_phylogeny and http://vocab.phyloref.org/phyloref/testcase.owl#has_node) to link this object to each phyloreference, phylogeny and node. These are only needed for working with the JSON-LD file -- when reading a JSON-LD file as RDF, entities not connected to the root object via RDF properties are ignored. None of them are used in reasoning.

For the purposes of metadata, we do allow the top-level object to have a doi or a citation, indicating the DOI of this Phyx file or a citation to its publication (which are included in the RDF output), as well as a defaultNomenclaturalCodeIRI for the default nomenclatural code to use across all phylogenies and phyloreferences (which is not included in the RDF output). This top-level object can be of any type with any @id -- the Phyx format doesn't specify what it needs to be. Once we fix the bug in PR #77, users will be able to identify this object in any way they like -- and, if no @id is provided, they will be included in the RDF output as an untyped blank node.

If this set-up is fine -- with dummy RDF properties and no recommended @id or @type -- then nothing more needs to be done, and we can close this issue. @hlapp What do you think?

@hlapp
Copy link
Member

hlapp commented Feb 22, 2021

I'm not sure I understand the repercussions in full. However, taking a step back, if this is an exchange standard then "anything goes" in places that are keys and types is usually a bad idea and inviting trouble down the road. So it seems to me we should give guidance (and requirements) on type and suggested (i.e., optional) keys even for the top-level object.

@hlapp
Copy link
Member

hlapp commented Feb 22, 2021

Also, I think ultimately we want to work towards obsoleting the testcase.owl ontology. If there are properties only needed or working with Phyx (JSON-LD) files, then I don't know why those would have to be in any ontology. Isn't a nice thing about JSON-LD that not every key has be linked to semantics in an ontology?

@gaurav
Copy link
Member Author

gaurav commented Mar 7, 2021

I'm not sure I understand the repercussions in full. However, taking a step back, if this is an exchange standard then "anything goes" in places that are keys and types is usually a bad idea and inviting trouble down the road. So it seems to me we should give guidance (and requirements) on type and suggested (i.e., optional) keys even for the top-level object.

In thinking about this a bit more, I've realized that the top-level object does exist in the generated OWL/JSON-LD file -- it's the OWL ontology itself! So, we can make it explicit in the Phyx standard that the top-level object should have a @type of owl:Ontology, and can optionally have an @id if the ontology should have a particular IRI. Some top-level properties (e.g. defaultNomenclaturalCodeIRI) will be ignored in RDF, while others (such as the source of the Phyx file) can be converted into annotation properties. I'll go ahead and put that into the manuscript.

Also, I think ultimately we want to work towards obsoleting the testcase.owl ontology. If there are properties only needed or working with Phyx (JSON-LD) files, then I don't know why those would have to be in any ontology.

I agree! We don't actually use any Testcase terms for reasoning any more, so I've created a PR that removes this ontology from the imports (PR #81).

Isn't a nice thing about JSON-LD that not every key has be linked to semantics in an ontology?

Yes, definitely! What concerns me is that these keys still need to be RDF properties, otherwise their values will be ignored when the OWL ontology is generated. I've replaced some of these made-up terms with CDAO terms in PR #81. We still use the following made-up RDF properties, all of which are currently in the testcase: namespace:

  • Needed to tie the generated OWL/JSON-LD file together:

    • testcase:has_phylogeny, in the form <ontology> testcase:has_phylogeny <phylogeny>.
    • testcase:has_phyloreference, in the form <ontology> testcase:has_phyloreference <phyloref>.
    • testcase:has_component_class, in the form <phyloref> testcase:has_component_class <component class>.
  • Optional; only needed if we want to keep the verbatim specifiers in the generated OWL/JSON-LD file:

    • testcase:internal_specifier, in the form <phyloref> testcase:internal_specifier <internal specifier>.
    • testcase:external_specifier, in the form <phyloref> testcase:external_specifier <external specifier>.

I think we should move these out of the testcase: namespace, since we're eliminating that, but then which namespace should we move them into? We could move them into the phyloref: namespace without defining them in the Phyloref Ontology, but that seems confusing. We could define them as annotation properties in the Phyloref Ontology, which might be the best option. Or we could move them to the example: namespace to make it clear that they don't carry any RDF meaning. @hlapp What do you think?

@gaurav
Copy link
Member Author

gaurav commented Mar 10, 2021

Consensus:

  • It should be in one of our supported ontologies.
  • Open an issue for each of these properties on the Phyloref ontology.

@gaurav
Copy link
Member Author

gaurav commented Mar 12, 2021

I've figured out a possible solution for testcase:has_phylogeny and testcase:has_phyloreference -- we can connect them to the top-level ontology object using rdfs:isDefinedBy, i.e. by saying that <phylogeny> rdfs:isDefinedBy <ontology> and <phyloref> rdfs:isDefinedBy <ontology>. This looks a bit ugly when the generated ontology doesn't have an @id (which is the default), in which case the <ontology> is a blank node, but it otherwise works fine in Protege and can be reasoned over as expected. I've implemented this in PR #86.

If that's not a good way to implement these terms, we should reopen the discussion we previously had about them at phyloref/phyloref-ontology#15 (testcase:has_phylogeny) and phyloref/phyloref-ontology#16 (testcase:has_phyloreference).

I'll open a new issue for phyloref:has_component_class in a bit.

gaurav added a commit that referenced this issue Mar 15, 2021
This PR replaces `testcase:has_phyloreference` and `testcase:has_phylogeny` with a reversed [`rdfs:isDefinedBy`](https://www.w3.org/TR/rdf-schema/#ch_isdefinedby), i.e. so each phyloreference and phylogeny is described as being defined by the generated ontology (which may be a blank node if no `@id` is set). This fulfills our need for a way of linking these properties together (see #78) without creating a new term to do so.
gaurav added a commit that referenced this issue Mar 15, 2021
While we need this in JSON, we probably don't need to include this in 
RDF (see #78). If we do need 
to insert it, we can add it back pretty easily.
@gaurav
Copy link
Member Author

gaurav commented Mar 15, 2021

Took a while, but I've finally filed that phyloref:has_component_class issue at phyloref/phyloref-ontology#42 -- although, as I argue in that issue, has_part (obo:BFO_0000050) might be a better option.

I've taken out testcase:internal_specifier and testcase:external_specifier in #87 -- once we decide what to do with phyloref:has_component_class, I'll make that change in the same PR and use it to close this issue.

@gaurav
Copy link
Member Author

gaurav commented Mar 18, 2021

Hilmar and I discussed this this morning, and it looks like there are two possibilities for phyloref:has_component_class:

  • We make two fields: hasSubClass for non-cache component classes and for cache component classes.
  • We might be able to insert the objects directly the first time we call it, but then return a reference the second time around.

The rule we should use is that it's okay to let hacks slip into the way that phyx.js works, but not to let that slip into the OWL representation.

gaurav added a commit that referenced this issue Mar 18, 2021
This PR removes some RDF properties from the JSON Schema:
- While we need `internalSpecifiers` and `externalSpecifiers` properties in JSON, we probably don't need to include them in RDF (see #78, phyloref/phyloref-ontology#7, phyloref/phyloref-ontology#8). If we change our mind later, we can add them back pretty easily.
- We previously used the RDF version of the `malformedPhyloreference` field to let JPhyloRef know when a particular phyloreference was malformed (such as when it had no internal specifiers, or if one of its specifiers could not be mapped to a particular phylogeny node). JPhyloRef no longer checks this field at all, so we might as well remove it from here as well.
- I've rewritten the component class system so that the two types of component classes are handled separately:
  - Direct subclasses are now linked to the phyloreference via the `subClasses` field (which is the `@reverse` of `rdfs:subClassOf`).
  - Cache component classes are now inserted directly into the logical expression when first used, and then subsequently referenced in other logical expressions as needed.
  - Component classes were previously stored and identified on a global basis (i.e. if `#phyloref1` had 4 component classes, the first component class for `#phyloref2` would be identified as `#phyloref2_component5`). I've changed this so that each phyloreference's component classes start counting from 1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants