Figure out what to do about the top-level object in a Phyx file #78

gaurav · 2021-02-22T22:43:42Z

A Phyx file is a JSON-LD file with a top-level object that includes both phylogenies and phyloreferences. This top-level object isn't intended to be used as an RDF object -- we use some dummy RDF properties (http://vocab.phyloref.org/phyloref/testcase.owl#has_phyloreference, http://vocab.phyloref.org/phyloref/testcase.owl#has_phylogeny and http://vocab.phyloref.org/phyloref/testcase.owl#has_node) to link this object to each phyloreference, phylogeny and node. These are only needed for working with the JSON-LD file -- when reading a JSON-LD file as RDF, entities not connected to the root object via RDF properties are ignored. None of them are used in reasoning.

For the purposes of metadata, we do allow the top-level object to have a doi or a citation, indicating the DOI of this Phyx file or a citation to its publication (which are included in the RDF output), as well as a defaultNomenclaturalCodeIRI for the default nomenclatural code to use across all phylogenies and phyloreferences (which is not included in the RDF output). This top-level object can be of any type with any @id -- the Phyx format doesn't specify what it needs to be. Once we fix the bug in PR #77, users will be able to identify this object in any way they like -- and, if no @id is provided, they will be included in the RDF output as an untyped blank node.

If this set-up is fine -- with dummy RDF properties and no recommended @id or @type -- then nothing more needs to be done, and we can close this issue. @hlapp What do you think?

The text was updated successfully, but these errors were encountered:

hlapp · 2021-02-22T23:41:45Z

I'm not sure I understand the repercussions in full. However, taking a step back, if this is an exchange standard then "anything goes" in places that are keys and types is usually a bad idea and inviting trouble down the road. So it seems to me we should give guidance (and requirements) on type and suggested (i.e., optional) keys even for the top-level object.

hlapp · 2021-02-22T23:44:27Z

Also, I think ultimately we want to work towards obsoleting the testcase.owl ontology. If there are properties only needed or working with Phyx (JSON-LD) files, then I don't know why those would have to be in any ontology. Isn't a nice thing about JSON-LD that not every key has be linked to semantics in an ontology?

gaurav · 2021-03-07T23:26:42Z

I'm not sure I understand the repercussions in full. However, taking a step back, if this is an exchange standard then "anything goes" in places that are keys and types is usually a bad idea and inviting trouble down the road. So it seems to me we should give guidance (and requirements) on type and suggested (i.e., optional) keys even for the top-level object.

In thinking about this a bit more, I've realized that the top-level object does exist in the generated OWL/JSON-LD file -- it's the OWL ontology itself! So, we can make it explicit in the Phyx standard that the top-level object should have a @type of owl:Ontology, and can optionally have an @id if the ontology should have a particular IRI. Some top-level properties (e.g. defaultNomenclaturalCodeIRI) will be ignored in RDF, while others (such as the source of the Phyx file) can be converted into annotation properties. I'll go ahead and put that into the manuscript.

Also, I think ultimately we want to work towards obsoleting the testcase.owl ontology. If there are properties only needed or working with Phyx (JSON-LD) files, then I don't know why those would have to be in any ontology.

I agree! We don't actually use any Testcase terms for reasoning any more, so I've created a PR that removes this ontology from the imports (PR #81).

Isn't a nice thing about JSON-LD that not every key has be linked to semantics in an ontology?

Yes, definitely! What concerns me is that these keys still need to be RDF properties, otherwise their values will be ignored when the OWL ontology is generated. I've replaced some of these made-up terms with CDAO terms in PR #81. We still use the following made-up RDF properties, all of which are currently in the testcase: namespace:

Needed to tie the generated OWL/JSON-LD file together:
- testcase:has_phylogeny, in the form <ontology> testcase:has_phylogeny <phylogeny>.
- testcase:has_phyloreference, in the form <ontology> testcase:has_phyloreference <phyloref>.
- testcase:has_component_class, in the form <phyloref> testcase:has_component_class <component class>.
Optional; only needed if we want to keep the verbatim specifiers in the generated OWL/JSON-LD file:
- testcase:internal_specifier, in the form <phyloref> testcase:internal_specifier <internal specifier>.
- testcase:external_specifier, in the form <phyloref> testcase:external_specifier <external specifier>.

I think we should move these out of the testcase: namespace, since we're eliminating that, but then which namespace should we move them into? We could move them into the phyloref: namespace without defining them in the Phyloref Ontology, but that seems confusing. We could define them as annotation properties in the Phyloref Ontology, which might be the best option. Or we could move them to the example: namespace to make it clear that they don't carry any RDF meaning. @hlapp What do you think?

gaurav · 2021-03-10T19:06:48Z

Consensus:

It should be in one of our supported ontologies.
Open an issue for each of these properties on the Phyloref ontology.

gaurav · 2021-03-12T02:31:55Z

I've figured out a possible solution for testcase:has_phylogeny and testcase:has_phyloreference -- we can connect them to the top-level ontology object using rdfs:isDefinedBy, i.e. by saying that <phylogeny> rdfs:isDefinedBy <ontology> and <phyloref> rdfs:isDefinedBy <ontology>. This looks a bit ugly when the generated ontology doesn't have an @id (which is the default), in which case the <ontology> is a blank node, but it otherwise works fine in Protege and can be reasoned over as expected. I've implemented this in PR #86.

If that's not a good way to implement these terms, we should reopen the discussion we previously had about them at phyloref/phyloref-ontology#15 (testcase:has_phylogeny) and phyloref/phyloref-ontology#16 (testcase:has_phyloreference).

I'll open a new issue for phyloref:has_component_class in a bit.

This PR replaces `testcase:has_phyloreference` and `testcase:has_phylogeny` with a reversed [`rdfs:isDefinedBy`](https://www.w3.org/TR/rdf-schema/#ch_isdefinedby), i.e. so each phyloreference and phylogeny is described as being defined by the generated ontology (which may be a blank node if no `@id` is set). This fulfills our need for a way of linking these properties together (see #78) without creating a new term to do so.

While we need this in JSON, we probably don't need to include this in RDF (see #78). If we do need to insert it, we can add it back pretty easily.

gaurav · 2021-03-15T23:14:02Z

Took a while, but I've finally filed that phyloref:has_component_class issue at phyloref/phyloref-ontology#42 -- although, as I argue in that issue, has_part (obo:BFO_0000050) might be a better option.

I've taken out testcase:internal_specifier and testcase:external_specifier in #87 -- once we decide what to do with phyloref:has_component_class, I'll make that change in the same PR and use it to close this issue.

gaurav · 2021-03-18T16:10:44Z

Hilmar and I discussed this this morning, and it looks like there are two possibilities for phyloref:has_component_class:

We make two fields: hasSubClass for non-cache component classes and for cache component classes.
We might be able to insert the objects directly the first time we call it, but then return a reference the second time around.

The rule we should use is that it's okay to let hacks slip into the way that phyx.js works, but not to let that slip into the OWL representation.

This PR removes some RDF properties from the JSON Schema: - While we need `internalSpecifiers` and `externalSpecifiers` properties in JSON, we probably don't need to include them in RDF (see #78, phyloref/phyloref-ontology#7, phyloref/phyloref-ontology#8). If we change our mind later, we can add them back pretty easily. - We previously used the RDF version of the `malformedPhyloreference` field to let JPhyloRef know when a particular phyloreference was malformed (such as when it had no internal specifiers, or if one of its specifiers could not be mapped to a particular phylogeny node). JPhyloRef no longer checks this field at all, so we might as well remove it from here as well. - I've rewritten the component class system so that the two types of component classes are handled separately: - Direct subclasses are now linked to the phyloreference via the `subClasses` field (which is the `@reverse` of `rdfs:subClassOf`). - Cache component classes are now inserted directly into the logical expression when first used, and then subsequently referenced in other logical expressions as needed. - Component classes were previously stored and identified on a global basis (i.e. if `#phyloref1` had 4 component classes, the first component class for `#phyloref2` would be identified as `#phyloref2_component5`). I've changed this so that each phyloreference's component classes start counting from 1.

gaurav mentioned this issue Mar 7, 2021

Remove Testcase ontology #81

Merged

This was referenced Mar 7, 2021

Add apomorphy-based phyloref #72

Merged

Addphyloref:apomorphy as an annotation property for describing an apomorphy phyloref/phyloref-ontology#40

Closed

gaurav mentioned this issue Mar 12, 2021

Use rdfs:isDefinedBy to link phyloreferences and phylogenies to the ontology #86

Merged

gaurav mentioned this issue Mar 15, 2021

An object property for identifying the component classes of a phyloreference phyloref/phyloref-ontology#42

Closed

gaurav added a commit that referenced this issue Mar 15, 2021

Removed testcase:internal_specifier and testcase:external_specifier.

c0233e6

While we need this in JSON, we probably don't need to include this in RDF (see #78). If we do need to insert it, we can add it back pretty easily.

gaurav mentioned this issue Mar 15, 2021

Removed some unnecessary RDF properties #87

Merged

gaurav closed this as completed in #87 Mar 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Figure out what to do about the top-level object in a Phyx file #78

Figure out what to do about the top-level object in a Phyx file #78

gaurav commented Feb 22, 2021

hlapp commented Feb 22, 2021

hlapp commented Feb 22, 2021

gaurav commented Mar 7, 2021

gaurav commented Mar 10, 2021

gaurav commented Mar 12, 2021

gaurav commented Mar 15, 2021

gaurav commented Mar 18, 2021

Figure out what to do about the top-level object in a Phyx file #78

Figure out what to do about the top-level object in a Phyx file #78

Comments

gaurav commented Feb 22, 2021

hlapp commented Feb 22, 2021

hlapp commented Feb 22, 2021

gaurav commented Mar 7, 2021

gaurav commented Mar 10, 2021

gaurav commented Mar 12, 2021

gaurav commented Mar 15, 2021

gaurav commented Mar 18, 2021