Plazi<>GloBI<>Zenodo add two annotated examples: one simple one complex #1

Open
jhpoelen opened this issue Apr 24, 2020 · 18 comments

@jhpoelen
Member

As outlined in https://docs.google.com/document/d/1cKcQfx8X8uAXR6JF96jqZO8OCpwwkYatnxSIyVXvYbo/edit# , we'd like to start using a hexastore-like method to encode triples in key-value Zenodo annotations. I suggest picking two examples, one simple and one complex, and adding them to Zenodo for GloBI to index.

fyi - @mguidoti @slint

@jhpoelen
Member Author

@slint shared a first pass at encoding host associations in Zenodo meta-data via https://sandbox.zenodo.org/record/621971 .

fyi @myrmoteras

Screenshot from 2020-06-11 10-59-22

@jhpoelen
Member Author

jhpoelen commented Jun 11, 2020

The text "Severe acute respiratory syndrome coronavirus" links to https://sandbox.zenodo.org/search?custom=%5Bobo%3ARO_0002453%5D%3A%5B%3ASevere+acute+respiratory+syndrome+coronavirus%5D , which encodes the verb [obo:RO_0002453] (host of) with the object Severe acute respiratory syndrome coronavirus. Having the virus as the object of the verb "(is) host of" makes sense, because the sentence "X is host of virus Y" is biologically sound.

However, the text "Rhinolophus" links to search?custom=%5Bobo%3ARO_0002453%5D%3A%5BRhinolophus%3A%5D , which encodes the verb [obo:RO_0002453] (host of) and the object Rhinolophus.

This does not make (biological) sense, because Rhinolophus is expected to be the subject of the verb (is) host of.

Instead, I'd expect encodings like:

  1. Rhinolophus : obo:RO_0002453 | (is) host of (e.g., bat is a host of virus X where bat is the subject)
    or
  2. obo:RO_0002454 | has host : Rhinolophus (e.g., virus X has host bat, where bat is the object)

where RO_0002454 (has host) is the inverse of RO_0002453 (host of).

@slint curious to hear your thoughts on how to ensure the directionality of the species interactions can be preserved in the Zenodo meta-data.

@slint

slint commented Jun 11, 2020

Basically the value of the custom= querystring field has the format [<verb>]:[<subject>:<object>] (in more simplistic terms that could be interpreted as [<verb>]:[<left>:<right>]).

  • The <subject> and <object> parts accept Elasticsearch's query string syntax, which allows boolean logic queries (e.g. ("Rhinolophus pussilus" OR "Rhinolophus macrotis")), wildcard queries (e.g. Rhinolophus*), fuzzy queries (e.g. SARS~3), etc.
  • Skipping any of the <subject> or <object> parts is equivalent to having a "wildcard" (*).
  • Indeed, the format is a bit non-traditional in terms of representation/syntax, but for now that's just an implementation detail and we can improve on it in future iterations.
  • In the search examples, note the position of the colon, separating the subject side from the object side.
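The format described above can be sketched in a few lines of Python. This is only an illustration of how a client might assemble the custom= value and percent-encode it; the helper names custom_value and search_url are hypothetical and are not part of Zenodo or GloBI.

```python
from urllib.parse import urlencode


def custom_value(verb: str, subject: str = "", obj: str = "") -> str:
    """Build the raw value [<verb>]:[<subject>:<object>].

    An empty subject or object is left blank, which (per the description
    above) behaves like a wildcard.
    """
    return f"[{verb}]:[{subject}:{obj}]"


def search_url(base: str, verb: str, subject: str = "", obj: str = "") -> str:
    """Append the percent-encoded custom= parameter to a base URL."""
    query = urlencode({"custom": custom_value(verb, subject, obj)})
    return f"{base}?{query}"


# Object-side query: anything that is "host of" the SARS coronavirus.
print(search_url(
    "https://sandbox.zenodo.org/search",
    "obo:RO_0002453",
    obj="Severe acute respiratory syndrome coronavirus",
))

# Subject-side query using the boolean syntax mentioned above.
print(search_url(
    "https://sandbox.zenodo.org/search",
    "obo:RO_0002453",
    subject='("Rhinolophus pussilus" OR "Rhinolophus macrotis")',
))
```

The first call reproduces the encoded link quoted earlier in this thread, with the colon placed before the object because the subject is left empty.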

In order to ensure the directionality we have to either:

  • Allow only one direction of the relationship to be possible to submit (i.e. obo:RO_0002453 "host of").
  • Allow both encodings to be submitted, but for searching purposes in our backend, index both directions (to allow more flexibility in querying).
    • This introduces some extra implementation effort on our side, which we could schedule for later; it could also be considered an additional "feature" on top of the base functionality.
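To make the second option concrete, here is a minimal sketch of what "indexing both directions" could look like. It is not Zenodo's backend code: the INVERSE table and the both_directions helper are hypothetical, and only the RO_0002453 / RO_0002454 pair mentioned in this thread is included.

```python
# Hypothetical lookup of inverse relations; only the pair discussed here.
INVERSE = {
    "obo:RO_0002453": "obo:RO_0002454",  # host of  -> has host
    "obo:RO_0002454": "obo:RO_0002453",  # has host -> host of
}


def both_directions(subject: str, verb: str, obj: str) -> list[tuple[str, str, str]]:
    """Return the submitted triple plus its inverse, when one is known."""
    triples = [(subject, verb, obj)]
    inverse = INVERSE.get(verb)
    if inverse is not None:
        # Swap subject and object when switching to the inverse verb.
        triples.append((obj, inverse, subject))
    return triples


for triple in both_directions("Rhinolophus", "obo:RO_0002453", "SARS-CoV"):
    print(triple)
```

With such a step at indexing time, a record submitted as "bat host of virus" could also be found by a "virus has host bat" query, while the submitted direction stays unchanged.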

@jhpoelen
Member Author

jhpoelen commented Jun 11, 2020

@slint Thanks for clarifying. I missed the ":" subject/object delimiter in my first reading of your work.

Now, with your help, I see that the current relations make sense:

[<obo:RO_0002453>] : [<Rhinolophus> : <empty> ] 

and

[<obo:RO_0002453>] : [ <empty> : <Severe acute respiratory syndrome coronavirus> ] 

Very neat!
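For readers who want to check the direction of a link programmatically, the following sketch decodes a custom= value back into verb, subject and object. The regular expression is an assumption based on the format described in this thread; it does not handle subjects that themselves contain ":" or "]", and the function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# [<verb>]:[<subject>:<object>] ; the verb may contain ":" (e.g. obo:RO_...).
_CUSTOM = re.compile(
    r"^\[(?P<verb>[^\]]*)\]:\[(?P<subject>[^:\]]*):(?P<object>[^\]]*)\]$"
)


def parse_custom(value: str):
    """Split a raw custom= value into (verb, subject, object).

    Empty subject/object parts are returned as None (treated as wildcards).
    """
    match = _CUSTOM.match(value)
    if match is None:
        raise ValueError(f"unrecognised custom value: {value!r}")
    return (
        match.group("verb"),
        match.group("subject") or None,
        match.group("object") or None,
    )


def parse_search_url(url: str):
    """Extract and decode the custom= parameter of a search URL."""
    values = parse_qs(urlsplit(url).query).get("custom", [])
    if not values:
        raise ValueError("URL has no custom= parameter")
    return parse_custom(values[0])


print(parse_custom("[obo:RO_0002453]:[Rhinolophus:]"))
print(parse_search_url(
    "https://sandbox.zenodo.org/search?custom=%5Bobo%3ARO_0002453%5D%3A%5B%3A"
    "Severe+acute+respiratory+syndrome+coronavirus%5D"
))
```

The first call yields Rhinolophus on the subject side and the second yields the virus on the object side, matching the two relations written out above.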

@jhpoelen
Member Author

Examples for finding Zenodo publications with annotated interaction terms:

https://gist.github.com/slint/c9e6764dd49475cf619de5f1aece4cbd
https://sandbox.zenodo.org/api/records/?custom=%5Bobo%3ARO_0002453%5D%3A%5B%3A%5D

And the structure of the metadata in json:

https://sandbox.zenodo.org/api/records/621971

@jhpoelen
Member Author

The project used by Plazi to upload annotations to Pensoft (including bibtex -> Zenodo annotations soon):

https://github.com/plazi/lycophron

@jhpoelen
Member Author

jhpoelen commented Jul 15, 2020

@slint @mguidoti @myrmoteras I've included a summary of a first pass at GloBI indexing Zenodo metadata with biotic interaction associations for your review. Note the permutations of the subject/verb/object combinations described in https://sandbox.zenodo.org/record/621971 , listed below.

One question that came up for me: how do you envision capturing the DOI of the annotated original publication? Right now, I am using the Zenodo DOI.

| sourceTaxonName | interactionTypeId | interactionTypeName | targetTaxonName | referenceDoi | referenceUrl | referenceCitation | citation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Rhinolophus | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | Severe acute respiratory syndrome coronavirus | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SARS-CoV | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| horseshoe bats | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | Severe acute respiratory syndrome coronavirus | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| horseshoe bats | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SARS-CoV | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus pearsoni | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SARS-like coronavirus isolate Rp3 | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus pearsoni | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SL-CoV Rp3 | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus pussilus | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SARS-like coronavirus isolate Rp3 | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus pussilus | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SL-CoV Rp3 | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus macrotis | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SARS-like coronavirus isolate Rp3 | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus macrotis | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SL-CoV Rp3 | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus ferrumequinum | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SARS-like coronavirus isolate Rp3 | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
| Rhinolophus ferrumequinum | http://purl.obolibrary.org/obo/RO_0002453 | hostOf | SL-CoV Rp3 | 10.1234/testing-covid-rels | https://doi.org/10.1234/testing-covid-rels | Wendong Li. (2005). Bats Are Natural Reservoirs of SARS-like Coronaviruses. Zenodo. https://doi.org/10.1234/testing-covid-rels | Zenodo. 2020. Zenodo publication with biotic interaction annotations. |
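One way to read the twelve rows above is as the Cartesian product of alternative subject names with alternative object names for each annotated relation. The sketch below reproduces the first four columns under that assumption; the grouping of names into two relations is inferred from the rows, and the expand helper is hypothetical rather than GloBI's actual indexing code.

```python
from itertools import product

HOST_OF = "http://purl.obolibrary.org/obo/RO_0002453"


def expand(subject_names, object_names, verb_id=HOST_OF, verb_name="hostOf"):
    """Pair every subject name with every object name for one relation."""
    return [
        (subject, verb_id, verb_name, obj)
        for subject, obj in product(subject_names, object_names)
    ]


# Grouping inferred from the rows above (an assumption, not source metadata).
rows = expand(
    ["Rhinolophus", "horseshoe bats"],
    ["Severe acute respiratory syndrome coronavirus", "SARS-CoV"],
) + expand(
    [
        "Rhinolophus pearsoni",
        "Rhinolophus pussilus",
        "Rhinolophus macrotis",
        "Rhinolophus ferrumequinum",
    ],
    ["SARS-like coronavirus isolate Rp3", "SL-CoV Rp3"],
)

for row in rows:
    print("\t".join(row))
```

Two subject names times two object names, plus four species times two isolate names, gives the twelve combinations listed.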

jhpoelen referenced this issue in globalbioticinteractions/globalbioticinteractions Jul 15, 2020
@mguidoti

Hi @jhpoelen,

The DOIs of the annotated publications were used to import the remaining metadata info into the Zotero libraries. This means that the DOI is, for most of the entries, the first and only piece of bibliographic information that we manually looked up and copied/pasted.

The few that were missing DOIs in the PDF were looked up by title using ReFindit. Luckily, all 160 imported publications had DOIs.

@jhpoelen
Member Author

jhpoelen commented Jul 17, 2020

Today, GloBI indexed interactions from the annotated publications that @mguidoti uploaded to Zenodo. These publications were indexed using the new Zenodo biotic interaction annotations that @slint recently introduced.

For initial results, see:

https://www.globalbioticinteractions.org/?accordingTo=globi%3Aglobalbioticinteractions%2Fzenodo-metadata

Also, see attached screenshots.

Please note that additional work is needed to improve the linking of . . . names!

Screenshot from 2020-07-17 05-52-20
Screenshot from 2020-07-17 05-29-58

@jhpoelen
Member Author

> The DOIs of the annotated publications were used to import the remaining metadata info into the Zotero libraries. This means that the DOI is, for most of the entries, the first and only piece of bibliographic information that we manually looked up and copied/pasted.

@mguidoti how do you suggest differentiating between the Zenodo DOI and the original publication DOI?

@mguidoti

@jhpoelen I'm sorry, I'm not sure if I'm following you.

We used the original publication DOIs for all 160 uploaded papers, so you wouldn't have the problem if you simply retrieved the DOI associated with these publications from the Zenodo API. That's why I don't quite understand your question.

What am I missing?

Thanks
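As a rough illustration of retrieving the DOI from a record and telling a publisher DOI apart from a Zenodo-minted one, consider the sketch below. It rests on assumptions that should be checked against the actual API response: that the record JSON carries the DOI under "doi" or "metadata.doi", and that Zenodo-minted DOIs use the prefixes 10.5281 (production) or 10.5072 (sandbox). The helper names and the sample record are made up for the example.

```python
# Assumed DOI prefixes for DOIs minted by Zenodo itself (production, sandbox).
ZENODO_DOI_PREFIXES = ("10.5281/", "10.5072/")


def record_doi(record: dict):
    """Return the DOI of a record, looking in two assumed locations."""
    return record.get("doi") or record.get("metadata", {}).get("doi")


def is_zenodo_minted(doi: str) -> bool:
    """True when the DOI starts with one of the assumed Zenodo prefixes."""
    return doi.lower().startswith(ZENODO_DOI_PREFIXES)


# Made-up record fragment; only the DOI value comes from this thread.
sample = {"metadata": {"doi": "10.1234/testing-covid-rels"}}

doi = record_doi(sample)
print(doi, "minted by Zenodo:", is_zenodo_minted(doi))
```

Under these assumptions, a deposit uploaded with its original publication DOI would report that DOI directly, so no separate Zenodo DOI needs to be filtered out.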

@jhpoelen
Member Author

Yes, you answered my question. Both DOIs are available. Sorry for the confusion.

@mguidoti

That's actually interesting for me because, to the best of my knowledge, Zenodo shouldn't be issuing DOIs for deposits with provided DOIs...

For instance, in this one, I can only see one DOI.

Could you show me one example?

I'm insisting because I often have to upload things, and that would be different behavior from what I would expect... something that I should definitely keep in mind if it's the case.

Thanks!

@jhpoelen
Member Author

@mguidoti Thanks for pointing out the example and correcting my assumption that new DOIs were issued by Zenodo.

jhpoelen referenced this issue in globalbioticinteractions/globalbioticinteractions Jul 17, 2020
@jhpoelen
Member Author

jhpoelen commented Sep 9, 2020

Just reconfirmed that the Plazi <> Zenodo <> GloBI integration is working as expected and is continuously re-indexed.

Thanks again to @mguidoti @slint and @myrmoteras for making this happen!

Screenshot from 2020-09-09 14-55-44

@mguidoti

Awesome to see this, Jorrit!

Thanks!

@jhpoelen
Member Author

jhpoelen commented Oct 31, 2023

@slint @mguidoti While running routine integration tests, it appears that the custom RO searches are no longer producing the expected results.

How should I adjust my queries to regain access to the wealth of biotic interaction publications provided by the Zenodo / Plazi integration we established three years ago?

Current sample query:

https://sandbox.zenodo.org/api/records/?custom=%5Bobo%3ARO_0002453%5D%3A%5B%3A%5D

no longer returns any of the expected results as previously found and recorded.

https://github.com/globalbioticinteractions/globalbioticinteractions/blob/9411ff7072939b45737fbe87df311b082ddad8d1/eol-globi-data-sources/src/test/resources/org/eol/globi/data/zenodo/search-results.json#L90-L101

jhpoelen reopened this Oct 31, 2023
@jhpoelen
Member Author

Without the trailing "/" ( https://sandbox.zenodo.org/api/records?custom=%5Bobo%3ARO_0002453%5D%3A%5B%3A%5D ), results are retrieved, but no "custom" field is available:

curl "https://sandbox.zenodo.org/api/records?custom=%5Bobo%3ARO_0002453%5D%3A%5B%3A%5D" | jq . | grep custom
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  113k  100  113k    0     0  37927      0  0:00:03  0:00:03 --:--:-- 37927

The grep yielded no results.

The same empty result occurs when searching for the RO term:

curl "https://sandbox.zenodo.org/api/records?custom=%5Bobo%3ARO_0002453%5D%3A%5B%3A%5D" | jq . | grep RO_0002453
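The two grep checks above can be mirrored in an automated test so that a regression is caught without scanning output by eye. The sketch below searches an already parsed JSON response for a substring in any key or string value; it makes no assumption about where a "custom" field would live. The contains helper and the sample payloads are hypothetical and are not taken from a real Zenodo response.

```python
import json


def contains(node, needle: str) -> bool:
    """True if needle occurs in any key or string value of parsed JSON."""
    if isinstance(node, dict):
        return any(
            needle in str(key) or contains(value, needle)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(contains(item, needle) for item in node)
    return isinstance(node, str) and needle in node


# Made-up payloads standing in for the API response body.
without_annotations = json.loads('{"hits": {"hits": [{"metadata": {"title": "x"}}]}}')
with_annotations = json.loads(
    '{"hits": {"hits": [{"metadata": {"custom": {"obo:RO_0002453": []}}}]}}'
)

print(contains(without_annotations, "custom"))   # mirrors: grep custom
print(contains(with_annotations, "RO_0002453"))  # mirrors: grep RO_0002453
```

In an integration test, the response body fetched from the search URL would be passed through json.loads and then checked with contains(parsed, "custom") and contains(parsed, "RO_0002453").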
