show what information was used to infer some interaction type #13

Open
jhpoelen opened this issue Dec 26, 2019 · 8 comments

Comments

@jhpoelen
Member

e.g.,

scientificName: bee
associatedTaxa: plant

GloBI currently silently maps this to:

source taxon: bee
interaction type: interactsWith
target taxon: plant

I suggest making this inference explicit, for example:

  1. GloBI used associated taxa field to infer the interaction type to be interactsWith
  2. GloBI used associated taxa field to infer the target taxon name to be "plant"
    etc.
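
A minimal sketch of what such explicit inference notes could look like for the bee/plant example above. The function name `map_with_notes`, the note wording, and the returned structure are hypothetical and are not taken from the GloBI code base:

```python
# Hypothetical sketch: these names are illustrative, not GloBI's actual API.
DEFAULT_INTERACTION_TYPE = "interactsWith"


def map_with_notes(record):
    """Map a record to an interaction and list the inferences that were made."""
    target = record["associatedTaxa"]
    interaction = {
        "sourceTaxon": record["scientificName"],
        "interactionType": DEFAULT_INTERACTION_TYPE,
        "targetTaxon": target,
    }
    # One note per inferred output field, naming the input field it came from.
    notes = [
        "used associatedTaxa field to infer the interaction type to be "
        + DEFAULT_INTERACTION_TYPE,
        'used associatedTaxa field to infer the target taxon name to be "'
        + target + '"',
    ]
    return interaction, notes


interaction, notes = map_with_notes(
    {"scientificName": "bee", "associatedTaxa": "plant"}
)
for note in notes:
    print(note)
```

Returning the notes next to the mapped interaction keeps each inference attached to the record it describes, instead of applying the mapping silently.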
@jhpoelen
Member Author

A process perspective:

  1. there are mapping processes X1, X2, X3
  2. there are input fields scientificName, associatedTaxa
  3. there are output fields targetTaxon, sourceTaxon, interactionType
{scientificName: bee} is derived from archive X (links to Preston / content hashing).
X1 ({scientificName: bee}) -> { sourceTaxon: bee }
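
The process perspective above could be sketched roughly as follows. This is only an illustration: `x1`, `apply_with_provenance`, and `content_id` are hypothetical names, the archive bytes are a placeholder, and the `hash://sha256/...` identifier is computed with Python's standard library rather than with Preston itself:

```python
import hashlib
import json


def content_id(data):
    """Build a hash://sha256/... identifier for some archive content."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def x1(fields):
    """Mapping process X1: scientificName -> sourceTaxon."""
    return {"sourceTaxon": fields["scientificName"]}


def apply_with_provenance(process, input_fields, archive_bytes):
    """Run a mapping process and record which inputs and archive it used."""
    return {
        "process": process.__name__,
        "input": input_fields,
        "output": process(input_fields),
        "derivedFrom": content_id(archive_bytes),
    }


# Placeholder content standing in for "archive X".
step = apply_with_provenance(
    x1, {"scientificName": "bee"}, b"placeholder archive content"
)
print(json.dumps(step, indent=2))
```

Each step records the process name, its input fields, its output fields, and the content hash of the archive, so a reader can trace an output such as `sourceTaxon` back to its origin.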

@jhpoelen
Member Author

@seltmann please note that interaction types can be inferred from the context, a column name, or a field value. Is there any specific notation you prefer when reporting on the origin of an interaction type and how it got mapped into GloBI?

@seltmann
Member

seltmann commented Feb 3, 2020

I am not sure, because I don't know all of the logic going into the inferences. For a start, knowing that x maps to y would work, even if x maps to two separate things?

This also may not be as feasible as I imagine.

@jhpoelen
Member Author

@zedomel also suggested including hints on where GloBI got the interaction terms from, for instance:

{
  "reviewId": "d17a4237-d62b-49d3-b726-62a309aaa08c",
  "reviewDate": "2021-05-17T22:30:29Z",
  "reviewerName": "GloBI automated reviewer (elton-0.10.9)",
  "reviewCommentType": "note",
  "reviewComment": "found unsupported interaction type with name: [Sydney]",
  "namespace": "local",
  "context": {
    "archiveURI": "file:///home/ubuntu/globi-dwca-index/./",
    "contentHash": null,
    "dwc:coreid": "DD0C8780FFA2FF81A5ECEB50FB299AD3.taxon",
    "interactionTypeName": "Sydney",
    "referenceCitation": "Emery, Nathan J., Emery, David L., Popple, Lindsay W. (2015): A redescription of Yoyetta landsboroughi (Distant) and Y. tristrigata (Goding and Froggatt) (Hemiptera: Cicadidae) and description of four new related species. Zootaxa 3948 (3): 301-341, DOI: http://dx.doi.org/10.11646/zootaxa.3948.3.1",
    "referenceUrl": "http://treatment.plazi.org/id/DD0C8780FFA2FF81A5ECEB50FB299AD3",
    "sourceTaxonClassName": "Insecta",
    "sourceTaxonFamilyName": "Cicadidae",
    "sourceTaxonGenusName": "Yoyetta",
    "sourceTaxonKingdomName": "Animalia",
    "sourceTaxonName": "Yoyetta landsboroughi Distant 1882",
    "sourceTaxonOrderName": "Hemiptera",
    "sourceTaxonPath": "Animalia | Arthropoda | Insecta | Hemiptera | Cicadidae | Yoyetta",
    "sourceTaxonPathNames": "kingdom | phylum | class | order | family | genus",
    "sourceTaxonPhylumName": "Arthropoda",
    "studySourceCitation": "hash://sha256/009335f282e8ac313aaf3e31aefe9d98e464c8c98251930c99575fe724f7f058. Accessed at <file:///home/ubuntu/globi-dwca-index/./> on 17 May 2021.",
    "studyTitle": "http://treatment.plazi.org/id/DD0C8780FFA2FF81A5ECEB50FB299AD3",
    "targetTaxonName": "3. xii. 1998"
  }
}

with comment:

Why are the DwC fields not listed in the JSON above? It says that "found unsupported interaction type with name: [Sydney]" but I don't know where this interaction type comes from (which DwC field).

@jhpoelen
Member Author

@zedomel also offered a suggestion to include more of the DwC-A context in the review notes:

{
  "reviewId": "721e6db9-9e60-417c-bf44-9b83a93fd857",
  "reviewDate": "2021-05-17T21:54:52Z",
  "reviewerName": "GloBI automated reviewer (elton-0.10.9)",
  "reviewCommentType": "note",
  "reviewComment": "target taxon name missing: using institutionCode/collectionCode/collectionId/catalogNumber/occurrenceId as placeholder",
  "namespace": "local",
  "context": {
    "core": {
      "rowtype": "http://rs.tdwg.org/dwc/terms/Taxon",
      ...
    },
    "extensions": [
      {
        "rowtype": "http://rs.tdwg.org/dwc/terms/Occurrence",
        ...
      },
      {
        "rowtype": "http://rs.gbif.org/terms/1.0/Description",
        ...
      }
    ],
    "sourceTaxonName": "Lymmaea danielsi",
    "sourceTaxonDwCField": "http://rs.tdwg.org/dwc/terms/scientificName"
  }
}
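
One way the field-origin part of this suggestion might be produced is sketched below. The `FIELD_ORIGINS` table and the `build_context` helper are assumptions made for illustration; they do not describe how elton or the GloBI importers are actually structured:

```python
# Hypothetical table: GloBI field prefix -> DwC term the value is read from.
FIELD_ORIGINS = {
    "sourceTaxon": "http://rs.tdwg.org/dwc/terms/scientificName",
}


def build_context(core_row):
    """Copy mapped values into a review context, recording their DwC origin."""
    context = {}
    for prefix, dwc_term in FIELD_ORIGINS.items():
        if dwc_term in core_row:
            context[prefix + "Name"] = core_row[dwc_term]
            # Keep the originating DwC term next to the value it produced.
            context[prefix + "DwCField"] = dwc_term
    return context


row = {"http://rs.tdwg.org/dwc/terms/scientificName": "Lymmaea danielsi"}
context = build_context(row)
print(context)
```

Keeping the value and its originating term side by side lets a reviewer see which DwC field a reported name came from.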

@zedomel
Member

zedomel commented May 26, 2021

Thank you @jhpoelen for opening this issue.

Let me know if there is anything that I can help with. For instance, if you guide me, providing which source files need to be changed or created in order to implement this suggestion, I can provide some code and make a pull request.

best,
josé.

@jhpoelen
Member Author

jhpoelen commented May 28, 2021

> For instance, if you guide me, providing which source files need to be changed or created in order to implement this suggestion

@zedomel Help is much appreciated!

Perhaps a good starting point for the DwC-A functionality is:

https://github.com/globalbioticinteractions/globalbioticinteractions/blob/e9ba4e907c8a549327736f6e036175387a1a0011/eol-globi-data-sources/src/main/java/org/eol/globi/data/DatasetImporterForDwCA.java#L153

Note that I try to use test cases to help keep the code relatively healthy and a little easier to maintain. An example of a DwC-A related unit test can be found at:

https://github.com/globalbioticinteractions/globalbioticinteractions/blob/e62c14e9bc03ebe564c341993c55a9846759845e/eol-globi-data-sources/src/test/java/org/eol/globi/data/DatasetImporterForDwCATest.java

Please holler if you need help understanding the code; I am happy to spend some time with you to detail design concepts and architecture.

Curious to see what you come up with!
