This repository has been archived by the owner on May 10, 2022. It is now read-only.

Try this out and use cases pretty please? #4

Open
sckott opened this issue Oct 1, 2016 · 4 comments
Comments

@sckott
Contributor

sckott commented Oct 1, 2016

@taddallas @qgroom

I already got some context for the use case at ropensci/rgbif#223, but I'm hoping for more use cases and for people to try this out, so we make sure that we're solving problems people actually have. Please do ping anyone else you think might be interested.

maybe @tomjwebb is interested ?

@taddallas

So the output of hoa_search and hoa_gbif is a list of two tibbles each containing a key and some unstructured text. Is this correct? It would be nice to have a column of species names. I don't think the host information is available (which makes the name of the package a bit ironic, right?), but if I were to search at the genus level (e.g., hoa_search('Ixodes')), I have no way of parsing out species identities. Also, gbif provides information on latitude and longitude of interaction, but I don't see this information in the output of hoa_search. Also, it seems like there is pertinent information on sampling date and citation (for some occurrences) that could be included.

Without knowing information on host species, I see a couple possible use cases involving the mapping of parasite diversity (e.g., I search for a bunch of parasites, map out occurrence points, determine range area, and then overlay a bunch of parasite polygons to get a coarse idea of diversity) or species distribution modeling efforts (e.g., relating parasite occurrences to the host community from GBIF or IUCN data, along with climate data). If we had some labeled data on known host-parasite interactions (from Global Mammal Parasite Database maybe?), it'd be fun to see if we could reconstruct the plausible set of host species using just the parasite occurrence data. How sick would that use case be?!
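The first use case above (a coarse map of parasite diversity) could be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame and its column names (`species`, `lat`, `lon`) are made up rather than taken from `hoa_search()` output, and it swaps the range-polygon overlay for a simpler count of distinct species per grid cell.

```r
# Hypothetical occurrence points; not real GBIF records.
occ <- data.frame(
  species = c("Ixodes scapularis", "Ixodes scapularis",
              "Dermacentor variabilis", "Ixodes pacificus", "Ixodes ricinus"),
  lat = c(41.2, 41.8, 42.0, 38.5, 51.3),
  lon = c(-73.5, -73.1, -71.5, -122.4, 0.2),
  stringsAsFactors = FALSE
)

# Snap each point to the lower-left corner of a 5-degree grid cell.
cell_size <- 5
occ$cell_lat <- floor(occ$lat / cell_size) * cell_size
occ$cell_lon <- floor(occ$lon / cell_size) * cell_size

# Count distinct species per cell as a coarse richness estimate.
richness <- aggregate(species ~ cell_lat + cell_lon, data = occ,
                      FUN = function(x) length(unique(x)))
names(richness)[3] <- "n_species"
print(richness)
```

A proper version would build range polygons per species and overlay them, which needs a spatial package; the grid count is just the cheapest stand-in for the same idea.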

@sckott
Contributor Author

sckott commented Oct 2, 2016

> So the output of hoa_search and hoa_gbif is a list of two tibbles each containing a key and some unstructured text. Is this correct?

yes!

> It would be nice to have a column of species names. I don't think the host information is available

can definitely do that

> Also, gbif provides information on latitude and longitude of interaction, but I don't see this information in the output of hoa_search. Also, it seems like there is pertinent information on sampling date and citation (for some occurrences) that could be included.

I attempted to give back just the relevant columns with host/parasite/etc. info, BUT importantly including the occurrence key, so that you can easily merge that info with the remainder of the occurrence data. But perhaps I can return all data.
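A minimal sketch of that merge, using base R only. Everything here is hypothetical: the two data frames and all column names other than `key` are assumptions for illustration, not the actual structure returned by the package or by GBIF.

```r
# Hypothetical host text keyed by occurrence key (stand-in for package output).
host_info <- data.frame(
  key = c(101, 102, 103),
  host_text = c("ex Peromyscus leucopus", "host: Odocoileus virginianus", "on dog"),
  stringsAsFactors = FALSE
)

# Hypothetical full occurrence records with coordinates.
occurrences <- data.frame(
  key = c(101, 102, 103, 104),
  scientificName = c("Ixodes scapularis", "Ixodes scapularis",
                     "Ixodes ricinus", "Ixodes pacificus"),
  decimalLatitude = c(41.2, 41.8, 51.3, 38.5),
  decimalLongitude = c(-73.5, -73.1, 0.2, -122.4),
  stringsAsFactors = FALSE
)

# Left join on the occurrence key: keep every occurrence and attach
# host text where it exists (NA otherwise).
merged <- merge(occurrences, host_info, by = "key", all.x = TRUE)
print(merged)
```

Using `all.x = TRUE` keeps occurrences that have no host text, so nothing is silently dropped when the two tables are combined.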

> I search for a bunch of parasites, map out occurrence points, determine range area, and then overlay a bunch of parasite polygons to get a coarse idea of diversity

thanks, sounds like a good use case

> species distribution modeling efforts (e.g., relating parasite occurrences to the host community from GBIF or IUCN data, along with climate data).

nice, good one

> If we had some labeled data on known host-parasite interactions (from Global Mammal Parasite Database maybe?), it'd be fun to see if we could reconstruct the plausible set of host species using just the parasite occurrence data. How sick would that use case be?!

What do you mean by "labeled"?

We can assume that we will know the host species; I'm about to add that to the results.

@taddallas

I thought that the parasite occurrence data didn't include information on host species. I was thinking that if we had a set of data for which we knew the host species of a given parasite, it might be possible to train a model on georeferenced parasite occurrence points where the host is known (labeled occurrence data) to predict the likely host species of parasite occurrences where the host species is unknown (unlabeled occurrence data). It's a random thought, and is moot since the data contains information on host species identity.

@sckott
Contributor Author

sckott commented Oct 3, 2016

> I thought that the parasite occurrence data didn't include information on host species.

Sorry, I just meant we can include the species name in the output, i.e. whatever name is associated with the records that the hoasts functions return.

> would it be possible to train a model on georeferenced parasite occurrence points where the host is known (labeled occurrence data) to predict the likely host species of parasite occurrences where the host species was unknown. It's a random thought, and is moot since the data contains information on host species identity.

Seems like a great use case. You said it's moot, but are there cases in which it's not moot?
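For cases where it is not moot, the labeled/unlabeled idea could be sketched along these lines. This is a toy illustration only: the data are invented, the column names are assumptions, and a nearest-neighbour lookup on raw coordinates stands in for whatever model would actually be appropriate.

```r
# Hypothetical labeled occurrences: parasite records with a known host.
labeled <- data.frame(
  lat = c(41.2, 41.8, 38.5, 37.9),
  lon = c(-73.5, -73.1, -122.4, -121.9),
  host = c("Peromyscus leucopus", "Peromyscus leucopus",
           "Sceloporus occidentalis", "Sceloporus occidentalis"),
  stringsAsFactors = FALSE
)

# Hypothetical unlabeled occurrences: host unknown.
unlabeled <- data.frame(
  lat = c(42.0, 38.0),
  lon = c(-72.9, -122.0)
)

# 1-nearest-neighbour guess: take the host of the closest labeled point.
# Squared distance in degrees is crude; it ignores the Earth's curvature.
predict_host <- function(lat, lon, train) {
  d2 <- (train$lat - lat)^2 + (train$lon - lon)^2
  train$host[which.min(d2)]
}

unlabeled$predicted_host <- mapply(predict_host, unlabeled$lat, unlabeled$lon,
                                   MoreArgs = list(train = labeled))
print(unlabeled)
```

A realistic attempt would add covariates such as climate or host range maps and validate against held-out labeled records; the sketch only shows the shape of the train-on-labeled, predict-on-unlabeled workflow.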
