include more trait datasets incl. Std version #20
Comments
Not super sure about including more datasets in the package itself (I don't know if there is an "ideal" size for a package). If we do, they should be small, I guess. |
Ha! Trick is, we're not including the datasets, just providing code to pull the datasets from their source: see the files in data.R. Only when you call The package vignette contains plenty of advice on how to harmonize your own data, or data from other sources. |
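The idea described above (ship a recipe for each dataset rather than the data, and download only when the dataset is actually requested) can be sketched roughly as follows. This is a language-neutral illustration in Python, not the package's actual code: `LazyDataset`, `fake_fetch`, the example.org URL, and the placeholder CSV content are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LazyDataset:
    """A dataset recipe: a source URL plus a fetcher, resolved on first access."""
    url: str                      # hypothetical source location
    fetch: Callable[[str], str]   # returns raw CSV text for a URL
    _rows: Optional[List[Dict[str, str]]] = field(default=None, init=False, repr=False)

    def load(self) -> List[Dict[str, str]]:
        # Download and parse only once, on the first call; reuse the result afterwards.
        if self._rows is None:
            raw = self.fetch(self.url)
            self._rows = list(csv.DictReader(io.StringIO(raw)))
        return self._rows


calls: List[str] = []


def fake_fetch(url: str) -> str:
    """Stand-in for a real download; records the call and returns placeholder data."""
    calls.append(url)
    return "species,body_length_mm\nspecies_a,1.0\n"


dataset = LazyDataset(url="https://example.org/traits.csv", fetch=fake_fetch)
# Nothing has been fetched at this point; only the recipe exists.
rows = dataset.load()        # first call triggers the fetch
rows_again = dataset.load()  # second call is served from the cached copy
```

Passing the fetcher in as an argument keeps the sketch runnable without network access; a real recipe would presumably download from the original repository instead.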
Then it's all good!! Sorry, I need to dive a bit more into the package! |
@fdschneider I started to add more datasets in https://github.com/caterinap/traitdataform/tree/master/data. See if it's fine, I can continue adding more later in the week. |
Hi @fdschneider, your initiative seems really cool! I hope to use it soon ;) A lot of work has been done by the people who built Eco Data Retriever (http://www.data-retriever.org/, Github Repo); you can see the available datasets here. I'm also thinking about the |
@Rekyt Thanks. Yes, I looked into those. We basically use the same idea as Retriever when pulling example datasets from the original sources on Figshare or wherever. The 'traits' package is great for tapping APIs of more extensive databases. There is also the package 'TR8'. It would be cool to have wrappers for these data sources that add harmonization on top. |
Ok, now all CC0 datasets are in the package, in the same form as the "carabids" one. On Windows I did not get errors when building the package (only warnings). Some remarks:
Have a look and let me know if you want to add/remove/change anything! |
Great, thanks. I will pull and test it. I wasn't aware that some of those datasets have so many traits. Great job mapping them to the ontologies. Furthermore, I thought that some of the cryptic trait names might be replaced by more intuitive trait names.
The function The CC BY 4.0 data could be added in the future in just the same way, since we always state the correct reference. I think the Ricklefs data on passerine birds can't be included since it is not labelled as public domain or CC BY. Sorry, that license statement in the documentation is my fault, I guess. I already removed it from the current version. |
ok, so I will:
Concerning the passerine, I actually checked before adding it and in the metadata (which is a word file in the supplementary) he states:
So I guess that we could keep it. |
Ok, thanks. No pressure. Whenever you find time. The passerines: I'm relieved. After a colleague assured me that the data are open, I was desperately looking for this disclaimer but didn't find it. Great 'bad example' for open data labelling. |
I fixed URIs in the trait data list |
For now this is put on hold because it overlaps with functionality provided by Will Pearse's natdb package (@willpearse). They include 100+ datasets with short recipes (see this file), and in the process fix some major heterogeneity in the data (like replacing abbreviations with species names or adding units). I did not have the time to investigate how the data are processed into a virtual database. Regardless, I would like to include Caterina's pull request for v1.0 to have some more example datasets to draw from. |
Sorry to have been a bit slow to reply to this mention.
We have a plan, right now, to get a citable bioRxiv paper for MADworld
(which is going to combine NACDB and NATDB) up ~late January early
February. We are _definitely_ interested in inter-operability, and I would
love to make a wrapper linking your data structure into NATDB format. As
I've mentioned before, but don't mind saying again, I think what you've
done here is _fantastic_!!!
On Mon, 26 Nov 2018 at 09:18, Florian Schneider wrote:
For now this is put on hold because it overlaps with functionality provided by Will Pearse's natdb package <https://github.com/willpearse/natdb> (@willpearse <https://github.com/willpearse>). They include 100+ datasets with short recipes (see this file <https://github.com/willpearse/natdb/blob/master/R/downloads.R>), and in the process fix some major heterogeneity in the data (like replacing abbreviations with species names or adding units). I did not have the time to investigate how the data are processed into a virtual database.
We should figure out how the two packages can complement each other.
Regardless, I would like to include Caterina's pull request for v1.0 to have some more example datasets to draw from.
|
Thanks Will, and sorry for not keeping up with our earlier e-mail discussion. I wanted to get a first functional version out before investigating further on interfaces with other tools. |
No worries; that's just life! :D
Makes sense to get something out that's functional first. When you have
that ready, ping me and I will (1) take a look and then (2) figure out a
path forward.
|
The package should provide more datasets from the living spreadsheet (fdschneider/bexis_traits#20).
A standardised version of each dataset should be provided as well (linking to trait thesauri and taxon ontologies).
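As a rough illustration of what a "standardised version" could mean here, the sketch below reshapes a wide table into long records and attaches a harmonized trait name and a thesaurus URI to each measurement, keeping the original column name alongside. It is a Python sketch under stated assumptions, not the package's implementation: the `standardise` helper, the `TRAIT_MAP` lookup, the column names, and the example.org URIs are all hypothetical, and linking taxa to a taxon ontology is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: original column name -> (harmonized trait name, thesaurus URI)
TRAIT_MAP: Dict[str, Tuple[str, str]] = {
    "bl": ("body_length", "https://example.org/thesaurus/body_length"),
    "wl": ("wing_length", "https://example.org/thesaurus/wing_length"),
}


def standardise(
    rows: List[Dict[str, str]],
    taxon_col: str,
    trait_map: Dict[str, Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Reshape wide rows into long records, one per taxon-trait measurement."""
    records = []
    for row in rows:
        for original, (name, uri) in trait_map.items():
            value = row.get(original)
            if value in ("", None):
                continue  # skip traits that were not measured for this taxon
            records.append({
                "taxon": row[taxon_col],
                "trait_name_original": original,  # keep the source label for traceability
                "trait_name": name,
                "trait_uri": uri,
                "value": value,
            })
    return records


# Placeholder input, not real measurements.
raw = [{"species": "species_a", "bl": "1.0", "wl": ""}]
standardised = standardise(raw, taxon_col="species", trait_map=TRAIT_MAP)
```

Keeping the original trait label next to the harmonized one means the mapping can be audited or revised later without going back to the source file.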