include more trait datasets incl. Std version #20

fdschneider · 2017-11-15T11:59:48Z

The package should provide more datasets from the living spreadsheet (fdschneider/bexis_traits#20).

identify data for integration
write script to extract data upon call of data() (files are placed in 'data/' directory)
include documentation in package file 'R/data.R'

A standardised version of each dataset should be provided as well (linking to trait thesauri and taxon ontologies).

caterinap · 2017-11-15T12:49:22Z

Not super sure about including more datasets in the package itself (I don't know if there is an "ideal" size for a package). If we do, they should be small, I guess.
We can alternatively/also provide a tutorial with more examples on how to handle different trait datasets using the package (not only the CC.0 ones).

fdschneider · 2017-11-15T15:25:30Z

Ha! The trick is, we're not including the datasets, we just provide code to pull them from their source:

See files in data.R. The file is downloaded and made available for use only when you call data(carabids). The package remains small, and the user decides what to download.

The package vignette contains plenty of advice on how to harmonize own data, or data from other sources.

caterinap · 2017-11-17T09:25:42Z

Then it's all good!! Sorry, I need to dive a bit more into the package!

caterinap · 2017-11-21T15:12:33Z

@fdschneider I started to add more datasets in https://github.com/caterinap/traitdataform/tree/master/data. See if it's fine, I can continue adding more later in the week.
Also added more entries in the spreadsheet and a new column indicating if the dataset is in the package.

Rekyt · 2017-11-21T15:31:23Z

Hi @fdschneider, your initiative seems really cool! I hope to use it soon ;)

A lot of work has been done by the people who built Eco Data Retriever (http://www.data-retriever.org/, GitHub repo); you can see the available datasets here.

I'm also thinking about the traits package by rOpenSci. Maybe you could write wrappers around those existing tools?

fdschneider · 2017-11-21T15:41:52Z

@Rekyt Thanks. Yes, I looked into those. We basically use the same idea as Retriever when pulling example datasets from the original sources on Figshare or wherever. The 'traits' package is great for tapping APIs of more extensive databases. There is also the package 'TR8'.

It would be cool to have wrappers for these data sources that add harmonization on top.

caterinap · 2017-11-28T15:20:02Z

Ok, now all CC.0 datasets are in the package, in the same form as the "carabids" one. On Windows I did not get errors when building the package (only warnings).

Some remarks:

I have not yet modified three datasets (biotraits, plantsBROT and plantsD3) because they are under a CC BY 4.0 license. For the moment they are still there, and we can decide to remove or modify them later.
In heteroptera_raw I did not convert the coordinates to decimal degrees because we do not import any package to do so (as far as I saw).
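For reference, the conversion does not strictly need an extra package; a minimal base-R sketch could look like the following. It assumes coordinates written as degrees, minutes and seconds followed by a hemisphere letter (e.g. `51 23 12.5 N` or `51°23'12.5"N`). The actual format used in heteroptera_raw is not shown in this thread, and `dms_to_decimal()` is a hypothetical helper, not part of traitdataform.

```r
# Hypothetical helper (not part of traitdataform): convert degree-minute-second
# strings to decimal degrees using base R only.
# Assumes the numbers appear in the order degrees, minutes, seconds and that the
# hemisphere letter (N, S, E, W) is the last character of each string.
dms_to_decimal <- function(x) {
  x <- trimws(x)
  # Extract the numeric parts (degrees, minutes, seconds) of each string
  parts <- regmatches(x, gregexpr("[0-9]+(\\.[0-9]+)?", x))
  value <- vapply(parts, function(p) {
    p <- c(as.numeric(p), 0, 0)[1:3]  # pad missing minutes/seconds with 0
    p[1] + p[2] / 60 + p[3] / 3600
  }, numeric(1))
  # Southern and western coordinates are negative in decimal degrees
  hemisphere <- toupper(substring(x, nchar(x)))
  ifelse(hemisphere %in% c("S", "W"), -value, value)
}
```

Because the function is vectorised, it could be applied directly to a whole coordinate column, e.g. `dms_to_decimal(x$latitude)`, where `latitude` is a hypothetical column name.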

Have a look and let me know if you want to add/remove/change anything!

fdschneider · 2017-11-28T15:45:45Z

Great, thanks.

I will pull and test it.

I wasn't aware that some of those datasets have so many traits. Great job mapping them to the ontologies.
However, I just noticed that the URIs in Nadja's list are not correct. They should correspond to the URL with the heading anchor, e.g. https://ecologicaltraitdata.github.io/TraitDataList/#age_at_reproduction.
We should fix this in the TraitDataList repository, @nadjasimons.

Furthermore, I thought that some of the cryptic trait names might be replaced by more intuitive trait names.
E.g. if the thesaurus call states

```r
X10.2_SocialGrpSize = traitdataform::as.trait("social_group_size", expectedUnit = NA,
                                              valueType = "numeric"),
```

The function standardize() will keep the original name in traitName but replace it with the easier one in traitNameStd.
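To illustrate the intended outcome, here is a base-R sketch of the idea only. It is not how traitdataform's standardize() is implemented; the table, its values and the lookup vector `thesaurus_names` are all hypothetical.

```r
# Hypothetical long-format trait table with the original, cryptic trait names
traits <- data.frame(
  scientificName = c("Species A", "Species B"),
  traitName      = c("X10.2_SocialGrpSize", "X10.2_SocialGrpSize"),
  traitValue     = c(12, 3),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from original names to more intuitive standard names,
# mirroring the as.trait("social_group_size", ...) entry in the thesaurus call
thesaurus_names <- c(X10.2_SocialGrpSize = "social_group_size")

# Keep traitName untouched and store the readable name in a new column
traits$traitNameStd <- unname(thesaurus_names[traits$traitName])
```

Keeping both columns means the original naming from the data source remains traceable while analyses can use the harmonised name.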

The CC BY 4.0 data could be added in the future in just the same way, since we always state the correct reference.

I think the Ricklefs data on passerine birds can't be included since they are not labelled as public domain or CC BY. Sorry, that license statement in the documentation is my fault, I guess. I already removed it from the current version.

caterinap · 2017-11-28T15:55:34Z

ok, so I will:

change the URIs once they are fixed
modify cryptic trait names
add the CC BY 4.0 datasets (when I have a bit of time)

Concerning the passerines, I actually checked before adding them, and in the metadata (a Word file in the supplementary material) he states:

Copyright restrictions: None

Proprietary restrictions: None

Costs: None

So I guess that we could keep it.

fdschneider · 2017-11-28T16:47:11Z

Ok, thanks. No pressure. Whenever you find time.

The passerines: I'm relieved. After a colleague assured me that the data are open, I was desperately looking for this disclaimer but didn't find it. Great 'bad example' for open data labelling.

nadjasimons · 2017-11-29T17:07:29Z

I fixed URIs in the trait data list

fdschneider · 2018-11-26T16:18:16Z

For now this is put on hold because it overlaps with functionality provided by Will Pearse's natdb package (https://github.com/willpearse/natdb, @willpearse). They include 100+ datasets with short recipes (see this file: https://github.com/willpearse/natdb/blob/master/R/downloads.R), and in the process fix some major heterogeneity in the data (like replacing abbreviations with species names or adding units). I did not have the time to investigate how the data are processed into a virtual database.
We should figure out how the two packages can complement each other.

Regardless, I would like to include Caterina's pull request for v1.0 to have some more example datasets to draw from.

willpearse · 2018-12-04T18:18:34Z

Sorry to have been a bit slow to reply to this mention. We have a plan, right now, to get a citable bioRxiv paper for MADworld (which is going to combine NACDB and NATDB) up by late January or early February. We are _definitely_ interested in interoperability, and I would love to make a wrapper linking your data structure into NATDB format. As I've mentioned before, but don't mind saying again, I think what you've done here is _fantastic_!!!


fdschneider · 2018-12-07T16:31:52Z

Thanks Will, and sorry for not keeping up with our earlier e-mail discussion. I wanted to get a first functional version out before investigating interfaces with other tools further.
Let me know how I can help making this work seamlessly with your package.

willpearse · 2018-12-07T16:35:27Z

No worries; that's just life! :D

Makes sense to get something out that's functional first. When you have that ready, ping me and I will (1) take a look and then (2) figure out a path forward.

fdschneider added this to the v0.3 milestone Nov 15, 2017

fdschneider assigned caterinap Nov 15, 2017

fdschneider mentioned this issue Nov 15, 2017

add Std versions of all datasets #18

Closed

fdschneider removed this from the v0.3 milestone Nov 26, 2018

fdschneider added this to the v1.0 milestone Nov 26, 2018
