check spp.key #1

Open
rBatt opened this issue Oct 28, 2015 · 4 comments

rBatt commented Oct 28, 2015

@bselden @JWMorley

This repo will become an R package, but it's still in development.

The file spp.key.csv has all of the known "raw" (as-entered) taxonomic identifiers (species names) from all regions. But it needs to be checked.

Most species have had a match found. The "raw" column is named "ref", and the "corrected" column is named "spp".

Looking through, some of the "corrected" spp names are clearly wrong, as are some of the common names.

Feel free to make corrections, then commit and push the changes. But please use Git. You may want to install Git LFS before downloading this repo; otherwise the large file storage might break, or you might end up with bigger files than you want (I'm not sure what happens).

Note that each value in "ref" is unique, but the "spp" values are not. Make sure you do not create any inconsistencies as you edit the file. E.g., if you see that spp=="zoroaster" does not actually have a common name of "frogfish", don't change the common name to "seastar" on only 1 line ... make sure that the updated file has the same common name for all "zoroaster".
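A quick way to check both rules after editing is sketched below. It is only an illustration, not code from this repo: the column names "ref" and "spp" come from the description above, but the common-name column is assumed to be called "common", and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def check_key(rows):
    """Return (duplicate_refs, inconsistent_spp) for a list of row dicts.

    Assumes columns named "ref", "spp" and "common"; "common" is a
    hypothetical name for the common-name column.
    """
    seen, dup_refs = set(), set()
    commons = defaultdict(set)
    for row in rows:
        ref = row["ref"]
        if ref in seen:
            dup_refs.add(ref)  # "ref" values are supposed to be unique
        seen.add(ref)
        commons[row["spp"]].add(row["common"])
    # Any spp mapped to more than one common name is an inconsistency.
    inconsistent = {
        spp: sorted(names) for spp, names in commons.items() if len(names) > 1
    }
    return sorted(dup_refs), inconsistent


# Made-up sample standing in for spp.key.csv
sample = """ref,spp,common
ZOROASTER,zoroaster,frogfish
Zoroaster sp.,zoroaster,seastar
"""
rows = list(csv.DictReader(io.StringIO(sample)))
dup_refs, inconsistent = check_key(rows)
print(dup_refs, inconsistent)
```

Here the sample has the common name changed on only one "zoroaster" line, so that spp is reported as inconsistent.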

I can explain further when you decide to take a look. Just let me know.

rBatt added a commit that referenced this issue Oct 30, 2015
#1 @bselden @JWMorley be aware that I updated the csv again, and in a way that could have overwritten some previous changes. I didn't see any new commits from either of you on the GitHub issue, so I'm assuming no work had been done there.

rBatt commented Oct 30, 2015

@JWMorley @bselden I made a video to show how I was making some of the corrections; they still aren't perfect by any means. But I do show how I filter down to some of the entries that might merit a second glance.

https://www.youtube.com/watch?v=RZlUds2Ph_0

Feel free to go about this however you want, if you can find time. Any help is much appreciated.


rBatt commented Oct 30, 2015

@JWMorley @bselden Note that I edited the link to the video, so the email notification you received will still show only the old link. The new link is here: https://www.youtube.com/watch?v=RZlUds2Ph_0

And that is the same link that will appear on the GitHub issues site.

rBatt added a commit that referenced this issue Oct 31, 2015
and helper functions; related to #1 @bselden and @JWMorley you might want to be aware of how I did this, and how it differs slightly from the video I link in Issue #1. Basically what I changed is something I already pointed out in the video: I wrote a function to avoid introducing inconsistencies.
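
The idea of editing through a function, so that a correction reaches every matching row at once, might look roughly like the sketch below. This is not the helper from the commit; the function name `set_common`, the "common" column name, and the sample rows are all hypothetical.

```python
def set_common(rows, spp, new_common):
    """Set the common name on every row whose "spp" matches.

    Returns the number of rows that were changed. "common" is an
    assumed column name, not necessarily the one used in spp.key.csv.
    """
    changed = 0
    for row in rows:
        if row["spp"] == spp and row["common"] != new_common:
            row["common"] = new_common
            changed += 1
    return changed


# Made-up rows for illustration only
rows = [
    {"ref": "ZOROASTER", "spp": "zoroaster", "common": "frogfish"},
    {"ref": "Zoroaster sp.", "spp": "zoroaster", "common": "frogfish"},
    {"ref": "GADUS MORHUA", "spp": "gadus morhua", "common": "atlantic cod"},
]
n_changed = set_common(rows, "zoroaster", "seastar")
print(n_changed)
```

Because the update is keyed on "spp" rather than on a single line, it cannot leave two rows of the same spp with different common names.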
rBatt added a commit that referenced this issue Dec 15, 2015

rBatt commented Dec 15, 2015

@JWMorley @bselden @mpinsky

So, I'm updating the data sets (the US ones for now), and I found 1131 new raw taxonomic IDs that aren't in spp.key already ... whoa. I'll put my auto-match code to work, but everything added in this way will be given the flag of "added_automatically".

Working on properly adding these to the spp.key. It's basically done, just need to integrate it well with make() and add checks.
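
In outline, the step of finding unseen raw IDs and flagging them could be sketched as follows. This is an assumption-laden illustration, not the package's code: `add_new_refs` and `naive_guess` are hypothetical names, the flag column is assumed to be called "flag", and the lowercase guess is only a stand-in for the real auto-match code.

```python
def add_new_refs(key_rows, raw_ids, guess_spp):
    """Append rows for raw IDs missing from the key; return the new rows.

    Every row added here is flagged "added_automatically" so it can be
    found and checked by hand later.
    """
    known = {row["ref"] for row in key_rows}
    added = []
    for ref in raw_ids:
        if ref in known:
            continue  # already in the key, or already added in this pass
        known.add(ref)
        new_row = {"ref": ref, "spp": guess_spp(ref), "flag": "added_automatically"}
        key_rows.append(new_row)
        added.append(new_row)
    return added


def naive_guess(ref):
    # Placeholder for the auto-match step: just normalise the raw name.
    return ref.strip().lower()


# Made-up key and raw IDs for illustration only
key = [{"ref": "GADUS MORHUA", "spp": "gadus morhua", "flag": "ok"}]
added = add_new_refs(key, ["GADUS MORHUA", "Zoroaster sp.", "Zoroaster sp."], naive_guess)
print(added)
```

Tracking `known` inside the loop keeps "ref" unique even when the same new raw ID appears in more than one data set.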


rBatt commented May 6, 2016

I've recently gone through most lines of the spp.key manually.

I've manually checked 2654 rows in the recent effort; another 548 are "ok", 53 "manual", 577 "fine", 586 "bad", 316 "becca_batch2", and a lot of other random flags that indicate it's been checked in some way. In theory, the "bad" rows might need to be fixed, but they generally aren't ID'd to species, and are tossed out in the trim row due to that flag; so they aren't a big worry.

There are 1009 rows that were "added_automatically", and 349 have an NA flag. None of these rows pertain to species that are in the current trawlDiversity analysis (due to subsampling years, day of year, and strata).
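
Counts like the ones above can be reproduced with a simple tally over the flag column. The sketch below assumes the column is named "flag" and that a missing flag is stored as an empty value or the string "NA"; both are guesses, and the rows are made up.

```python
from collections import Counter


def tally_flags(rows, na_label="NA"):
    """Count rows per flag, grouping missing or empty flags under na_label."""
    return Counter(row.get("flag") or na_label for row in rows)


# Made-up rows for illustration only
rows = [
    {"ref": "a", "flag": "ok"},
    {"ref": "b", "flag": "added_automatically"},
    {"ref": "c", "flag": ""},
    {"ref": "d", "flag": "added_automatically"},
    {"ref": "e"},
]
counts = tally_flags(rows)
print(counts.most_common())
```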

So this is very near completion, and is much less of a worry for my current analysis, but could still use some work. I also wouldn't be surprised if some of my "check" rows had errors or typos (I found 1 or 2 already). So it ain't perfect.
