Gather 2018 Distinct Value lists #39

Open
tucotuco opened this issue Jun 4, 2018 · 12 comments

@tucotuco
Member

tucotuco commented Jun 4, 2018

Our distinct value lists from 2017 are more than a year old now. We intended to make annual copies of these, so now is a good time to gather them again.

John can do this for VertNet and request it of GBIF.

@tucotuco tucotuco self-assigned this Jun 4, 2018
@debpaul
Member

debpaul commented Jun 4, 2018 via email

@Tasilee

Tasilee commented Jun 4, 2018

Thanks Deb: I have notified our lead ALA developer Nick Dosremedios about this.

@tucotuco
Member Author

tucotuco commented Jun 4, 2018

I have made the request to Tim Robertson at GBIF who put them together for us last time. I'll work on the VertNet values.

@nickdos

nickdos commented Jun 5, 2018

I'm happy to generate such lists from the ALA. Is there a set of DwC fields (and/or non-DwC fields) that I can use to query with?

@tucotuco
Member Author

tucotuco commented Jun 5, 2018 via email

@debpaul
Member

debpaul commented Jun 5, 2018 via email

@tucotuco
Member Author

tucotuco commented Jun 5, 2018 via email

@debpaul
Member

debpaul commented Jun 5, 2018 via email

@tucotuco
Member Author

tucotuco commented Jun 5, 2018

VertNet distinct values added in commit tdwg/dwc-qa@449824b.

@nickdos

nickdos commented Jul 4, 2018

I've managed to pull out unique values for a subset of fields from the ALA SOLR index. We don't index all fields, so the missing fields might have to be generated from Cassandra (I don't know how to do that). I figured this subset would be a good start; our next major release should include all DwC fields (we're moving to a clustered architecture to handle the bigger data).

Should I attach the TXT file to this issue, or commit it to a directory here or in another repo? I noticed the comment above references a commit that is not linked in this repo, so I wanted to check first.

Edit: attached a ZIP file with the shell script and its output.

Fields used: basis_of_record, country_code, country, month, year, establishment_means, raw_identification_qualifier, license, occurrence_status_s, reproductive_condition_s, raw_sex, rank, type_status
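For readers unfamiliar with the approach, distinct values for an indexed field can be pulled from a Solr index with a facet query. The sketch below is hypothetical and is not the shell script in the ZIP file: the base URL is a placeholder rather than the ALA's real endpoint, and the helper names are made up for illustration. It relies only on standard Solr request parameters (`q`, `rows`, `facet`, `facet.field`, `facet.limit`, `facet.mincount`, `wt`) and on Solr's default JSON facet layout, a flat `[value, count, value, count, ...]` list under `facet_counts.facet_fields`.

```python
from urllib.parse import urlencode

# Hypothetical placeholder endpoint; NOT the ALA's actual Solr URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/biocache/select"


def facet_query_url(field, base_url=SOLR_SELECT_URL):
    """Build a Solr select URL that asks for every distinct value of `field`."""
    params = {
        "q": "*:*",           # match all documents
        "rows": 0,            # return no documents, only facet counts
        "facet": "true",
        "facet.field": field,
        "facet.limit": -1,    # -1 means no limit on the number of values
        "facet.mincount": 1,  # omit values that occur in zero documents
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def parse_facet_values(response, field):
    """Convert Solr's flat [value, count, ...] facet list into (value, count) pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))
```

One such request per field name (for example `basis_of_record` from the list above) would yield that field's distinct values with their record counts; the HTTP fetch itself is left out so the sketch does not depend on a live server.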

@tucotuco
Member Author

tucotuco commented Jul 4, 2018 via email

@nickdos

nickdos commented Jul 12, 2018

Hi @tucotuco, I've created another PR with some changes, including the suggested README file, subdirectories named by date, and an "index" indicator in the file names to show the values come from the index, similar to how iDigBio does it.

4 participants