Suggest and review nomenclatural sources #22
Comments
I have several projects that map names to literature, using nomenclators as a starting point, such as ION (used by BioNames), IPNI (mapping well underway), and Index Fungorum (just started). Happy to contribute literature links. I'm also doing work on clustering names within nomenclators to get around massive duplications (e.g., ION and IPNI). |
Some additional name sources include the World Spider Catalog: LSIDs with literature as strings, some overlap with ION, but better-quality literature citations for older names. Some databases are explicitly about nomenclature; many are also about taxonomy (or confound the two). |
This is EXCELLENT! And EXACTLY what is needed, in my opinion! @rdmpage -- have you and @dimus compared systems for clustering names? Massive duplications also exist among literature citations (including the dataset you gave me a few years ago). Is there any work within this group to do similar clustering of literature citations? I've been chipping away at this, starting with Journal names. @gsautter has worked on this through RefBank, and there is a parsing service available that seems to work pretty well. As I've said many times before, reconciling names is relatively easy compared to reconciling literature citations (I would estimate that 80% of the effort to reconcile and import a batch of names into GNUB is spent on reconciling and importing the associated literature) -- probably why there are so many efforts to build lists of names, and so few that focus on linking those names to literature. In any case, I hope CoLPlus remains committed to incorporating a "names-linked-to-literature" approach, rather than just another "names and associated concepts" approach. It requires a bit of extra work up-front, but the rewards are VASTLY greater. |
Don't forget Sherborn's Index Animalium, Rich. I think you would have the most up-to-date and parsed copy. If there is more parsing to do, we might consider seeing if Dima is up for it, but most should be in good shape. For subsequent combinations, there is a reference to the original combination (I think it's just a reference to the original genus), so homotypic synonyms are accessible there as well. Some taxonomic databases have parsed, separate nomenclature databases inherent to them. I can recall Thompson's Diptera; there is an algal nomenclator; Index Fungorum, of course; etc. |
@dremsen The whole Sherborn - ION - BHL mapping doi:10.3897/zookeys.550.9673 should be opened up as well. AFAIK ION have it but haven't made it available to anyone not visiting their web site (e.g., I gather that BHL don't have it). I've made a start on trying to resurrect it via screen scraping; see https://github.com/rdmpage/ion-sherborn. I've also grabbed a copy of Index Animalium and put it in a repository: https://github.com/rdmpage/index-animalium |
@dremsen and @rdmpage : Index Animalium represents the PERFECT example of what I'm talking about. There are 7,723 literature citations in the combined bibliography, and 429,829 TNUs (approximately 350K Protonyms). It's an absolute GOLD MINE of information (massive numbers of Protonyms, homotypic synonyms/combinations/spelling variants, etc.), ALL of which are anchored to literature. The records (both bibliography and TNUs) are almost completely parsed (just another week or so needed to finish parsing the microcitations connected to each TNU record). So... what's the hold-up? The literature! The bibliography is highly abbreviated (e.g., no titles and highly abbreviated -- and inconsistent -- Journal names). Even though it's almost fully parsed, most of the records have scant field values. Suzanne Pilsk (lead author of the paper cited by Rod) had made it her mission to tie Sherborn bibliography records to proper citations, and as of the last cut I got from her, 4,477 of them had been fleshed out. The remaining 3,246 represent (almost by definition) the most difficult to pin down. I had been working on cleaning up just the Journals, with the hope of identifying full citations (e.g., from RefBank) via Journal+Volume+Startpage, but there are no page numbers (bummer), and there are still over 2,800 unique and highly abbreviated text strings from which Journals need to be derived. Once we do clean up & flesh out the literature (or decide that we're OK with dirty microcitations as our anchor points to the literature), the next hurdle will be to cross-link the microcitations in the TNU records (again, incomplete & inconsistent) to the corresponding bibliography records. That should be relatively straightforward -- maybe a week or two to complete. After that, the names are an absolute breeze (probably less than a day's work).
If we're OK with incomplete bibliographic citations (which won't connect us to BHL pages yet, but we can flesh them out later), I'm willing to dust that project off and bump it to the top of my "CFT" (Copious Free Time) priority list, if this group thinks it's a worthy investment in time (actually, after looking at the DB, I'm getting more excited about it myself). Bottom line: Sherborn is not a "names" problem (we already have the names as Name-Strings, plus authors, combination authors, etc.). It's a literature problem -- which brings me back to my previous post on this. |
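The Journal clean-up described above (thousands of abbreviated, inconsistent strings that have to be reduced to a smaller set of Journals) could begin with a crude normalization pass that groups strings differing only in case, punctuation, or spacing. The following is a minimal sketch of that idea only, not the method used in the Sherborn work; the function names and the sample strings are made up for illustration and are not taken from the Sherborn bibliography.

```python
import re
from collections import defaultdict


def journal_key(abbrev: str) -> str:
    """Reduce an abbreviated journal string to a crude comparison key."""
    # Lowercase, replace punctuation with spaces, then collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", abbrev.lower())
    return " ".join(text.split())


def group_journal_strings(strings):
    """Group raw journal strings that share the same normalized key."""
    groups = defaultdict(list)
    for raw in strings:
        groups[journal_key(raw)].append(raw)
    return dict(groups)


# Hypothetical sample strings, for illustration only.
samples = ["Ann. Mag. Nat. Hist.", "Ann Mag nat Hist", "Proc. zool. Soc. Lond."]
groups = group_journal_strings(samples)

for key, variants in groups.items():
    print(key, "<-", variants)
```

A pass like this only merges trivial variants; abbreviations that differ in the words themselves would still need a curated lookup or manual review.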
@deepreef I have created a new issue #23 to discuss how to deal with literature. Let's keep this issue for listing nomenclatural sources. Making Index Animalium open and accessible would be a very good thing. |
Understood, and agreed! I just wanted to use @dremsen's suggestion of Sherborn to illustrate the point made earlier. Also, we already have zillions of sources of names. There is no shortage of those. What we need for CoL+ to actually get beyond what we already have is sources of names linked to literature, and Sherborn's Index Animalium is a big one! :-) I'm happy to share what I have, but perhaps give me a week to clean up a few loose ends. I don't know the state of others (mine is a more highly parsed version of what @dremsen provided to me years ago, which I believe was originally parsed by Pat Leary). |
Oh, and let's not forget Wikispecies, which is mixed but has lots of literature. Unfortunately it's in a somewhat idiosyncratic format. I'll be at Wikicite 2017 next week working on a tool to parse Wikispecies citations. Apart from the literature, Wikispecies is a potential source of links to Wikidata and author identifiers, so it has an important role to play for those of us obsessed with linking stuff together. |
Also see official code lists #26 |
Came across yet another Fungi nomenclator: http://www.cybertruffle.org.uk/cybernome/eng/index.htm |
Oh, no…
Dear Markus,
This logo, taken from the website, probably characterizes the project best. This is another harvester with a very selective list of collaborators. Looking at the sponsors, I can even guess who is behind this resource.
With MycoBank and Index Fungorum recognized by the Code, you do not need any other “nomenclators” for fungi.
|
Initial imports from sources relevant to nomenclature should be considered. Please suggest relevant nomenclatural sources as comments in this issue. Key information is the name with authorship, the literature reference ideally with a DOI or link, type material and the basionym/protonym information
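As a rough illustration of the key information listed above, a single imported record could be modelled along the following lines. This is a hypothetical sketch, not a schema agreed in this issue: the class name, the field names, and the placeholder name "Aus bus" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameRecord:
    """One nomenclatural record; all field names here are illustrative."""
    scientific_name: str                   # the name itself
    authorship: str                        # author(s) and year
    reference: Optional[str] = None        # literature citation as text
    reference_link: Optional[str] = None   # ideally a DOI or a link
    type_material: Optional[str] = None    # type specimen information
    basionym: Optional[str] = None         # basionym/protonym, if different

    def is_linked_to_literature(self) -> bool:
        """True when the record carries any literature anchor."""
        return bool(self.reference or self.reference_link)


# Placeholder record with no literature attached yet.
example = NameRecord(scientific_name="Aus bus", authorship="Smith, 1900")
print(example.is_linked_to_literature())
```

A check such as `is_linked_to_literature` would make it easy to report how much of a candidate source actually ties names to literature.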
Relevant sources that should be synced continuously:
Other potential sources