Consider to remove pure annotation authorships #890
Comments
@dhobern what do you think as TG?
@bart-v (happy New Year!), what do you think about this? It seems many of these names came from the WoRMS World Polychaeta Database.
My personal opinion is that comments in authorstrings always contain important information for taxonomists about the use of a name. If a taxonomist (as the author of a publication/project) decides to add a comment to an authorstring, they want it to be inseparable from the scientific name (even if there is a separate "Comment" field in the database). As an editor, I respect the practices chosen by our authors. In a practical sense, I have no capacity to control and modify authorstrings in the CoL. However, I have no objection if authorstrings are corrected in GSD projects.
I have already told our editors a few times that this is bad practice. If we want to fix it properly, we need to adapt the display (in WoRMS) in some way. (Yuri, Markus, Donald, best wishes for 2025!)
Thanks @bart-v, best wishes too! I fully agree it's always best to have the authors provide the authorship for a name and it is good to know you also think this is bad practice. Let's hope it will disappear in the future. In general though I think COL has the responsibility of doing at least some basic QC. The authorship field is a pretty important field and not a free text field where you can enter any editorial remark.
Lots of thoughts here, probably none very helpful. The most essential question is probably why we are producing COL in the first place. I believe our primary use case is to provide the digital reference that users need for interpreting scientific names, either as human readers or as software. The most important part of this is to provide unambiguous information on whether a name is recognised, whether it is the accepted name for a taxon or a name that refers to a taxon with an accepted name, and where this taxon fits in the tree of life. From this perspective, name usages fall into several buckets:
COL should aim to include every name under 1 - if we can do this, we have achieved something magnificent. Different communities may also want to record any or all of 2-5, and there should be no harm in including these in GSDs, but we need clarity at all points on what they represent. If these are not clearly marked as different from 1, we are in trouble. These then undercut the functionality for our main user base. Those that are interested in these may only rarely use COL, since interest in each of these categories is likely to be limited to taxonomists and others working with the literature and specimens for a given group. Names that represent temporary labels for undescribed species may be of wider interest, but these are rarely sufficiently standardised to work reliably as part of the naming ecosystem over time. In my opinion, leaving misapplications unflagged or poorly separated is a big mistake. It can be useful to know that two species have been confused by experts, but it is disastrous if the data suggest that the name itself is ambiguous and impossible to resolve to a single species. LepIndex is/was full of historical misspellings, misapplications, aberrations and other junk, most of them presented as binomials with authorship derived from whatever paper the misspelling, misuse, etc. occurred in. These are just pollutants that make the whole dataset less useful. I have been purging the vast majority of these and only retaining significant misspellings where these are likely to be an issue to other users. So, my feeling is that @yroskov is correct and contributors should be encouraged to record exactly what they need/want to record as qualifiers for the name, but that we really need to help all contributors make sure any names in categories 2-5 are well marked and can be excluded from downstream products. COL itself should have a way to exclude them from downloads and via the API. None of this answers the question asked. I think we should do the following:
I've suggested before that we should start sending regular (e.g. quarterly) emails to all contributors with news, tips, etc. This could be a good thing to highlight in such a medium.
I 100% agree with @dhobern that the challenge is to provide unambiguous information on whether a name is recognised, whether it is the accepted name for a taxon or a name that refers to a taxon with an accepted name, and where this taxon fits in the tree of life. (Perfectly formulated!) However, in my mind, name usages in all five listed buckets are primary tasks for GSDs (including 2-5). For example, only true experts in taxonomy of the group are able to recognize, resolve and reflect misapplications in the checklist. Unfortunately, I am pessimistic that the task can be facilitated through regular letters to GSDs. Taxonomists do not need our instructions on how to do their job. They need the funds and a new generation of their successors in GSD projects.
I don't think we are discussing the main question of the issue here though: I believe COL has the mission to do this. In my view COL should aim to provide a list of all names as consistently as we can. It will be impossible to achieve 100%, but we should strive for consistent name syntax (which luckily the codes mostly demand already), authorships, ranks, reference citations and distributions if we consider them to be relevant. For many things we have enumerations to which we map loose text values. In an ideal world we would also link to author records via identifiers instead of having authorship strings. That would clearly not allow editorial remarks...
My point was that COL should perhaps handle this by policing the status field. If names have non-standard authorship that represents one of these other categories but are flagged just as accepted or synonym, I think we should push a different status on them and exclude them from the cleanest views of COL. In other words, we should have a quarantining approach in relation to the main COL product.
Does CoL have the resources to monitor quarantine?
By "quarantine", I just meant automatically excluding non-standard names from the public product. This could be automated, as could notifying the contributor that these names have been excluded and what can be done to make them part of the main product again.
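To make the automated quarantine idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `NameUsage` record, the `looks_like_pure_remark` heuristic (an authorship with no capitalised token cannot contain an author surname or initial) and the notice wording are hypothetical and not part of any COL or ChecklistBank API. The sample names are invented.

```python
from dataclasses import dataclass


@dataclass
class NameUsage:
    # Hypothetical record shape, not the real COL data model.
    id: str
    scientific_name: str
    authorship: str
    status: str  # e.g. "accepted" or "synonym"


def looks_like_pure_remark(authorship: str) -> bool:
    """Assumed heuristic: a non-empty authorship in which no token starts
    with an uppercase letter cannot hold an author surname or initial."""
    tokens = authorship.replace("(", " ").replace(")", " ").split()
    return bool(tokens) and not any(t[0].isupper() for t in tokens)


def quarantine(usages):
    """Split usages into (public, quarantined) and draft one notice
    per quarantined usage for the contributor."""
    public, quarantined, notices = [], [], []
    for u in usages:
        if looks_like_pure_remark(u.authorship):
            quarantined.append(u)
            notices.append(
                f"{u.id}: '{u.scientific_name} {u.authorship}' was excluded "
                "because the authorship looks like a remark; supply a "
                "standard authorship to have it included again."
            )
        else:
            public.append(u)
    return public, quarantined, notices


# Invented sample data, for illustration only.
sample = [
    NameUsage("1", "Aus bus", "(Linnaeus, 1758)", "accepted"),
    NameUsage("2", "Aus cus", "nomen nudum", "synonym"),
    NameUsage("3", "Aus dus", "", "accepted"),
]
public, quarantined, notices = quarantine(sample)
```

A real check would need a far more careful definition of "non-standard" than this single rule; the point is only that exclusion and contributor notification can both come out of one pass over the data.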
COL contains some authorships which are made up of just remarks.
Even if a GSD supplies these, I think we should ask them to clean up their names, or otherwise remove them ourselves with decisions.
There are hundreds of synonyms which contain the accepted name in their authorship
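A rough way to find such cases could look like the sketch below. It assumes each synonym record gives access to its own authorship string and to the accepted name it points to; the function name and the abbreviated-genus variant are illustrative assumptions, and the sample strings are invented.

```python
def authorship_mentions_accepted(authorship: str, accepted_name: str) -> bool:
    """Return True if the accepted name, or the same name with the genus
    abbreviated to its initial, occurs in the authorship string."""
    # Lowercase and collapse whitespace so the comparison ignores both.
    text = " ".join(authorship.lower().split())
    name = " ".join(accepted_name.lower().split())
    if not name:
        return False
    candidates = {name}
    parts = name.split(" ")
    if len(parts) >= 2:
        # "aus bus" is also searched for as "a. bus"
        candidates.add(parts[0][0] + ". " + " ".join(parts[1:]))
    return any(c in text for c in candidates)


# Invented examples, for illustration only.
flag_full = authorship_mentions_accepted("Smith, 1900 = Aus bus", "Aus bus")
flag_abbrev = authorship_mentions_accepted("sensu Jones, see A. bus", "Aus bus")
flag_clean = authorship_mentions_accepted("Smith, 1900", "Aus bus")
```

This is plain substring matching, so it can produce false positives and should only be used to build a list for manual review.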
Others: