Curation rate estimation plan
Gaurav Vaidya edited this page Jun 27, 2018 · 13 revisions
This plan documents the methodology we use to measure how quickly we can curate phyloreferences, so that we can estimate how long it would take to curate a given number of phyloreferences.
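The estimate itself is simple arithmetic. Below is a minimal Python sketch. It assumes each timed session records the paper, the minutes taken and the number of phyloreferences curated, and that curation time scales roughly linearly with the number of phyloreferences. The `CurationSession` class and both functions are hypothetical illustrations, not part of the Curation Tool, and the session figures in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class CurationSession:
    """One timed curation session (hypothetical record layout)."""
    paper: str
    minutes: float         # time taken, in minutes
    phyloreferences: int   # phyloreferences curated in this session


def minutes_per_phyloreference(sessions: list[CurationSession]) -> float:
    """Pooled rate: total minutes divided by total phyloreferences curated."""
    total_minutes = sum(s.minutes for s in sessions)
    total_phylorefs = sum(s.phyloreferences for s in sessions)
    if total_phylorefs == 0:
        raise ValueError("no phyloreferences were curated")
    return total_minutes / total_phylorefs


def estimate_hours(sessions: list[CurationSession], target: int) -> float:
    """Extrapolate linearly (an assumption) to `target` phyloreferences, in hours."""
    return minutes_per_phyloreference(sessions) * target / 60


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    sessions = [
        CurationSession("Paper A", minutes=90, phyloreferences=6),
        CurationSession("Paper B", minutes=150, phyloreferences=9),
    ]
    print(minutes_per_phyloreference(sessions))  # 16.0
    print(estimate_hours(sessions, 300))         # 80.0
```

Pooling total minutes over total phyloreferences weights each paper by the number of phyloreferences it contributed, rather than averaging per-paper rates. The same functions could be run twice: once with times that include obtaining digital copies of the phylogenies and once with times that exclude it.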
- Choose a set of papers from the list of papers to curate at the bottom of this page.
- Initially, it is fine for the curator to curate papers that have already been curated, since these are the only ones we know can be curated. However, they must not curate a paper they are already familiar with or have curated before, as that would underestimate the time needed to understand and curate a paper.
- Before starting to time themselves, the curator should confirm that:
  - The paper contains at least one clade definition.
  - The paper contains at least one phylogeny containing all the specifiers for at least one clade definition.
- Enter the metadata for the study, including title, citation and DOI.
- The curator should not document bugs while timing themselves. Instead, another person could watch the curator and write down suggested user interface improvements.
- Curate the phyloreferences in any order using the Curation Tool.
- The verbatim clade definition and every verbatim specifier should be entered for all phyloreferences. If possible, scientific names or specimen identifiers should also be added.
- The verbatim specifier should include all the information given in the original definition: authority information, higher taxonomy, whether the definition refers to a taxon or to the type specimen of the taxon, and any other information provided.
  - For example, when curating "Gnetum gnemon Linnaeus 1767 (Gnetophyta) and Pinus strobus Linnaeus 1753 (Coniferae)", two specifiers should be extracted: "Gnetum gnemon Linnaeus 1767 (Gnetophyta)" and "Pinus strobus Linnaeus 1753 (Coniferae)".
- Curate the phylogenies in any order using the Curation Tool.
- If the reference phylogeny is available in a digital format (e.g., a Newick or Nexus file), upload it. If the phylogeny is not available digitally, first write to the author of the paper that published the phylogeny. If they do not respond, manually transcribe the phylogeny into a digital format.
- Every phylogeny should have a descriptive title (e.g., "Fig 3 from the paper", "Downloaded from TreeBASE Study S2914", etc.) and should contain a Newick string that matches the phylogeny in the paper as closely as possible.
- Only the phylogeny that shows where clade definitions are expected to resolve needs to be curated. Other phylogenies may be included if the curator believes that they will help test phyloreference resolution.
- For each phyloreference, the curator should identify the node it is expected to resolve to, based on where the authors expected their clade definition to resolve on their phylogeny. Any discrepancies should be noted in the curation notes field.
- Once all phylogenies and phyloreferences have been added, the curator should go through every phyloreference to confirm that all specifiers expected to match are matching correctly. Additional taxonomic units may need to be added to the phylogeny for them to match.
- The curator should also ensure that the expected node for each phyloreference is set.
- When a specifier does not match, the curator must click on the asterisk beside the specifier and give the reason why it does not match. Usually this is because the specifier is not present in any phylogeny, but any other reason can be given here.
- Finally, the curator should record that they curated this PHYX file. Until we have a proper way to do this (see phyloref/curation-tool#26), they can leave a note in the curator notes field of each phyloreference.
- The curator should document:
  - the paper curated,
  - time taken (both including and excluding time taken to obtain digital copies of the phylogenies),
  - number of phyloreferences completely curated,
  - number of phyloreferences incompletely curated, such as where a specifier is not shown on any phylogeny in the paper, and
  - any issues which might have slowed down curation.
- The time taken must be accurate to within 10 minutes, and ideally should be accurate to the minute.
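Two of the steps above (confirming that a phylogeny contains all the specifiers for a clade definition, and checking afterwards that specifiers match) come down to comparing specifier names with the tip labels of a Newick string. As a quick pre-check, here is a minimal Python sketch. It is not how the Curation Tool matches specifiers; it only does exact string comparison, assumes unquoted labels with underscores standing for spaces, and ignores Newick comments in square brackets. The function names and the toy tree are hypothetical illustrations, not taken from any paper.

```python
import re

STRUCTURAL = set("(),:;")


def newick_tip_labels(newick: str) -> set[str]:
    """Return the tip labels of a Newick string (simplified sketch)."""
    labels: set[str] = set()
    previous = None  # last structural character seen
    # Split into single structural characters and runs of everything else.
    for token in re.findall(r"[(),:;]|[^(),:;]+", newick):
        if token in STRUCTURAL:
            previous = token
            continue
        label = token.strip()
        if not label:
            continue  # whitespace between structural characters
        # Text after '(' or ',' (or at the very start) is a tip label;
        # text after ')' is an internal-node label, after ':' a branch length.
        if previous in (None, "(", ","):
            labels.add(label.replace("_", " "))
    return labels


def missing_specifiers(specifier_names: list[str], newick: str) -> list[str]:
    """Specifier names that do not appear verbatim as tip labels."""
    tips = newick_tip_labels(newick)
    return [name for name in specifier_names if name not in tips]


if __name__ == "__main__":
    toy_tree = "((Gnetum_gnemon:0.1,Welwitschia_mirabilis:0.2):0.3,Pinus_strobus:0.4);"
    print(missing_specifiers(["Gnetum gnemon", "Pinus strobus", "Ginkgo biloba"], toy_tree))
    # ['Ginkgo biloba']
```

A non-empty result flags a specifier that either needs another phylogeny, an additional taxonomic unit on a node, or a recorded reason for not matching.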
- Fisher et al (2007) Phylogeny of the Calymperaceae with a rank-free systematic treatment, curated as an example in the Curation Tool
- Hillis and Wilcox (2005) Phylogeny of the New World true frogs (Rana), curated as an example in the Curation Tool.
- Because we don't yet support matching of monomial names (issue phyloref/curation-tool#44)
- Because we don't yet support multiple external specifiers (issue phyloref/curation-workflow#27):
  - Carvalho-Sobrinho et al (2016) Revisiting the phylogeny of Bombacoideae (Malvaceae): Novel relationships, morphologically cohesive clades, and a new tribal classification based on multilocus phylogenetic analyses
  - Crowl and Cellinese (2017) Naming diversity in an evolutionary context: Phylogenetic definitions of the Roucela clade (Campanulaceae/Campanuloideae) and the cryptic taxa within
  - Tello et al (2009) Phylogeny and phylogenetic classification of the tyrant flycatchers, cotingas, manakins, and their allies (Aves: Tyrannides)
  - Tank and Donoghue (2010) Phylogeny and Phylogenetic Nomenclature of the Campanulidae Based on an Expanded Sample of Genes and Taxa
  - Brochu (2003) Phylogenetic approaches toward Crocodylian history
- Wojciechowski (2013) Towards a new classification of Leguminosae: Naming clades using non-Linnaean phylogenetic nomenclature
  - Tree available on TreeBASE as Study 1509.
  - Contains an apomorphy-based definition along with node-based and branch-modified definitions.
- Wojciechowski et al (2004) A phylogeny of legumes (Leguminosae) based on analysis of the plastid matK gene resolves many well‐supported subclades within the family
  - Doesn't have formal definitions, but describes the clades using MRCA-based definitions before discussing them.
- Poe et al (2017) A Phylogenetic, Biogeographic, and Taxonomic study of all Extant Species of Anolis (Squamata; Iguanidae)
- Olmstead et al (2001) Disintegration of the Scrophulariaceae
- Kuntner (2006) Phylogenetic systematics of the Gondwanan nephilid spider lineage Clitaetrinae (Araneae, Nephilidae)
- Baron et al (2017) A new hypothesis of dinosaur relationships and early dinosaur evolution
- Cantino et al (1997) A Comparison of Phylogenetic Nomenclature with the Current System: A Botanical Case Study
- Cantino et al (2007) Towards a phylogenetic nomenclature of Tracheophyta
- Joyce et al (2004) Developing a Protocol for the Conversion of Rank-Based Taxon Names to Phylogenetically Defined Clade Names, As Exemplified by Turtles
- Carter et al (2015) The Paracladistic Approach to Phylogenetic Taxonomy
Funded by the US National Science Foundation through collaborative grants DBI-1458484 and DBI-1458604. See Funding for details.