Software Development Plan, December 2017

This is an archived copy of the software development plan Gaurav wrote up in December 2017. The plan itself is now being maintained as a GitHub project, and the overall list of project aims and goals has been moved to this Wiki, where it should be edited.

This software development plan is intended to list all the software components we would like to build as part of the Phyloreferencing project. It will provide us with a comprehensive overview of current and future development efforts, organize development into phases, and allow us to precisely scope each component. Given that this is a research project, these plans (and especially the time estimates) should generally be considered extremely approximate, and may change dramatically based on what we learn at each stage of development.

Software components

These have been entirely transferred to https://github.com/phyloref/organization/projects/1.

These project goals can be broken down into a series of software components. I summarize them here, and then provide a thorough description of each software component in the following sections. I measure progress on each component through two metrics:

Specific aims and goals are high-level requirements taken from the project proposal or added subsequently by project personnel. They are intended to be used as a checklist to show how close we are to meeting the overall expectations of our project.
User stories describe individual features that are necessary to make these software components usable and valuable to users, and show how well we understand upcoming requirements. They are intended to be converted to GitHub issues and to form the basis of unit tests.

Component	Status	Specific aims	User stories	Complete by
Phyloreferencing test suite	In development	0/1	0/5	January 2018
Phyloreferencing Python library	In development	3/6	2/5	January 2018
Phyloreferencing Java library	Prototyping	0/3	2/4	January 2018
Phylogeny to OWL conversion tool	In development	1/4	1/4	January 2018
Phyloreferencing specification	Prototyping	4/22	0/12	March 2018
Phyloreference curation tool	→ January 2018	0/1	0/4	March 2018
Phyloreference interface for Regnum	→ March 2018	0/3	0/0	July 2018
Phyloreference navigator and query tool	→ March 2018	0/6	0/14	July 2018

The component statuses listed above progress through the following sequence:

Not started: the estimated start date is listed.
Planning: the component is being scoped, either in this document or as a separate document or blog post.
Prototyping: a prototype software tool is being developed that acts as a proof-of-concept and suggests the best way to develop this tool.
In development: the prototype is being expanded to a fully documented, well-designed software program.
Testing: a potential release is ready and is being tested by users inside and outside the project. A comprehensive test suite might be added at this stage.
Complete: all planned development has taken place, the software has been tested and has been used by users outside of the project.

Goals and specific aims from the project proposal

Entirely moved to https://github.com/phyloref/organization/wiki/Project-aims-and-goals

The following goals of the Phyloreferencing project were listed in the project proposal (pg 3):

A specification for encoding phyloreferences and phylogenies in OWL.
1. We will specify and test templates and a supporting ontology for constructing phyloreferences in OWL, guided by phylogenetic definitions used in the literature.
  - Part of the Phyloreferencing test suite.
2. In parallel, we will develop a model and an automatic transformation tool for representing phylogenies in OWL such that phyloreferences can be resolved by OWL reasoning.
  - Part of the Phylogeny to OWL conversion tool.
An OWL ontology of vetted phyloreferences.
1. To ground-proof the specification, we will create a tool to transform the published phylogenetic definitions contained in the RegNum database to an OWL ontology of phyloreferences.
  - Part of the Phyloreference interface for Regnum.
2. We will also supplement the RegNum content with phylogenetic definitions culled from the angiosperm (flowering plants) systematics literature.
  - Part of the Phyloreference interface for Regnum.
A proof-of-concept application for utility and correctness of phyloreferences.
1. Using a comprehensive phylogenetic tree for angiosperms and the previously curated phyloreferences, we will create an online application that uses OWL reasoning to allow users to query the tree using phyloreferences, and to find phyloreferences based on chosen nodes of the tree.
  - Part of the Phyloreference navigator and query tool.
A proof-of-concept application for navigating large-scale data resources.
1. We will extend the proof-of-concept application to allow users to query and navigate EOL with phyloreferences, using the full synthetic Open Tree of Life.
  - Part of the Phyloreference navigator and query tool.

The project proposal also includes a list of 12 specific aims (pg 13). These are:

(1a) Development of phyloreferencing ontology
(1b) Specification for ontology-based phyloreference construction
(1c) Tool for converting phylogenies into OWL ontologies
(2a) Ontology-based interface for RegNum
(2b) Literature extraction of phylogenetic definitions
(2c) Tool for transforming RegNum content to OWL
(3) Webapp for querying of large tree with phyloreferences
(4a) Algorithm to map between tree terminal nodes and taxonomies
(4b) Algorithm to map between tree internal nodes and taxonomies
Test cases, software testing, query result vetting
Development of online instructional module
Development of Museum exhibit

I have listed the first nine specific aims under the descriptions of the individual software components below. The final three specific aims are not tied to any one software tool, and so are not included in this development plan.

Deliverables: Documents

Phyloreferencing specification

A specification that describes how phyloreferences can be defined conceptually and how they can be implemented in OWL. It includes examples of phyloreferences and discusses the limitations of this approach.

User stories (0/12):

Specific aims (4/22):

Deliverables: Software tools

Phyloreference navigator and query tool

Our overall goal is the development of a graphical tool that can be used to demonstrate the value of phyloreferencing when compared with current approaches. It will be a single, online tool that will allow users to:

Construct phyloreferences by indicating internal and external specifiers,
Validate phyloreferences to identify common mistakes or to help determine why a phyloreference failed to resolve,
Match custom phylogenies against constructed phyloreferences or against the phyloreferences included in the Phyloreferencing test suite, and
Match custom and included phyloreferences against the Open Tree of Life.

While we have had some success in reasoning on-the-fly over small-to-medium-sized phylogenies, the demonstration phylogenies and phyloreferences will probably be pre-reasoned.

Gaurav’s stories (0/14):

Specific aims met (0/6):

(Specific Aim 3) Webapp for querying of large tree with phyloreferences (specifically, a large tree of angiosperms)
(Specific Aim 4a) Algorithm to map between tree terminal nodes and taxonomies
(Specific Aim 4b) Algorithm to map between tree internal nodes and taxonomies
(Goal 3a) Using a comprehensive phylogenetic tree for angiosperms and the previously curated phyloreferences, we will create an online application that uses OWL reasoning to allow users to query the tree using phyloreferences, and to find phyloreferences based on chosen nodes of the tree.
(Goal 4a) We will extend the proof-of-concept application to allow users to query and navigate EOL with phyloreferences, using the full synthetic Open Tree of Life.
Demonstrate the results of querying the tree by using it to navigate the Encyclopedia of Life
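One possible starting point for Specific Aims 4a and 4b has two parts. Terminal node labels are matched to taxon names by simple normalization. Each internal node is then mapped to the smallest taxonomic group that contains all of its descendants. This approach is an illustrative assumption, not the project's algorithm. Real mappings would need to handle synonyms, misspellings and non-monophyletic groups. The function names are hypothetical, and the toy taxonomy is just a Python dict.

```python
# Hypothetical sketch for Specific Aims 4a/4b: map tree nodes to a taxonomy.
# Assumptions: tip labels use underscores for spaces, and the taxonomy is a
# dict from group name to the set of species names it contains.


def normalize_label(label):
    """Turn a tip label such as 'Magnolia_grandiflora' into 'Magnolia grandiflora'."""
    return " ".join(label.replace("_", " ").split())


def map_terminal_nodes(tip_labels, known_species):
    """Map each tip label to a known species name, or None if nothing matches."""
    mapping = {}
    for label in tip_labels:
        name = normalize_label(label)
        mapping[label] = name if name in known_species else None
    return mapping


def map_internal_node(descendant_species, taxonomy):
    """Return (group, exact) for the smallest group containing all descendants.

    `exact` is True when the group contains exactly the node's descendants,
    i.e. the internal node and the taxonomic group agree.
    """
    descendants = set(descendant_species)
    candidates = [
        (len(members), group, set(members) == descendants)
        for group, members in taxonomy.items()
        if descendants <= set(members)
    ]
    if not candidates:
        return None, False
    _, group, exact = min(candidates)  # smallest containing group wins
    return group, exact


# Toy taxonomy for illustration only.
TAXONOMY = {
    "Magnoliaceae": {"Magnolia grandiflora", "Liriodendron tulipifera"},
    "Magnoliales": {"Magnolia grandiflora", "Liriodendron tulipifera", "Annona muricata"},
}
```

Here is what the sketch returns for two internal nodes. The node above Magnolia grandiflora and Liriodendron tulipifera maps exactly to Magnoliaceae. The node above Magnolia grandiflora and Annona muricata only fits inside Magnoliales, and is flagged as inexact.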

Phyloreferencing test suite

The phyloreferencing test suite is a collection of individual test cases, each consisting of a group of phyloreferences and phylogenies. The phylogenies have been annotated to indicate which nodes we expect each phyloreference to resolve to. The test suite resolves the phyloreferences on the provided phylogenies and tests whether they resolve to the expected nodes. Since it also acts as a test suite for the ontology itself, the definitive copy of the ontology should be stored with the test suite. The test suite will provide us with three main outputs:

A set of curated, documented phyloreferences side-by-side with phylogenies upon which they resolve,
A set of phyloreferences that we can include in the Phyloreference navigator and query tool, providing pre-created phyloreferences that can be resolved against custom phylogenies or against the Open Tree of Life synthetic tree, and
A regression testing platform that will detect if later changes to our software or ontology cause previously defined phyloreferences to fail or to resolve incorrectly.
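The core check that the test suite performs can be sketched as follows. In the project, resolution is done by OWL reasoning. This sketch substitutes a plain-Python stand-in: a node-based definition ("the least inclusive clade containing A and B") resolves to the most recent common ancestor (MRCA) of its internal specifiers. That is enough to show how resolved nodes are compared with the annotated expectations. The child-to-parent dict representation and all function names are hypothetical.

```python
# Hypothetical stand-in for OWL reasoning: resolve node-based phyloreferences
# as the most recent common ancestor (MRCA) of their internal specifiers.


def path_to_root(node, parent_of):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def resolve_node_based(internal_specifiers, parent_of):
    """Return the MRCA of the specifiers, or None if any is missing from the tree."""
    nodes = set(parent_of) | set(parent_of.values())
    if not internal_specifiers or any(s not in nodes for s in internal_specifiers):
        return None
    paths = [path_to_root(s, parent_of) for s in internal_specifiers]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking root-ward from the first specifier, the first shared node is the MRCA.
    return next((node for node in paths[0] if node in shared), None)


def run_test_case(parent_of, phyloreferences, expected):
    """Return (name, expected, resolved) for every phyloreference that fails."""
    failures = []
    for name, specifiers in phyloreferences.items():
        resolved = resolve_node_based(specifiers, parent_of)
        if resolved != expected.get(name):
            failures.append((name, expected.get(name), resolved))
    return failures


# Toy phylogeny ((A,B)n1,C)root as a child -> parent mapping.
TREE = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
```

The regression use case follows directly. Suppose a later change causes a phyloreference to resolve elsewhere. For example, `run_test_case(TREE, {"clade_AC": ["A", "C"]}, {"clade_AC": "n1"})` reports `[("clade_AC", "n1", "root")]`.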

User stories (0/5):

As someone curious about phyloreferences, I would like:
- a “worked example” showing how phyloreferences can be constructed, documented and published.
- information on how many phyloreferences are part of our test suite, which branches of the tree of life are covered, and whether all current phyloreferences pass all tests.
As an ontologist, I would like to:
- download the entire OWL test suite as an ontology in RDF/XML
- download a small subset of the OWL test suite as an ontology in RDF/XML
As a developer, I would like every existing curated phyloreference to be tested every time our ontology changes.

Specific aims met (0/1):

(Goal 1a) We will specify and test templates and a supporting ontology for constructing phyloreferences in OWL, guided by phylogenetic definitions used in the literature.

Phyloreference curation tool

This curation tool will allow phylogenetic clade definitions from journal articles to be curated and added to the test suite. It will essentially act as a graphical editor for the JSON files in the Phyloreferencing test suite. Ideally, it will allow interactive editing of phyloreferences, with immediate feedback on whether each phyloreference resolved as expected and, if not, why it failed.

User stories (0/4):

As a busy curator, I would like:
- Phyloreferences to be added quickly and with minimal metadata
- An interactive mode in which phyloreferences can be immediately executed and fixed
- Checklists to ensure that all necessary metadata were incorporated
As a developer, I would like to use this tool to prototype the Phyloreference navigator and query tool and to provide initial feedback on phyloreference-related user experience
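The metadata checklist from the user stories above could be as simple as the following sketch. The field names (`label`, `source`, `cladeDefinition`, `internalSpecifiers`) are assumptions for illustration. This plan does not define the test suite's JSON schema.

```python
# Hypothetical metadata checklist for the curation tool. The field names are
# illustrative assumptions, not the test suite's actual JSON schema.
REQUIRED_FIELDS = {
    "label": "a name for the phyloreference",
    "source": "a citation for the article the definition comes from",
    "cladeDefinition": "the clade definition as published",
    "internalSpecifiers": "at least one internal specifier",
}


def checklist(phyloref):
    """Return a list of missing-metadata messages; an empty list means complete."""
    problems = []
    for field, description in REQUIRED_FIELDS.items():
        if phyloref.get(field) in (None, "", []):
            problems.append(f"missing {field}: {description}")
    return problems
```

Keeping the required list short fits the "minimal metadata" story. A curator can save a phyloreference quickly and let the checklist show what is still missing.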

Specific aims met (0/1):

(Specific Aim 2b) Literature extraction of phylogenetic definitions

Phylogeny to OWL conversion tool

phylo2owl will allow phylogenies to be converted into OWL, keeping as many annotations and labels as possible.
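To illustrate the kind of conversion phylo2owl performs, the sketch below parses a simple Newick string and writes each node as an RDF individual in N-Triples. Each node is linked to its parent and keeps its label as `rdfs:label`. This is not phylo2owl's implementation. It ignores quoted labels and comments, and it discards branch lengths. The `example.org` namespace and the `hasParent` property are placeholders, not the project's vocabulary.

```python
# Sketch of phylogeny-to-RDF conversion. Placeholders: BASE and HAS_PARENT are
# example.org IRIs, not the project's vocabulary. Handles only simple Newick
# (unquoted labels, optional branch lengths, which are discarded).
BASE = "http://example.org/phylogeny#"
HAS_PARENT = "http://example.org/terms#hasParent"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def parse_newick(text):
    """Parse Newick into nested (label, children) tuples."""
    text = text.strip()
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1  # consume ')'
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        label = text[start:pos].strip()
        if pos < len(text) and text[pos] == ":":  # skip the branch length
            pos += 1
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
        return label, children

    return parse_node()


def to_ntriples(tree):
    """Emit one N-Triples line per parent link and per non-empty label."""
    lines = []
    counter = 0

    def visit(node, parent_iri):
        nonlocal counter
        label, children = node
        iri = f"<{BASE}node{counter}>"
        counter += 1
        if parent_iri is not None:
            lines.append(f"{iri} <{HAS_PARENT}> {parent_iri} .")
        if label:
            lines.append(f'{iri} <{RDFS_LABEL}> "{label}" .')
        for child in children:
            visit(child, iri)

    visit(tree, None)
    return lines
```

For example, `to_ntriples(parse_newick("((A:0.1,B:0.2)n1,C)root;"))` yields nine triples. Each of the five nodes gets a label, and each non-root node gets a link to its parent.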

User stories (1/4):

As a user of command line tools,
- I should be able to install phylo2owl from the command line using pip install.
- I should be able to read documentation about this tool by running man phylo2owl.
- I should be able to read documentation about this tool by running phylo2owl --help.
As a software engineer, I would like to avoid duplicated code by refactoring phylo2owl to use the Phyloreferencing Python library.

Specific aims met (1/4):

(Goal 1b) In parallel, we will develop a model and an automatic transformation tool for representing phylogenies in OWL such that phyloreferences can be resolved by OWL reasoning.
(Specific Aim 1c) Tool for converting phylogenies into OWL ontologies
Should be able to convert Newick, Nexus and NeXML input files.
Should have its own test suite

Phyloreference interface for Regnum

This tool provides close integration between Regnum, an online database of phylogenetic clade definitions, and our toolset. In particular, it is designed to facilitate the conversion of Regnum's phyloreferences into OWL representations in the Phyloreference curation tool, where they could be curated for inclusion in the test suite. This will probably involve making changes to Regnum to bring it more in line with the Phyloreferencing specification; these changes will be based on the Phenex phenotype curation tool and the Hymenoptera Anatomy Ontology (HAO) project.

User stories (0/0):

None so far

Specific aims met (0/3):

(Specific Aim 2a) Ontology-based interface for RegNum
(Specific Aim 2c) Tool for transforming RegNum content to OWL
- (Goal 2a) To ground-proof the specification, we will create a tool to transform the published phylogenetic definitions contained in the RegNum database to an OWL ontology of phyloreferences.
(Goal 2b) We will also supplement the RegNum content with phylogenetic definitions culled from the angiosperm (flowering plants) systematics literature.

Deliverables: Software libraries

Phyloreferencing Java library

Because of its integration with OWLAPI, the Java library can reason over ontologies to resolve phyloreferences. The library can therefore provide wrappers around that functionality, including identifying phyloreferences in an OWL ontology, resolving those phyloreferences to nodes, and reporting phyloreferences that failed to resolve correctly. It may later include improved debugging tools and optimizations that allow large phylogenies to be processed efficiently. The library currently includes a command-line tool, jphyloref, but this tool is not intended as a final product except as part of other tools.

User stories (2/4):

As a software developer, I should be able to:
- Load and reason over an OWL ontology in RDF/XML.
- Identify nodes resolved by phyloreferences.
- Through abstraction, work directly with Phylogenies, Phyloreferences, Specifiers and other phyloreferencing elements.
- Determine why a phyloreference failed to resolve: check all its specifiers, identify cases where a “phyloreference” matched multiple nodes, and other common failure causes.
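The diagnostic checks in the last user story can be sketched independently of OWLAPI. The example is written in Python as a language-neutral illustration of the logic. It assumes, hypothetically, that the reasoner's output has already been reduced to two plain lists: the labels present on the phylogeny, and the nodes placed in the phyloreference's class.

```python
# Language-neutral sketch (in Python) of phyloreference failure diagnostics.
# Assumes the reasoner's results were already reduced to plain lists of labels.


def diagnose(specifiers, tree_labels, resolved_nodes):
    """Return human-readable reasons why a phyloreference did not resolve to one node.

    specifiers:     taxon labels used as internal or external specifiers
    tree_labels:    labels of the nodes present in the phylogeny
    resolved_nodes: nodes the reasoner placed in the phyloreference's class
    """
    reasons = []
    for specifier in specifiers:
        if specifier not in tree_labels:
            reasons.append(f"specifier {specifier!r} matched no node")
    if not resolved_nodes:
        reasons.append("phyloreference resolved to no node")
    elif len(resolved_nodes) > 1:
        reasons.append(f"phyloreference matched {len(resolved_nodes)} nodes: "
                       f"{', '.join(sorted(resolved_nodes))}")
    return reasons
```

An empty list means the phyloreference resolved to exactly one node and every specifier was found. Whether that node is the *expected* one is a separate check.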

Specific aims met (0/3):

Provides a complete abstraction layer over phyloreferences in OWL ontologies
Is thoroughly documented
Includes a test suite

Phyloreferencing Python library

Given the difficulty of carrying out OWL reasoning in Python, this library is designed to read and interpret JSON files describing a test case with annotated phylogenies and phyloreferences. Thanks to DendroPy, the phylogenies can be read in Newick, Nexus or NeXML format. The library contains the code required to convert phylogenies and phyloreferences into OWL representations, so that phylo2owl can eventually become little more than a command-line wrapper around this library.

User stories (2/5):

A software developer should be able to read a Phyloreference test suite written in JSON into memory.
A software developer should be able to export the Phyloreference test suite as OWL in JSON-LD.
A software developer should be able to list all the phylogenies and phyloreferences in a test suite.
A software developer should be able to obtain a list of nodes within a phylogeny (even if via DendroPy).
A software developer should be able to determine which nodes have been annotated as expected Phyloreference targets.
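Most of the user stories above can be sketched with the standard library alone. The JSON layout below is an assumption for illustration: top-level `phylogenies` and `phyloreferences` lists, with a node annotated as an expected target by carrying the phyloreference's label. The regular-expression label extraction is a simplified stand-in for what DendroPy would provide.

```python
import json
import re

# Hypothetical test-case layout; the real JSON schema may differ.
SAMPLE = """{
  "phylogenies": [
    {"label": "Toy tree", "newick": "((A,B)Clade_AB,C)root;"}
  ],
  "phyloreferences": [
    {"label": "Clade_AB", "internalSpecifiers": ["A", "B"]}
  ]
}"""


def load_test_case(text):
    """Read a test case written in JSON into memory."""
    return json.loads(text)


def list_contents(test_case):
    """Return the labels of all phylogenies and phyloreferences."""
    phylogenies = [p["label"] for p in test_case.get("phylogenies", [])]
    phylorefs = [p["label"] for p in test_case.get("phyloreferences", [])]
    return phylogenies, phylorefs


def node_labels(newick):
    """List node labels in a simple Newick string (unquoted labels only)."""
    without_lengths = re.sub(r":[^,();]*", "", newick)  # drop branch lengths
    return [s.strip() for s in re.split(r"[,();]", without_lengths) if s.strip()]


def expected_targets(test_case):
    """Per phylogeny, list phyloreferences whose label appears on a node.

    Assumption: an expected target is annotated by labelling the node with
    the phyloreference's label.
    """
    refs = {p["label"] for p in test_case.get("phyloreferences", [])}
    return {
        p["label"]: sorted(refs & set(node_labels(p["newick"])))
        for p in test_case.get("phylogenies", [])
    }
```

With the sample above, `list_contents(load_test_case(SAMPLE))` returns `(["Toy tree"], ["Clade_AB"])`. The `Clade_AB` node is reported as the expected target of the `Clade_AB` phyloreference.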

Specific aims met (3/6):

Reading JSON files describing a test case with annotated phylogenies and phyloreferences.
Reading phylogenies in Newick, Nexus or NeXML formats, thanks to DendroPy.
Writing out JSON-LD files that contain:
- The phylogenies, converted into an OWL representation, and
- The phyloreferences, converted into OWL expressions.
Is thoroughly documented
Includes a test suite
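The JSON-LD output might look roughly like the following sketch. It assumes a child-to-parent dict for the phylogeny and placeholder vocabulary under `example.org`. Nodes become OWL named individuals linked to their parents. Each phyloreference becomes an `owl:Class`, but the class expression that would define it is omitted, since this plan does not specify its form.

```python
import json

# Placeholder vocabulary: "ex:" terms are illustrative, not the project's IRIs.
CONTEXT = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/terms#",
    "label": "rdfs:label",
    "parent": {"@id": "ex:hasParent", "@type": "@id"},
}


def export_jsonld(base_iri, parent_of, phyloref_labels):
    """Serialize a child -> parent phylogeny and phyloreference labels as JSON-LD."""
    nodes = set(parent_of) | set(parent_of.values())
    graph = []
    for name in sorted(nodes):
        entry = {"@id": base_iri + name, "@type": "owl:NamedIndividual", "label": name}
        if name in parent_of:
            entry["parent"] = base_iri + parent_of[name]
        graph.append(entry)
    for label in phyloref_labels:
        # Each phyloreference becomes an OWL class; its defining class
        # expression is omitted from this sketch.
        graph.append({"@id": base_iri + "phyloref_" + label,
                      "@type": "owl:Class", "label": label})
    return json.dumps({"@context": CONTEXT, "@graph": graph}, indent=2)
```

Declaring `parent` with `"@type": "@id"` in the context tells JSON-LD processors that its values are IRIs, not plain strings.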

Funded by the US National Science Foundation through collaborative grants DBI-1458484 and DBI-1458604. See Funding for details.
