Migration Roadmap

Sean Kelly edited this page Aug 12, 2021 · 8 revisions

A brief roadmap to the future of the Early Detection Research Network (EDRN) portal, and for data-centric portals in general.

🏃‍♀️ Motivation

Plone is incredibly secure, which is fantastic, but it's also incredibly challenging to use. Major hurdles exist today for the EDRN portal:

  • The integration of data-rich statistical graphics.
    • There are plenty of off-the-shelf visualization and interactivity libraries, notably D3 and Plotly Dash. These can be dropped into numerous content management systems, but not into Plone, which has its own different, incompatible way of handling JavaScript and CSS add-ons.
    • We have abused iframes and pre-generated images to get the graphics we want so far, but this isn't an agile approach.
      • Plone handles iframes especially poorly because each one triggers an additional concurrent HTTP request, which slows down the entire page load.
  • The migration to Python 3. Python 2 reached end-of-life over a year ago, and the vulnerability scans are finding more and more problems.
    • We are now at the point where, in order to continue making portal releases, we can no longer upgrade dependent packages and must instead surgically excise shared objects, libraries, and individual Python source files to satisfy the scans.
    • If the scan were to find a problem in Python 2 itself, we would have no way to proceed. This is a real possibility: security updates to Python 2 are no longer being published. If a vulnerability is discovered, it's game over.
    • Plone finally supports Python 3, but Plone uses a "no-SQL" hierarchical database that contains serialized Python objects, and the objects are completely incompatible between Python versions. The EDRN portal contains thousands of Python 2 objects.
      • In some rare cases it's possible to automate the migration from Python 2 to 3, but that assuredly doesn't apply to us: the current Python 2 database contains dozens, perhaps hundreds, of objects whose code no longer exists thanks to numerous upgrades over time, and this prevents any automated migration from proceeding.

🏁 Goals

Given the hurdles described above, we can enumerate the goals of a migration away from Plone, and the reasoning behind them, as follows:

  • Open development to more people. Plone and Zope have such steep learning curves that essentially only one person on the Informatics Center team is able to make progress.
  • Embrace future-looking technologies, including data graphics, advanced faceted and other modes of search, and interactivity, which are difficult if not impossible with Plone.
  • Use open standards such as relational databases instead of implementation-specific serialized object stores to help ensure future portability.
  • Remain on an upgrade path that avoids the use of end-of-life products and technologies.

🐦 Migration

Migrating away from Plone requires three phases with movement between each phase as obstacles are identified and uncertainties are narrowed. These phases are:

  1. Technology identification. Although developing a web solution from scratch is attractive, leveraging the power, features, and security of an existing content management system or web application framework is a vital time-saver. So far we have opted to explore Wagtail for its track record with JPL-hosted solutions. Wagtail itself is based on the extremely popular Django web framework. As these are Python-based technologies, there is optimism that we can reuse existing code from the EDRN portal, especially the RDF parsing and ingestion frameworks.
  2. Prototyping. As part of the risk reduction approach, we plan on creating a prototype cancer data portal that meets the essential "go/no-go" decision points (see below). If for any reason the prototype fails to meet the criteria or demonstrates a non-functional requirement (such as difficulty of use similar to that of Plone) we'll return to step 1 and identify other technologies.
  3. Full migration. Once the prototype satisfies and demonstrates the requirements, we will proceed with a full migration. The intent is not to add any new features at this time, but change the technology stack.
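The RDF reuse mentioned in step 1 boils down to mapping subject URIs to their predicate/value pairs. A minimal standard-library sketch of that idea, using a hypothetical RDF/XML sample (the element names and URIs are illustrative, not the real EDRN vocabulary; production code would use a proper RDF library such as rdflib):

```python
import xml.etree.ElementTree as ET

RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Hypothetical sample in RDF/XML style; not actual EDRN data.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:x="urn:example:">
  <rdf:Description rdf:about="urn:example:biomarker/1">
    <x:title>Sample Biomarker</x:title>
  </rdf:Description>
</rdf:RDF>"""

def subjects(rdf_xml: str) -> dict:
    """Map each rdf:about subject URI to its predicate/value pairs."""
    result = {}
    for desc in ET.fromstring(rdf_xml).findall(f"{RDF_NS}Description"):
        about = desc.get(f"{RDF_NS}about")
        result[about] = {child.tag: child.text for child in desc}
    return result

print(subjects(SAMPLE))
```

Because this logic is plain Python, it carries over to Django/Wagtail essentially unchanged, which is the basis for the code-reuse optimism above.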

🚀 Go/No-go Features

The following is a brief checklist of the core features we must have in order to proceed. As part of the risk reduction approach, if a solution can't satisfy all of these features, we must find another path. The core features are:

  • LDAP authentication and authorization.
  • Protection of pages based on roles and privileges accorded to login.
    • Group- and role-based access to certain portal sections.
    • Mixed public and private access to biomarker pages (this is critical; biomarkers are public, but certain details are private to certain groups).
  • RDF ingest of data for page population.
  • Import of existing Plone "static" pages, files, and images.
  • Statistical charts and data graphics.
  • Ability to perform interactive editing of page content.
  • A tech stack that is 100% free of "end of life" components.
  • Maintainability by content editors.
  • Implementation of look-and-feel: NCI branding, Section 508, and 21st Century IDEA.
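For the first two features, a Django-based platform could lean on the django-auth-ldap package. The settings fragment below is a hypothetical sketch of that wiring; the server URI, DNs, and filters are placeholders, not EDRN's actual directory layout:

```python
# Hypothetical settings.py fragment showing django-auth-ldap satisfying
# the LDAP authentication/authorization feature; all values are placeholders.
import ldap
from django_auth_ldap.config import LDAPSearch, GroupOfNamesType

AUTHENTICATION_BACKENDS = [
    "django_auth_ldap.backend.LDAPBackend",       # try LDAP first
    "django.contrib.auth.backends.ModelBackend",  # fall back to local accounts
]

AUTH_LDAP_SERVER_URI = "ldaps://ldap.example.com"
AUTH_LDAP_USER_SEARCH = LDAPSearch(
    "ou=users,dc=example,dc=com", ldap.SCOPE_SUBTREE, "(uid=%(user)s)"
)

# Group membership drives the role-based page protection listed above.
AUTH_LDAP_GROUP_SEARCH = LDAPSearch(
    "ou=groups,dc=example,dc=com", ldap.SCOPE_SUBTREE, "(objectClass=groupOfNames)"
)
AUTH_LDAP_GROUP_TYPE = GroupOfNamesType()
AUTH_LDAP_MIRROR_GROUPS = True  # mirror LDAP groups into Django groups
```

Mirrored groups can then be checked in ordinary Django permission logic, which is how the mixed public/private biomarker access could be enforced.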

We are conducting a series of experiments to determine if our newly chosen platform can satisfy each of the go/no-go features listed above. We record the results of each experiment in the following section. If all of the experiments are successful, we will begin the migration following the workplan below.

👩‍🔬 Experiments

TBD.

🗺 Workplan

Time and again, the proven approach for developing large-scale software projects is to first create a skeletal architecture that supports the basic core function of the discipline area, and then build on that structure with bolt-on features that satisfy each requirement. This further reduces risk by enabling the architecture to be re-shaped early on, before too many bolt-ons make that impossible. Another lesson learned is that the in-development morphology of the project should match the deployment shape as closely as it can. This mitigates the need for numerous and painful reconfigurations when moving between environments.

Thus, the workplan for the portal migration is roughly linear and as follows:

  1. Creation of environmental context (automatic continuous integration/deployment of demonstration platform).
  2. Containerization of basic requirements (with identical morphology for development, demonstration, acceptance testing, and operations).
  3. Construction of initial architecture (and implementation within the container and context described in the immediately preceding steps) that includes a basic home page and look-and-feel.
  4. Log-in for users.
  5. RDF ingest for a basic knowledge object.
  6. Implementation of the knowledge environment.
  7. Import of the existing static pages.
  8. Import of binary large objects (PDFs and the like).

While look-and-feel may seem like an unimportant need to fulfill early in the workplan, the sad truth of the human psyche is that we judge books by their covers; as a result, having a portal that looks production-ready early on is vital for acceptance by stakeholders.

Astute readers may notice there are no formal demonstrations in the workplan; this is because demonstrations are continuous and available throughout. Our continuous integration and continuous delivery system (Jenkins) will ensure that at least nightly—if not more frequently—deployed platforms are available for feedback.