Migration Roadmap

A brief roadmap to the future of the Early Detection Research Network (EDRN) portal, and for data-centric portals in general.

🏃‍♀️ Motivation

Plone is incredibly secure, which is fantastic, but it is also incredibly challenging to use. Major hurdles exist today for the EDRN portal:

The integration of data-rich statistical graphics.
- There are plenty of off-the-shelf visualization and interactivity libraries, notably D3 and Plotly Dash. These can be dropped into numerous content management systems, but not into Plone, which has its own incompatible way of handling JavaScript and CSS add-ons.
- We have abused iframes and pre-generated images to get the graphics we want so far, but this isn't an agile approach.
  - Plone especially handles iframes poorly because they come as concurrent HTTP requests, which slows down the entire page load.
The migration to Python 3. Python 2 reached end-of-life on January 1, 2020, and the vulnerability scans are finding more and more problems.
- We are now at the point where, to continue making portal releases, we can no longer upgrade dependent packages and must instead surgically excise shared objects, libraries, and individual Python source files to satisfy the scans.
- If the scan were to find a problem in Python 2 itself, we would have no way to proceed. This is a real possibility: security updates to Python 2 are no longer being published. If a vulnerability is discovered, it's game over.
- Plone finally supports Python 3, but Plone uses a "no-SQL" hierarchical database that contains serialized Python objects, and the objects are completely incompatible between Python versions. The EDRN portal contains thousands of Python 2 objects.
  - In some rare cases it's possible to automate the migration from Python 2 to 3, but that assuredly doesn't apply to us. Why? Because the current Python 2 database contains dozens, perhaps hundreds, of objects whose code no longer exists thanks to numerous upgrades over time, and these orphaned objects prevent migrations from proceeding.
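
To illustrate why objects whose code no longer exists block a migration, here is a minimal sketch using only the standard library. It assumes the portal's object database stores records as Python pickles (as Plone's ZODB does); the module name `legacy_addon` and the class `Widget` are hypothetical stand-ins for an add-on removed during an upgrade, not names from the EDRN portal.

```python
import pickle
import sys
import types

# Hypothetical add-on module standing in for a package removed by a later upgrade.
legacy = types.ModuleType("legacy_addon")
exec(
    "class Widget:\n"
    "    def __init__(self, title):\n"
    "        self.title = title\n",
    legacy.__dict__,
)
sys.modules["legacy_addon"] = legacy

# Serialize an instance while its class is still importable.
stored = pickle.dumps(legacy.Widget("Example page"))


def can_load(data: bytes) -> bool:
    """Report whether a stored record can still be deserialized."""
    try:
        pickle.loads(data)
        return True
    except (ImportError, AttributeError):
        # The record names a module or class that can no longer be found.
        return False


loadable_before = can_load(stored)

# Simulate the upgrade that dropped the add-on's code.
del sys.modules["legacy_addon"]
loadable_after = can_load(stored)

print(loadable_before, loadable_after)
```

The stored bytes only name the class; they do not carry its code. Once the defining module is gone, the record cannot be loaded, so any tool that walks the database to convert objects stops at that record.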

🏁 Goals

Given the hurdles described above, we can at least enumerate the goals of a migration away from Plone, along with the reasoning behind each:

Open development to more people. Plone and Zope have such steep learning curves that essentially only one person on the Informatics Center team is able to make progress.
Embrace future-looking technologies, including data graphics, advanced faceted and other modes of search, and interactivity, which are difficult if not impossible with Plone.
Use open standards such as relational databases instead of implementation-specific serialized object stores to help ensure future portability.
Remain on an upgrade path that avoids the use of end-of-life products and technologies.

🐦 Migration

Migrating away from Plone requires three phases, with movement back and forth between phases as obstacles are identified and uncertainties are narrowed. These phases are:

Technology identification. Although developing a web solution from scratch is attractive, leveraging the power, features, and security of an existing content management system or web application framework is a vital time-saver. So far we have opted to explore Wagtail for its track record with JPL-hosted solutions. Wagtail is itself based on the extremely popular Django web framework. Because these are Python-based technologies, we are optimistic about reusing existing code from the EDRN portal, especially the RDF parsing and ingestion frameworks.
Prototyping. As part of the risk reduction approach, we plan to create a prototype cancer data portal that meets the essential "go/no-go" decision points (see below). If for any reason the prototype fails to meet the criteria or falls short on a non-functional requirement (for example, by being as difficult to use as Plone), we'll return to the technology identification phase and identify other technologies.
Full migration. Once the prototype satisfies and demonstrates the requirements, we will proceed with a full migration. The intent is not to add any new features at this time, but only to change the technology stack.

🚀 Go/No-go Features

The following is a brief checklist of the core features we must have in order to proceed. As part of the risk reduction approach, if a solution can't satisfy all of these features, we must find another path. The core features are:

LDAP authentication and authorization.
Protection of pages based on roles and privileges accorded to login.
- Group- and role-based access to certain portal sections.
- Mixed public and private access to biomarker pages (this is critical; biomarkers are public but certain details are private to certain groups).
RDF ingest of data for page population.
Import of existing Plone "static" pages, files, and images.
Statistical charts and data graphics.
Ability to perform interactive editing of page content.
Tech stack in which no component has reached "end of life" status.
Maintainability by content editors.
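
The mixed public and private access requirement for biomarker pages can be sketched in plain Python as follows. This is only an illustration of the rule a prototype would need to enforce, not Wagtail or Django API; the `Biomarker` class, its field names, and the group name `example-collab-group` are all hypothetical, and group membership is assumed to come from the LDAP login.

```python
from dataclasses import dataclass, field


@dataclass
class Biomarker:
    """Hypothetical biomarker record; every field name here is illustrative."""
    name: str
    description: str  # always public
    # Maps a detail name to (value, groups allowed to see it).
    restricted: dict = field(default_factory=dict)


def visible_details(biomarker, user_groups):
    """Return the details a user in the given groups may see."""
    view = {"name": biomarker.name, "description": biomarker.description}
    groups = set(user_groups)
    for key, (value, allowed) in biomarker.restricted.items():
        # Show a restricted detail only if the user shares a group with it.
        if allowed & groups:
            view[key] = value
    return view


marker = Biomarker(
    name="Example marker",
    description="Publicly visible summary.",
    restricted={
        "study_notes": ("Unpublished results.", frozenset({"example-collab-group"})),
    },
)

anonymous_view = visible_details(marker, [])
member_view = visible_details(marker, ["example-collab-group"])
print(sorted(anonymous_view), sorted(member_view))
```

The point of the sketch is that visibility is decided per detail rather than per page: the same biomarker page renders for everyone, with private details filtered by group.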

🗺 Workplan

TBD.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
