To help analyst teams implement RAP, we've created a maturity framework we call the 'levels of RAP'.
There are three levels:
- Baseline - RAP fundamentals offering resilience against future change.
- Silver - Implementing best practice by following good analytical and software engineering standards.
- Gold - Analysis as a product to further elevate your analytical work and enhance its reusability to the public.
The 'levels of RAP' maturity framework helps teams balance business-as-usual (BAU) delivery, resourcing constraints and other organisational objectives alongside RAP implementation.
The Baseline level is designated as the minimum standard of a RAP.
Silver and Gold RAP help external users reuse code and support moving to a fully-automated solution. However, they often require significant time and investment in upskilling.
Many teams will aim for Baseline RAP in their first RAP attempt, then work towards Silver or Gold RAP in later releases. The different levels allow teams to tackle capabilities iteratively, rather than all at once. Initially aiming for Baseline or Silver RAP means that teams see some of the immediate benefits of RAP while having the time needed to learn new skills.
It's important to balance the work required to implement RAP against the benefit you'll get (e.g. time and resource savings); in other words, match the RAP level to the product. For example, some ad-hoc analysis products probably do not need to aim for Silver or Gold RAP.
The levels of RAP are the outcome of discussions with the cross-government RAP group, with input from team leads, the head of the Analytics Function, the head of Data Science, the Chief Statistician and the Statistics Regulator at NHS Digital.
In order for a publication to be considered a reproducible analytical pipeline, it must at least meet all of the requirements of Baseline RAP:
- Data produced by code in an open-source language (e.g., Python, R, SQL).
- Code is version controlled (see Git basics and using Git collaboratively guides).
- Repository includes a README.md file (or equivalent) that clearly details the steps a user must follow to reproduce the code (use the NHS Open Source Policy section on Readmes as a guide).
- Code has been peer reviewed.
- Code is published in the open and linked to and from the accompanying publication (if relevant).
Silver RAP means meeting all of the Baseline requirements above, plus:
- Outputs are produced by code with minimal manual intervention.
- Code is well-documented including user guidance, explanation of code structure & methodology and docstrings for functions.
- Code is well-organised following standard directory format.
- Reusable functions and/or classes are used where appropriate.
- Code adheres to agreed coding standards (e.g. PEP8, style guide for PySpark).
- Pipeline includes a testing framework (unit tests, back tests).
- Repository includes dependency information (e.g. requirements.txt, Pipfile, environment.yml).
- Logs are automatically recorded by the pipeline to ensure outputs are as expected.
- Data is handled and output in a Tidy data format.
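Several of these criteria can be illustrated together. The following is a minimal sketch in Python, not a prescribed implementation: the function `to_tidy`, the logger name `rap_pipeline` and the example data are all hypothetical. It shows a reusable function with a docstring, a log message recording what the step did, a unit test, and output in a tidy shape (one observation per row).

```python
import logging

# Hypothetical logger name; a real pipeline would configure handlers once at start-up.
logger = logging.getLogger("rap_pipeline")


def to_tidy(wide_rows, id_field, variable_name="variable", value_name="value"):
    """Reshape wide records into tidy (long) records.

    Each output record holds a single observation: the identifier,
    the name of the measured variable, and its value.

    Parameters
    ----------
    wide_rows : list of dict
        One dict per entity, with one key per measured variable.
    id_field : str
        Key that identifies the entity in each row.
    variable_name, value_name : str
        Keys to use for the variable and value in the output.

    Returns
    -------
    list of dict
        One dict per (entity, variable) pair.
    """
    tidy = []
    for row in wide_rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the identifier is carried on every output row instead
            tidy.append({id_field: row[id_field], variable_name: key, value_name: value})
    # Record row counts so unexpected changes in volume are visible in the logs.
    logger.info("Reshaped %d wide rows into %d tidy rows", len(wide_rows), len(tidy))
    return tidy


def test_to_tidy():
    """Unit test: one wide row with two year columns becomes two tidy rows."""
    wide = [{"region": "North", "2022": 10, "2023": 12}]
    assert to_tidy(wide, "region") == [
        {"region": "North", "variable": "2022", "value": 10},
        {"region": "North", "variable": "2023", "value": 12},
    ]
```

In a real repository the test would normally live in a separate tests directory and be run by a test runner such as pytest.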
Gold RAP means meeting all of the Baseline and Silver requirements above, plus:
- Code is fully packaged.
- Repository automatically runs tests and other checks via a CI/CD or other integration/deployment tool (e.g. GitHub Actions).
- Process runs based on event-based triggers (e.g., new data in database) or on a schedule.
- Changes to the RAP are clearly signposted, e.g. through a changelog in the package, releases etc. (see gov.uk info on Semantic Versioning).
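As an illustration of how Semantic Versioning signposts changes, the sketch below shows how a MAJOR.MINOR.PATCH version number moves with each kind of change. The function `bump_version` is hypothetical and ignores pre-release and build labels; it is not part of any packaging tool.

```python
def bump_version(version, change):
    """Return the next MAJOR.MINOR.PATCH version for a given kind of change.

    change is one of:
      "major" - incompatible changes (e.g. an output column is renamed)
      "minor" - backwards-compatible additions (e.g. a new output table)
      "patch" - backwards-compatible fixes (e.g. a corrected calculation)
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"  # lower parts reset to zero
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change!r}")


print(bump_version("1.4.2", "minor"))  # 1.5.0
```

Pairing each version bump with a changelog entry tells users at a glance whether an update could break their existing use of the pipeline.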