The National Archives and Records Administration Digital Preservation Framework consists of a Risk Matrix and Preservation Action Plans.
The National Archives 2022–2026 Strategic Plan embraces the primacy of electronic records. Our vision is to ensure cutting-edge access to extraordinary volumes of government information and unprecedented engagement to bring greater meaning to the American experience. To do so, NARA must collaborate with other Federal agencies, the private sector, and the public to ensure records and archives thrive in a digital world.
Digital preservation is critical to this work. It has become even more important because Federal agencies have been directed (OMB Memorandum M-23-07, Update to Transition to Electronic Records) to transition business processes and recordkeeping to a fully electronic environment. As of June 30, 2024, the National Archives no longer accepts paper records from Federal agencies, with limited exceptions.
NARA holds over one billion files representing more than 700 file format versions. These files can be grouped into 16 general categories of electronic records. The vast majority of files are email messages, JPEG and TIFF still images, PDFs, HTML, and plain ASCII text.
The NARA Digital Preservation Unit created the Digital Preservation Framework in response to the current and anticipated volume of electronic records in NARA’s holdings. This set of documents describes how NARA identifies risks to digital files and prioritizes them for action, as well as how NARA plans for their long-term preservation.
The Digital Preservation Framework consists of:
- The NARA Risk Matrix
- File Format Preservation Action Plans
- Record Category Preservation Action Plans
These documents are actively maintained and updated on a quarterly basis, as NARA continues to research formats newly identified or newly accessioned into its holdings.
NARA uses the Risk Matrix to measure the preservation risk of digital file formats in our holdings and to assess formats we anticipate receiving in the future. By answering questions related to the ability to preserve and sustain a file format, we identify relative risk levels.
The Risk Matrix is structured as a series of twenty-seven questions about each file format, organized into eight categories relating to risk and sustainability:
- Disclosure
- Adoption
- Transparency
- Self-Documentation
- External Hardware Dependencies
- External Software Dependencies
- Impact of Patents
- Technical Protection Mechanisms
The answers to all the questions have been assigned numeric values, which are used to calculate an overall numeric Risk Rating and a general Risk Level: Low Risk, Moderate Risk, or High Risk.
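As a rough illustration of how per-question values could roll up into a rating and a level, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: the question keys, the scores, the choice to sum them, the direction of the scale (higher means riskier), and the thresholds. None of these are taken from the Risk Matrix itself, which defines its own values and calculation.

```python
from typing import Dict

# Hypothetical thresholds on the summed score (assumption: higher = riskier).
MODERATE_RISK_MIN = 10
HIGH_RISK_MIN = 20


def risk_rating(answers: Dict[str, int]) -> int:
    """Combine per-question numeric values into one rating (assumed: a plain sum)."""
    return sum(answers.values())


def risk_level(rating: int) -> str:
    """Map a numeric rating onto one of the three named risk levels."""
    if rating >= HIGH_RISK_MIN:
        return "High Risk"
    if rating >= MODERATE_RISK_MIN:
        return "Moderate Risk"
    return "Low Risk"


# Made-up scores, one per risk category for brevity; the real matrix
# asks twenty-seven questions across these eight categories.
example_answers = {
    "disclosure": 1,
    "adoption": 0,
    "transparency": 2,
    "self_documentation": 3,
    "external_hardware_dependencies": 0,
    "external_software_dependencies": 4,
    "impact_of_patents": 1,
    "technical_protection_mechanisms": 2,
}

rating = risk_rating(example_answers)
level = risk_level(rating)
print(rating, level)
```

Keeping the rating and the level as separate functions mirrors the text: the numeric rating is computed first, and the named level is derived from it.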
The final questions in the Risk Matrix represent how NARA prioritizes formats in our holdings for preservation actions. We use Need/Use/Feasibility to determine our preservation priorities. The Risk Rating represents the "Need" for a preservation action. "Use" is represented by Prevalence: how common the format is in our holdings at the time of assessment, which approximates the level of use of the format in the permanent records of the Federal Government. "Feasibility" is measured as NARA's capacity to process and convert formats. We assess Feasibility based on the general availability of format migration tools that do not alter the content in unacceptable ways, as well as our capacity to perform acceptable migrations.
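The Need/Use/Feasibility idea can be sketched as a simple ordering over assessed formats. The Python below is only one possible reading, with assumptions throughout: the `FormatAssessment` type, its field names, the treatment of Feasibility as a yes/no flag, and the sort order (feasible first, then higher need, then higher prevalence) are all hypothetical and are not described in the Risk Matrix.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FormatAssessment:
    """Hypothetical record of one format's Need/Use/Feasibility inputs."""
    name: str
    need: int          # Risk Rating (assumption: higher = more urgent)
    prevalence: int    # number of files of this format in the holdings
    feasible: bool     # whether an acceptable migration path is available


def prioritize(formats: List[FormatAssessment]) -> List[FormatAssessment]:
    """Order formats for action: feasible first, then by need, then by prevalence."""
    return sorted(
        formats,
        key=lambda f: (not f.feasible, -f.need, -f.prevalence),
    )


# Made-up formats and numbers, for illustration only.
candidates = [
    FormatAssessment("format-a", need=3, prevalence=100, feasible=True),
    FormatAssessment("format-b", need=3, prevalence=5000, feasible=True),
    FormatAssessment("format-c", need=5, prevalence=10, feasible=False),
]

ordered_names = [f.name for f in prioritize(candidates)]
print(ordered_names)
```

In this sketch a high-need format with no acceptable migration tool sorts last, which reflects the point that need alone does not make an action possible; a weighted score would be an equally plausible alternative.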
The Risk Matrix is described in greater detail here.
NARA publishes File Format Preservation Action Plans for all file formats represented in the Risk Matrix. These Plans identify actions that would enhance long-term preservation for a format type and document NARA’s practices and preferred tools. However, they are neither exhaustive nor universally applicable; actions may differ based on the file formats and variant versions in NARA holdings, the current NARA risk assessment, processing capabilities, and tools in use at NARA. These Plans apply to files once they have been deemed permanent for NARA's holdings; the appraisal guidelines for determining when a record is permanent differ for Congressional, Federal, and Presidential records.
These plans can be accessed in the File Format Preservation Action Plan Spreadsheet. The spreadsheet covers over 700 variant versions of file formats and identifies:
- Categories of electronic records associated with the format
- Specifications, standards, and documentation where possible; some have no specification or standard available
- Proposed preservation migration actions to be taken by NARA, including no action when appropriate
- Recommended tools for processing and preservation actions
The Preservation Action Plans are also available as linked open data on archives.gov.
NARA also maintains preservation action plans for categories of electronic records. Each category has its own Plan that contains a list of “Significant Properties,” which identify the properties, or characteristics, of a record (its Appearance, Behavior, Context, and Structure) that should be retained, if possible, in any format migration. Retaining these characteristics helps ensure that format migrations preserve records with the highest possible fidelity. The Record Category Preservation Action Plans, and a template for their creation, can be accessed here.
The 16 categories are:
- Digital Audio
- Digital Design and Vector Graphics
- Digital Still Image
- Email
- Geospatial
- Moving Image: Digital Cinema
- Moving Image: Digital Video
- Navigational Charts
- Presentation and Publishing
- Software and Code
- Structured Data: Calendars
- Structured Data: Databases
- Structured Data: Generic
- Structured Data: Spreadsheets
- Textual and Word Processing
- Web Records
NARA also publishes file extensions data from our holdings for potential reuse, such as to identify extensions to be researched further for file format identification. The extensions and counts provided are generated by combining reports from various preservation systems for Federal, Legislative, and Presidential electronic records. More information and the dataset can be accessed here.
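Combining extension counts from several reports amounts to normalizing the extension strings and summing the counts. The sketch below shows that idea in Python; the report contents, the `normalize` and `combine_reports` helper names, and the normalization rules (trim whitespace, drop a leading dot, lowercase) are assumptions for illustration and do not describe how NARA's preservation systems actually generate the dataset.

```python
from collections import Counter
from typing import Iterable, Mapping


def normalize(ext: str) -> str:
    """Normalize an extension so '.PDF', 'pdf', and ' Pdf' count together."""
    return ext.strip().lstrip(".").lower()


def combine_reports(reports: Iterable[Mapping[str, int]]) -> Counter:
    """Sum per-extension counts across several reports."""
    total: Counter = Counter()
    for report in reports:
        for ext, count in report.items():
            total[normalize(ext)] += count
    return total


# Made-up per-system reports, for illustration only.
federal = {".PDF": 120, "jpg": 300}
legislative = {"pdf": 30, ".TIF": 45}
presidential = {"JPG": 10, "msg": 75}

combined = combine_reports([federal, legislative, presidential])
print(combined.most_common(2))
```

A combined count like this could then be scanned for extensions that do not map to any known format, which is the kind of follow-up research the dataset is meant to support.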
There are several related resources available from NARA about file formats:
- Digital Preservation Framework Linked Open Data (File Format Preservation Action Plans as linked data)
- Transfer Guidance on Preferred and Acceptable File Formats (file formats that NARA prefers to receive from agencies; the guidance is not 100% prescriptive)
- NARA Digitization Products and Service (file formats that NARA produces in its own internal digitization)
- NARA Digital Preservation Strategy (NARA’s approach to Digital Preservation)
All Risk is Local: File Format Risk Assessment In Two U.S. Government Contexts. iPres International Digital Preservation Conference, 2024.
Creating a holdings format profile and format matrix for risk-based digital preservation planning at the National Archives and Records Administration. iPres International Digital Preservation Conference, 2018.
We share these documents to be transparent about our approach to digital preservation and to communicate our current practices to Federal agencies, records managers, archivists, digital preservation professionals, researchers, private industry, allied professionals, and members of the public.
We always welcome feedback on the following topics:
- What revisions can you suggest to the proposed processing and preservation actions for the formats?
- Are the Significant Properties for each category comprehensive enough for digital preservation?
- Are the proposed preservation actions for the formats technically appropriate?
- Are there appropriate tools for processing and preservation migrations of specific formats that we have not listed? Are any of the listed tools inappropriate or risky to use for the purposes we have identified?
- Are there other formats we have not yet identified that need plans?
Please use the issues feature on this site to leave a specific comment or question or to just start a discussion. You can read more about how to contribute here. NARA staff will respond as quickly as they can.
We update the matrix and plans on an ongoing basis in response to changing risks and new technologies and formats.