
Architecture

David Moles edited this page Mar 8, 2018 · 15 revisions

Merritt's architecture consists of five primary microservices, supported by off-the-shelf components, secondary services, external systems, and multiple remote storage providers. An architecture diagram is provided at the end of this document.

Primary microservices

Merritt has five primary microservices: Ingest, Storage, Inventory, Replication, and Audit.

  • The Ingest service manages the acquisition of new digital content.
  • The Storage service manages the secure and persistent storage of digital content.
  • The Inventory service provides a comprehensive catalog of information known about digital objects, versions, collections, and owners.
  • The Replication service manages the synchronization of content replicas across redundant storage locations.
  • The Audit service manages the ongoing bit-level verification of digital content.

Each service is deployed on multiple servers for performance and fault-tolerance.
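
As an illustration of what "bit-level verification" means for the Audit service, the following is a minimal sketch of a fixity check: recompute a file's digest and compare it with the value recorded earlier. The function names and the choice of SHA-256 are assumptions made for this example; this page does not say which digest algorithm or code Merritt actually uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the file's current digest matches the recorded one.

    `recorded_digest` is a hypothetical value captured when the file was
    first stored; SHA-256 is an assumption of this sketch.
    """
    return sha256_of(path) == recorded_digest.lower()
```

A periodic job could call `verify_fixity` for each stored file and flag any file for which it returns `False`.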

Off-the-shelf components

Merritt uses several off-the-shelf components to support, coordinate, and share data among the various primary services. These include an OpenDJ LDAP server for authentication and authorization, a ZooKeeper queue for processing ingested content into the Inventory service, and a MySQL database maintaining inventory, audit, and replication information.
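
The queue-based handoff from ingest to inventory can be sketched roughly as follows. This is only an in-process illustration of the producer/consumer pattern: Python's standard `queue.Queue` stands in for ZooKeeper, a dictionary stands in for the MySQL inventory tables, and all function and field names are hypothetical.

```python
import queue

# Hypothetical stand-ins: an in-process queue in place of ZooKeeper,
# and a dict in place of the MySQL inventory tables.
ingest_queue = queue.Queue()
inventory = {}


def enqueue_ingested(object_id: str, version: int) -> None:
    """Producer side: note that a new object version is ready to be cataloged."""
    ingest_queue.put({"object_id": object_id, "version": version})


def drain_into_inventory() -> int:
    """Consumer side: catalog every queued entry and return how many were handled."""
    handled = 0
    while True:
        try:
            entry = ingest_queue.get_nowait()
        except queue.Empty:
            # Nothing left to process.
            return handled
        # Record the version under its object identifier.
        inventory.setdefault(entry["object_id"], []).append(entry["version"])
        ingest_queue.task_done()
        handled += 1
```

Decoupling the two sides through a queue lets ingest finish without waiting for the catalog update, which is the general motivation for this pattern.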

In addition, an Apache web server (not shown in the diagram) acts as front end and load balancer, forwarding external requests to the UI and routing internal HTTP requests between the various services.

Secondary services

Several secondary services facilitate access to the digital content stored in Merritt, including the Local ID service, SWORD server, OAI-PMH server, UI, and Merritt Express.

  • The Local ID service maps external secondary ("local") identifiers such as DOIs to the ARKs used as Merritt primary identifiers.
  • The SWORD server implements a subset of the SWORD 2.0 deposit specification, and is used to accept deposits from external systems such as Dash.
  • The OAI-PMH server provides an OAI-PMH feed allowing external systems to harvest metadata from Merritt collections.
  • The UI is a Ruby on Rails application that provides the primary user interface to Merritt.
  • Merritt Express is a lightweight, high-performance, file-level download service based on the Apache web server and various Apache modules, used primarily by eScholarship.
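
The identifier mapping performed by the Local ID service can be pictured as a simple lookup from local identifiers to ARKs. The sketch below is illustrative only: the table contents are made-up placeholders, the function names are hypothetical, and the normalization rule (trim and lowercase) is an assumption rather than documented Merritt behavior.

```python
from typing import Dict, Optional

# Placeholder identifiers for illustration only; not real Merritt records.
LOCAL_TO_ARK: Dict[str, str] = {
    "doi:10.5072/example-1": "ark:/99999/fk4example1",
    "doi:10.5072/example-2": "ark:/99999/fk4example2",
}


def normalize_local_id(local_id: str) -> str:
    """Trim whitespace and lowercase so equivalent spellings match (an assumption)."""
    return local_id.strip().lower()


def resolve_primary_id(local_id: str) -> Optional[str]:
    """Return the ARK mapped to a local identifier, or None if it is unknown."""
    return LOCAL_TO_ARK.get(normalize_local_id(local_id))
```

For example, `resolve_primary_id(" DOI:10.5072/Example-1 ")` returns `"ark:/99999/fk4example1"`, while an identifier missing from the table yields `None`.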

External systems

Merritt is a DataONE Member Node, and publishes metadata to DataONE using an installation of the DataONE Generic Member Node software package.

Remote storage

All digital content deposited in Merritt is written to remote storage for preservation: Amazon S3, the San Diego Supercomputer Center (SDSC), or (for DataONE) the University of New Mexico. The primary copies are then replicated to secondary storage locations for redundancy.

(Content in S3 and at SDSC is transparently replicated on multiple servers, providing additional redundancy.)
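
The primary-to-secondary replication step described above can be sketched as a copy followed by a digest comparison. In this example local directories stand in for the remote storage providers, and the function names and use of SHA-256 are assumptions; it is not Merritt's Replication service code.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, Iterable


def file_digest(path: Path) -> str:
    """Return the hex SHA-256 digest of a file (read whole, for brevity)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(primary: Path, secondary_dirs: Iterable[Path]) -> Dict[Path, bool]:
    """Copy `primary` into each secondary directory.

    Returns a mapping from each replica path to True if its digest matches
    the primary copy's digest, False otherwise.
    """
    expected = file_digest(primary)
    results: Dict[Path, bool] = {}
    for directory in secondary_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        replica = directory / primary.name
        shutil.copy2(primary, replica)
        # Confirm the replica is bit-for-bit identical to the primary.
        results[replica] = file_digest(replica) == expected
    return results
```

Checking the digest immediately after each copy means a corrupted transfer is detected at replication time rather than at a later audit.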

Diagram

[Figure: Merritt Architecture diagram]
