AWS Snowcone Bucket Service
Eric Lopatin edited this page Jul 25, 2023 · 7 revisions
The Merritt team has established a common workflow for facilitating the ingest of content that is transferred from an AWS Snowcone and resides in a staging S3 bucket.
Many Merritt depositors provide an accessible URL that allows the Merritt System to download content and ingest that content into the Merritt repository.
When a depositor has several TB of content on portable media that cannot easily be downloaded by Merritt, this workflow might be a good choice.
The temporary storage utilized by this workflow creates incremental costs for the Merritt system. The steps within this workflow should be implemented expeditiously to reduce costs and to ensure that digital content has been properly preserved.
- The Merritt Team configures a Snowcone deposit project within Merritt.
- The project consists of the following components:
- An S3 storage bucket for the content.
- A web service that will allow the Merritt ingest service to retrieve the content.
- A corresponding Merritt collection if one does not already exist.
- The Merritt Team orders a Snowcone device from AWS. The device is sent to the depositor.
- The depositor copies content to the Snowcone device using the AWS OpsHub application.
- Note: the Snowcone device must be returned to AWS after 5 days of usage; otherwise, an additional daily fee is incurred.
- The depositor must keep an inventory of the files copied to the device. This inventory will be utilized to generate an ingest manifest for Merritt.
- The Snowcone device is returned to AWS.
- AWS will copy content from the Snowcone device to the specified S3 Storage Bucket.
- The depositor will submit a series of ingest manifests to Merritt. The ingest manifests will organize the files into Merritt Objects and provide metadata and identifiers for each object.
- Once the Merritt team and the depositor have confirmed that the content has been ingested into Merritt, the contents of the Snowcone storage bucket will be purged.
- The Merritt Team will provide the depositor with a URL for the storage bucket. This URL will be used when generating ingest manifests for Merritt.
- This URL will point to a web service that will confirm the existence of the storage bucket and will provide some high-level details about the bucket contents. The retrieval of content from the bucket will require credentials.
- The Merritt Team will provide credentials for retrieving content from the bucket. These credentials should be added to the URLs used in the ingest manifests.
curl -L https://username:[email protected]/key-name
- The requested URL will return a redirect that will allow the content to be downloaded.
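
When many files are involved, the credentialed URLs described above could be assembled programmatically rather than by hand. The following is a minimal Python sketch, not part of the Merritt tooling: the function name `build_credentialed_url`, the host `bucket.example.org`, and the sample credentials are all hypothetical. It assumes the web service accepts percent-encoded credentials in the URL, which is how curl interprets them; confirm this with the Merritt Team before relying on it.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_credentialed_url(base_url, username, password, key):
    """Return base_url with credentials embedded and key appended as the path."""
    parts = urlsplit(base_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {base_url!r}")

    # Percent-encode the credentials so characters such as '@', ':' or '/'
    # cannot be mistaken for URL delimiters.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"

    # Keep '/' in the key so nested keys stay readable; encode everything else.
    path = parts.path.rstrip("/") + "/" + quote(key.lstrip("/"), safe="/")
    return urlunsplit((parts.scheme, f"{userinfo}@{host}", path, "", ""))


if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    print(build_credentialed_url(
        "https://bucket.example.org", "username", "p@ss/word", "dir/my file.txt"
    ))
```

Encoding the credentials separately from the key keeps a password that contains `@` or `/` from being read as part of the host or path.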
Sample Manifest
#%checkm_0.7
#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-ingest-manifest
#%prefix | mrt: | http://merritt.cdlib.org/terms#
#%prefix | nfo: | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#
#%fields | nfo:fileurl | nfo:hashalgorithm | nfo:hashvalue | nfo:filesize | nfo:filelastmodified | nfo:filename | mrt:mimetype
https://user:[email protected]/foo.bar | | | | | foo.bar |
#%eof
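
A manifest in the layout of the sample above could be generated from the depositor's file inventory. The sketch below is an illustration under stated assumptions rather than an official Merritt tool: the names `HEADER`, `FIELDS`, `format_row`, and `render_manifest` are hypothetical, as are the host and credentials in the usage example. The header lines and field order are copied from the sample. Because the sample leaves the hash, size, date, and MIME type columns empty, the sketch also leaves any field empty unless the caller supplies it, and it does not guess at the accepted values for those columns.

```python
# Header lines copied verbatim from the sample manifest.
HEADER = [
    "#%checkm_0.7",
    "#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-ingest-manifest",
    "#%prefix | mrt: | http://merritt.cdlib.org/terms#",
    "#%prefix | nfo: | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
    "#%fields | nfo:fileurl | nfo:hashalgorithm | nfo:hashvalue | nfo:filesize"
    " | nfo:filelastmodified | nfo:filename | mrt:mimetype",
]

# Dictionary keys (hypothetical names) in the column order given by #%fields.
FIELDS = ("fileurl", "hashalgorithm", "hashvalue", "filesize",
          "filelastmodified", "filename", "mimetype")


def format_row(entry):
    """Format one inventory entry as a pipe-delimited manifest row."""
    values = ["" if entry.get(name) is None else str(entry[name]) for name in FIELDS]
    if not values[0]:
        raise ValueError("every row needs a fileurl")
    if any("|" in value for value in values):
        raise ValueError("field values must not contain the '|' delimiter")
    row = values[0]
    for value in values[1:]:
        # Empty columns are written as a bare delimiter, as in the sample.
        row += " |" + (f" {value}" if value else "")
    return row


def render_manifest(entries):
    """Return the full manifest text for a list of inventory entries."""
    lines = HEADER + [format_row(entry) for entry in entries] + ["#%eof"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(render_manifest([
        {"fileurl": "https://user:secret@bucket.example.org/foo.bar",
         "filename": "foo.bar"},
    ]), end="")
```

Each call to `render_manifest` produces one manifest, so how files are grouped into Merritt objects is decided by which entries the caller passes in.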