
ocfl-archive/gocfl


GOCFL

Go OCFL implementation.

Installation

Via go ecosystem

go install github.com/ocfl-archive/gocfl/v2/gocfl@latest

Via GitHub repository

  • clone the repository (git clone https://github.com/ocfl-archive/gocfl).
  • navigate to the gocfl subdirectory (you should see main.go).
  • run go mod tidy to update local dependencies.
  • run go build to create a locally compiled gocfl binary.

Configuration

GOCFL relies on a configuration file to activate, among other things, its indexing and migration capabilities.

A simplified configuration file can be found at gocfl2.toml.

Pointing to a custom configuration

GOCFL is compiled with an embedded configuration file, which it expects to find at ./config/default.toml. You can edit this file and recompile to change the defaults. Alternatively, you can supply your own configuration file with the --config flag, e.g. for an add command:

./gocfl add \
 ./storage_root /tmp/ocfltest1/ \
 -u "Jane Doe" \
 -a "mailto:user@domain" \
 -m "initial add" \
 --object-id 'id:abc123' \
 --config custom-config.toml

Additional tools

GOCFL is set up to work with the following Windows utilities, and you will find references to them under the Indexer and Migration settings in the configuration TOML file:

  • convert.exe via ImageMagick
  • identify.exe via ImageMagick
  • ffmpeg.exe via FFmpeg
  • ffprobe.exe via FFmpeg
  • gswin64.exe via Ghostscript
  • powershell.exe via Windows PowerShell

With the exception of PowerShell (discussed below), you should be able to find drop-in replacements on *nix-like systems. On Linux, for example, convert.exe becomes convert, identify.exe becomes identify, ffmpeg.exe and ffprobe.exe become ffmpeg and ffprobe, and gswin64.exe becomes gs.

Use of PowerShell

PowerShell scripts are currently used to generate thumbnails for videos and PDFs. They are located in the ./data/scripts folder. You can study what they do to write equivalents for your own operating system's shell.

Invoking indexing and migration

Previous GOCFL versions required a flag to invoke indexing. Now, GOCFL must be compiled with the ObjectExtensions setting configured in the configuration TOML. This line can optionally be commented out. It looks as follows:

[Add]
ObjectExtensions="./data/fullextensions/object"

Provided the additional tools are configured correctly and their function is set to Enabled=true in the configuration, they will run during GOCFL operations such as add.

Go OCFL Implementation

This library supports the Oxford Common Filesystem Layout (OCFL) and focuses on the creation, update, validation and extraction of OCFL Storage Roots and Objects.

The GOCFL command line tool supports the following subcommands: init, create, add, update, validate, extract, extractmeta, stat and display.

There's a quickstart guide available.

Why GOCFL

There are several OCFL tools & libraries that already exist. This software is built with the following motivation:

  • I/O performance.
  • Containers.
  • Encryption.
  • Extensions.
  • Indexing.

I/O Performance

Storage I/O is the main source of performance problems. Therefore, every file should be read and written only once. The only exception is deduplication: there, the checksum of a file is calculated once before ingest and a second time while ingesting.

Containers

Serializing an OCFL Storage Root into a container format like ZIP must not add disk I/O overhead. Therefore, an OCFL container can be generated without an intermediary OCFL Storage Root on a filesystem.

Encryption

For storing OCFL containers in low-security locations (cloud storage, etc.), it's possible to create an AES-256 encrypted container on ingest.

Extensions

The extensions described in the OCFL standard are quite open in their functionality and may belong to either the Storage Root or an Object. Since there is no specification of a generic extension API, it is difficult to integrate specific extension hooks into other libraries. So far, this library identifies 7 different extension hooks.
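One common Go pattern for hook-based extensions is to define each hook as a small optional interface and let the library call only the extensions that implement it. The sketch below is hypothetical: the interface and method names (StoragePathHook, ContentAddHook, etc.) are invented for illustration and are not GOCFL's API, and it shows two hooks rather than the seven the library defines.

```go
package main

import "fmt"

// Extension is a hypothetical minimal contract shared by all extensions.
type Extension interface {
	Name() string
}

// StoragePathHook is a hypothetical hook: map an object ID to a path below the storage root.
type StoragePathHook interface {
	BuildStoragePath(id string) (string, error)
}

// ContentAddHook is a hypothetical hook: called for every file added to an object.
type ContentAddHook interface {
	AddContent(path string, data []byte) error
}

// flatLayout mimics 0002-flat-direct-storage-layout: the object ID is used as the path.
type flatLayout struct{}

func (flatLayout) Name() string                               { return "0002-flat-direct-storage-layout" }
func (flatLayout) BuildStoragePath(id string) (string, error) { return id, nil }

// fileCounter is a hypothetical extension that only reacts to added content.
type fileCounter struct{ files int }

func (*fileCounter) Name() string { return "hypothetical-file-counter" }
func (c *fileCounter) AddContent(path string, data []byte) error {
	c.files++
	return nil
}

// storagePath asks the first extension that implements StoragePathHook.
func storagePath(exts []Extension, id string) (string, error) {
	for _, e := range exts {
		if h, ok := e.(StoragePathHook); ok {
			return h.BuildStoragePath(id)
		}
	}
	return "", fmt.Errorf("no storage layout extension for %q", id)
}

// addContent calls every extension that implements ContentAddHook; others are skipped.
func addContent(exts []Extension, path string, data []byte) error {
	for _, e := range exts {
		if h, ok := e.(ContentAddHook); ok {
			if err := h.AddContent(path, data); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	counter := &fileCounter{}
	exts := []Extension{flatLayout{}, counter}
	p, err := storagePath(exts, "object-01")
	if err != nil {
		panic(err)
	}
	if err := addContent(exts, "hello.txt", []byte("hi")); err != nil {
		panic(err)
	}
	fmt.Println(p, counter.files) // prints "object-01 1"
}
```

The advantage of optional interfaces is that an extension only implements the hooks it needs, and new hooks can be added without breaking existing extensions.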

Indexer

When content is ingested into OCFL objects, technical metadata should be extracted and stored alongside the manifest data, so that it stays with the content and can be extracted together with it. Since the OCFL structure is quite rigid, a special extension is needed to support this.

GOCFL Functionality

  • Supports local filesystems
  • Supports S3 Cloud Storage (via MinIO Client SDK)
  • SFTP Storage
  • Google Cloud Storage
  • Serialization into ZIP Container
  • AES Encryption of Container
  • Supports mixing of source and target storage systems
  • Non blocking validation (does not stop on validation errors)
  • Support for OCFL v1.0 and v1.1
  • Documentation for API
  • Digest Algorithms for Manifest: SHA512, SHA256
  • Fixity Algorithms: SHA1, SHA256, SHA512, BLAKE2b-160, BLAKE2b-256, BLAKE2b-384, BLAKE2b-512, MD5
  • Concurrent checksum generation on ingest/extract (multi-threaded)
  • Minimized I/O (data is read and written only once on Object creation)
  • Update strategy echo (incl. deletions) and contribute
  • Deduplication (needs double read of all content files, switchable)
  • Nearly full coverage of validation errors and warnings
  • Content information
  • Extraction with version selection
  • Display of content via Webserver
  • Report generation
  • Community Extensions
    • 0001-digest-algorithms
    • 0002-flat-direct-storage-layout
    • 0003-hash-and-id-n-tuple-storage-layout
    • 0004-hashed-n-tuple-storage-layout
    • 0005-mutable-head
    • 0006-flat-omit-prefix-storage-layout
    • 0007-n-tuple-omit-prefix-storage-layout
    • 0008-schema-registry
  • Local Extensions

Command Line Interface

A fast and reliable OCFL creator, extractor and validator.
https://github.com/ocfl-archive/gocfl
Jürgen Enge (University Library Basel, [email protected])
Version v2.0.6

Usage:
  gocfl [flags]
  gocfl [command]

Available Commands:
  add         adds new object to existing ocfl structure
  completion  Generate the autocompletion script for the specified shell
  create      creates a new ocfl structure with initial content of one object
  display     show content of ocfl object in webbrowser
  extract     extract version of ocfl content
  extractmeta extract metadata from ocfl structure
  help        Help about any command
  init        initializes an empty ocfl structure
  stat        statistics of an ocfl structure
  update      update object in existing ocfl structure
  validate    validates an ocfl structure

Flags:
      --config string                 config file (default is embedded)
  -h, --help                          help for gocfl
      --log-file string               log output file (default is console)
      --log-level string              log level (CRITICAL|ERROR|WARNING|NOTICE|INFO|DEBUG)
      --s3-access-key-id string       Access Key ID for S3 Buckets
      --s3-endpoint string            Endpoint for S3 Buckets
      --s3-region string              Region for S3 Access
      --s3-secret-access-key string   Secret Access Key for S3 Buckets

Use "gocfl [command] --help" for more information about a command.