yaz-marcdump
can be used to tranform MARC data between different serializations. Use options -i
and -o
to specfiy the input and output formats.
"MARC (ISO 2709)" to "MARC XML":
$ yaz-marcdump -i marc -o marcxml code4lib.mrc > code4lib.xml
"MARC (ISO 2709)" to "Turbomarc":
$ yaz-marcdump -i marc -o turbomarc code4lib.mrc > code4lib.turbo.xml
"MARC (ISO 2709)" to "MARC Line":
$ yaz-marcdump -i marc -o line code4lib.mrc > code4lib.line
"MARC XML" to "MARC-in-JSON":
$ yaz-marcdump -i marcxml -o json code4lib.mrc.xml > code4lib.json
The command-line interface of the Catmandu toolkit also offers several tranformations of MARC data. The default MARC serialization is "MARC (ISO 2709)".
"MARC (ISO 2709)" to "MARC XML":
$ catmandu convert MARC to MARC --type XML < code4lib.mrc > code4lib.xml
"MARC XML" to "MARC (ISO 2709)":
$ catmandu convert MARC --type XML to MARC < code4lib.xml > code4lib.mrc
"MARC (ISO 2709)" to "MARC Line":
$ catmandu convert MARC to MARC --type Line < code4lib.mrc > code4lib.line
"MARC (ISO 2709)" to "MARCMaker":
$ catmandu convert MARC to MARC --type MARCMaker < code4lib.mrc > code4lib.mrk
"MARC XML" to "MARC-in-JSON":
$ catmandu convert MARC --type XML to MARC --type MiJ < code4lib.xml > code4lib.json
"MARC XML" to YAML:
$ catmandu convert MARC to YAML < code4lib.mrc > code4lib.yml
The Catmandu::Breaker module "breaks" data into smaller components and exports them line by line:
$ catmandu convert MARC to Breaker --handler marc < code4lib.mrc
987874829 LDR 01031nas a2200337 c 4500
987874829 001 987874829
987874829 003 DE-101
987874829 005 20200306093601.0
987874829 007 cr||||||||||||
987874829 008 080311c20079999|||u||p|o ||| 0||||1eng c
987874829 0162 DE-101
987874829 016a 987874829
987874829 0162 DE-600
987874829 016a 2415107-5
987874829 022a 1940-5758
987874829 035a (DE-599)ZDB2415107-5
...
You can process this output with other command-line utilities like grep
, sort
and uniq
. For example, to extract all ISBN from a MARC data sets, we can build a command-line pipeline like this:
$ catmandu convert MARC to Breaker --handler marc < loc.mrc \
| grep -P '\t020a' | cut -f 3 | grep -oP '^[\dX]+' | sort | uniq -c
1 0072123397
1 0130284181
1 0201422190
1 0470176431
2 0596002270
...
With Catmandu you can export data to generic data formats like CSV, JSON, TSV, XLSX and YAML. MARC serializations are "complex/nested data structures" which cannot be stored in flat data structures like tables.
You can export MARC records to nested formats like JSON and YAML:
$ catmandu convert MARC to YAML < code4lib.mrc
$ catmandu convert MARC to JSON < code4lib.mrc
This will not work:
$ catmandu convert MARC to CSV < code4lib.mrc
$ catmandu convert MARC to TSV < code4lib.mrc
$ catmandu convert MARC to XLSX < code4lib.mrc
You need to use "Catmandu::Fix" to extract and map your data to a tabular data structure:
$ catmandu convert MARC to CSV \
--fix 'marc_map(245abc,dc_title,join:" ");retain_field(dc_title)' \
< code4lib.mrc
If you want transform MARC records to other formats, you have to map MARC (sub)fields to corresponding fields of the other format. The Libary of Congress provides several crosswalks:
Based on these crosswalks the Library of Congress published several XLS stylesheets, which can be used with a XSLT processor to transform "MARC XML" records to other formats like BIBFRAME, HTML, MODS, OAI-DC and RDF.
"MARC XML" to MODS:
$ xsltproc MARC21slim2MODS3-7.xsl loc.mrc.xml > loc.mods.xml
"MARC XML" to HTML:
$ xsltproc MARC21slim2HTML.xsl loc.mrc.xml > loc.html
"MARC XML" to "OAI-DC":
$ xsltproc MARC21slim2OAIDC.xsl loc.mrc.xml > loc.oaidc.xml
"MARC XML" to "RDF-DC":
$ xsltproc MARC21slim2RDFDC.xsl loc.mrc.xml > loc.rdfdc.xml
"MARC XML" to "BIBFRAME":
$ xsltproc bibframe-xsl/marc2bibframe2.xsl loc.mrc.xml > loc.bibframe.xml