Catmandu: a data toolkit

This handbook contains the aggregated content of the Catmandu documentation wiki. Feel free to improve the documentation there!

1 Introduction

Catmandu is a command line tool to access and convert data from your digital library, research services or any other open data sets. The toolkit was originally developed as part of the LibreCat project and now attracts an international development team with many participating institutions.

Catmandu has the following features, one can:

Catmandu is used in the LibreCat project to build institutional repositories and search engines. Catmandu is used on the command line for quick and dirty reports but also as part of larger programming projects processing millions of records per day. For a short overview of use-cases, see our Homepage.

There are more than 60 Catmandu projects available at GitHub LibreCat.

2 Installation

To get Catmandu running on your system you need to download and install at least the CPAN Catmandu module. Additional modules add support for more input and output formats, databases, and processing options.

To install Catmandu, select at least the Catmandu module (and probably Catmandu::MARC, Catmandu::OAI, Catmandu::RDF and Catmandu::XLS):

$ sudo cpanm Catmandu Catmandu::MARC

Extra Catmandu modules can be installed at any point in time with the cpanm command:

$ sudo cpanm  Catmandu::OAI
$ sudo cpanm  Catmandu::RDF
$ sudo cpanm  Catmandu::Store::MongoDB
$ sudo cpanm  Catmandu::XLS

To make full use of the capabilities of Catmandu, databases and search engines such as MongoDB, Elasticsearch, Solr, Postgres and MySQL can be installed on the system, together with the corresponding Catmandu tools. How to install these databases on your local system falls outside the scope of this documentation. Please consult the installation guide of the database product for more information. For more information on the available Catmandu packages, consult our Distributions list.

Here are some Catmandu installation hints for various platforms.

2.0.1 Debian

Several Catmandu packages are officially included in Debian but not all (see Voting Catmandu packages to be included in Debian and this search of currently available packages).

You can install all packages officially included in Debian:

sudo apt-get update
sudo apt-get install libcatmandu*-perl

Alternatively, you can build the newest Catmandu and its dependencies from source:

sudo apt-get update
sudo apt-get install cpanminus build-essential libexpat1-dev libssl-dev libxml2-dev libxslt1-dev libgdbm-dev
cpanm Catmandu Catmandu::MARC

Alternatively, you can build the newest Catmandu as unofficial packages, using official packages wherever possible:

sudo apt update
sudo apt install dh-make-perl liblocal-lib-perl apt-file
sudo apt-file update
sudo apt install libtest-fatal-perl libmodule-build-tiny-perl libmoo-perl libmodule-pluggable-perl libcapture-tiny-perl libclass-load-perl libgetopt-long-descriptive-perl libio-tiecombine-perl libstring-rewriteprefix-perl libio-handle-util-perl
cpan2deb --vcs '' MooX::Aliases
cpan2deb --vcs '' Log::Any
cpan2deb --vcs '' App::Cmd
cpan2deb --vcs '' LaTeX::ToUnicode
cpan2deb --vcs '' PICA::Data
cpan2deb --vcs '' LV
cpan2deb --vcs '' MODS::Record
sudo dpkg -i lib*-perl_*.deb
cpan2deb --vcs '' BibTeX::Parser
sudo dpkg -i libbibtex-parser-perl_*.deb
sudo apt install libexporter-tiny-perl
cpan2deb --vcs '' JSON::Path
sudo dpkg -i libjson-path-perl_*.deb
cpan2deb --vcs '' JSON::Hyper
sudo dpkg -i libjson-hyper-perl_*.deb
sudo apt install libhttp-link-parser-perl libautovivification-perl libmatch-simple-perl
cpan2deb --vcs '' JSON::Schema
sudo dpkg -i libjson-schema-perl_*.deb
sudo apt install libjson-xs-perl libtest-exception-perl libtest-deep-perl libfile-slurp-tiny-perl liburi-template-perl libtry-tiny-byclass-perl libdata-util-perl libdata-compare-perl libhash-merge-simple-perl libthrowable-perl libclone-perl libdata-uuid-perl libmarpa-r2-perl libconfig-onion-perl libmodule-info-perl libtext-csv-perl libcgi-expand-perl
dh-make-perl --vcs '' --cpan Catmandu
perl -i -pe 's/libossp-uuid-perl[^,\n]*/libdata-uuid-perl/g' libcatmandu-perl/debian/control
( cd libcatmandu-perl && dpkg-buildpackage -b -us -uc -d )
sudo dpkg -i libcatmandu-perl_*.deb
dh-make-perl --vcs '' --cpan Catmandu::Twitter
perl -i -pe 's/liburi-perl\K[^,\n]*//g' libcatmandu-twitter-perl/debian/control
( cd libcatmandu-twitter-perl && dpkg-buildpackage -b -us -uc -d )
sudo apt install libchi-perl libnet-ldap-perl libdatetime-format-strptime-perl libxml-libxslt-perl libxml-struct-perl libnet-twitter-perl libxml-parser-perl libspreadsheet-xlsx-perl libexcel-writer-xlsx-perl libdevel-repl-perl libio-pty-easy-perl
cpan2deb --recursive --vcs '' Task::Catmandu
sudo apt install 'libcatmandu-*'
sudo dpkg -i libcatmandu-twitter-perl_*.deb
sudo dpkg -i ~/.cpan/build/libcatmandu-*-perl_*.deb

2.0.2 Ubuntu Server 12.04.4 LTS

apt-get install make
apt-get install libmodule-install-perl
apt-get install libyaz-dev
apt-get install libwrap0-dev
apt-get install libxml2-dev zlib1g zlib1g-dev
apt-get install libexpat1-dev
apt-get install libxslt1-dev
apt-get install libssl-dev
apt-get install libgdbm-dev
apt-get install perl-doc
yes | cpan Test::More
yes | cpan YAML
yes | cpan App::cpanminus
/usr/local/bin/cpanm Catmandu Catmandu::MARC

2.0.3 CentOS 6.4

yum groupinstall "Development Tools"
yum install perl-ExtUtils-MakeMaker
yum install perl-CPAN -y
yum install gcc -y
yum install gdbm gdbm-devel -y
yum install openssl-devel -y
yum install tcp_wrappers-devel -y
yum install expat expat-devel -y
yum install libxml2 libxml2-devel libxslt libxslt-devel -y
yes | cpan YAML
yes | cpan App::cpanminus
/usr/local/bin/cpanm Catmandu Catmandu::MARC

2.0.4 CentOS 7

yum group install "Development Tools"
yum install perl-devel perl-YAML perl-CPAN perl-App-cpanminus -y
yum install openssl-devel tcp_wrappers-devel expat expat-devel libxml2 libxml2-devel libxslt libxslt-devel -y
cpanm Test::More Catmandu Catmandu::MARC

2.0.5 openSUSE

sudo zypper install --type pattern devel_basis
sudo zypper install libxml2-devel libxslt-devel
curl -L | perl - App::cpanminus  ## unless you already have cpanm
cpanm Catmandu Catmandu::MARC

2.0.6 OpenBSD 5.3

cpan App::cpanminus
cpanm Catmandu Catmandu::MARC

2.0.7 OSX

brew install libxml++ libxml2 xml2 libxslt
# Install plenv from
git clone ~/.plenv
echo 'export PATH="$HOME/.plenv/bin:$PATH"' >> ~/.bash_profile
echo 'eval "$(plenv init -)"' >> ~/.bash_profile
exec $SHELL -l
git clone ~/.plenv/plugins/perl-build/
# Install a modern Perl
plenv install 5.22.0
plenv rehash
cpan App::cpanminus
# Install catmandu
cpanm Catmandu Catmandu::MARC
plenv rehash

2.0.8 Windows, Mac OSX, Linux

A Docker image of Catmandu is built with each release. After installing Docker, get and use the Catmandu image like this:

# Upgrade to the latest version
docker pull librecat/catmandu

# Run the docker command
docker run -it librecat/catmandu

Or, if you want a native install, use ActivePerl:

ppm install Catmandu

2.0.9 Raspbian GNU/Linux 7 on the Raspberry Pi (armhf)

Since Raspbian is based on Debian stable, you can follow the Debian instructions above. Unfortunately, you may run into timeouts, so it is advisable to install some prerequisites via apt-get first:

sudo apt-get install libboolean-perl libdevel-repl-perl libnet-twitter-perl 
sudo apt-get install libxml-easy-perl libxslt1-dev libgdbm-dev

3 Command line client

Most Catmandu processing doesn’t require you to write any code. With our command line tools you can store data files in databases, index your data, export data in various formats and perform basic data cleanup operations.


The convert command is used to transform one format into another, or to download data from the Internet. For example, to extract all titles from a MARC record one can write:

$ catmandu convert MARC to CSV --fix 'marc_map(245a,title); retain(title)' < data.mrc

In the example above, we import MARC and export it again as CSV while extracting the 245a field from a record and deleting all the rest. With the convert command one can transform data from one format to another.

Transform JSON to YAML:

$ catmandu convert JSON to YAML < data.json

Transform YAML to JSON:

$ catmandu convert YAML to JSON < data.yml

Convert Excel to CSV:

$ catmandu convert XLS to CSV < data.xls

The Fix language can be used to extract only the fields of the input you are interested in:

Convert Excel to CSV and only keep the titles, authors, and year columns:

$ catmandu convert XLS to CSV --fix 'retain(titles,authors,year)' < data.xls

In formats such as JSON or YAML the data can be deeply nested. All these fields can be accessed and converted.

$ catmandu convert JSON --fix 'upcase(my.nested.field.1)' < data.json

In the example above, the JSON input is transformed as follows: the field my contains a field nested, which contains a field field holding a list, and the second item of that list (indicated by index 1) is upcased.
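To make the path notation concrete, here is a minimal Python sketch (not Catmandu's Perl implementation) that walks a dotted path such as my.nested.field.1 through nested dicts and lists, treating numeric segments as list indices counted from 0. The helper names get_path and upcase_path are hypothetical, and wildcards are deliberately left out.

```python
def get_path(item, path):
    """Follow a dotted Fix-style path through nested dicts and lists.

    Numeric segments index into lists (0 = first item). Returns None
    when any step of the path does not exist. Simplified sketch: no
    wildcards ($first, $last, *) are supported.
    """
    current = item
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return None
    return current


def upcase_path(item, path):
    """Upcase the string at path in place, roughly like upcase(path)."""
    *parents, last = path.split(".")
    container = get_path(item, ".".join(parents)) if parents else item
    if isinstance(container, list) and last.isdigit():
        index = int(last)
        if index < len(container) and isinstance(container[index], str):
            container[index] = container[index].upper()
    elif isinstance(container, dict) and isinstance(container.get(last), str):
        container[last] = container[last].upper()
    return item


record = {"my": {"nested": {"field": ["alpha", "beta", "gamma"]}}}
upcase_path(record, "my.nested.field.1")
print(record["my"]["nested"]["field"])  # ['alpha', 'BETA', 'gamma']
```

Missing paths simply leave the item unchanged in this sketch, which is a convenient default for illustration.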

The convert command can also be used to download data from a remote service. For example, to download the Dublin Core data from the UGent institutional repository, type:

$ catmandu convert OAI --url

To get a CSV export of all identifiers in this OAI-PMH service type:

$ catmandu convert OAI --url to CSV --fix 'retain(_id)'

Or a YAML file with all titles:

$ catmandu convert OAI --url --set public to YAML --fix 'retain(title)'


The import command is used to import data into a database. Catmandu supports NoSQL databases such as MongoDB, Elasticsearch and CouchDB, which require no configuration before they can be used. There is also support for relational databases such as Oracle, MySQL and Postgres (via DBI) and for search engines like Solr, but these need to be configured before use (databases, tables and schemas have to be created first).

Importing a JSON document into MongoDB database can be as simple as:

$ catmandu import JSON  to MongoDB --database_name bibliography < books.json

Importing into a database can be done for every format that is supported by Catmandu. For instance, MARC can be imported with this command:

$ catmandu import MARC to MongoDB --database_name marc_data < data.mrc


Excel files can be imported in the same way:

$ catmandu import XLS to MongoDB --database_name my_xls_data < data.xls

Even a download from a website can be directly stored into a database.

$ catmandu import -v OAI --url to MongoDB --database_name oai_data

In the example above a copy of the institutional repository of Ghent University was loaded into a MongoDB database. Use the option -v to see a progress report.

Before the data is imported a Fix can be applied to extract fields or transform fields before they are stored into the database. For instance, we can extract the publication year from a MARC import and store this as a separate year field:

$ catmandu import MARC to MongoDB --database_name marc_data --fix 'marc_map("008/7-10",year)' < data.mrc


The export command is used to retrieve data from a database. See the import command above for a list of supported databases.

For instance we can export all the MARC records we have imported with this command:

$ catmandu export MongoDB --database_name marc_data 

In case we only need the title field from the MARC records and want the results in CSV format, we can add some fixes:

$ catmandu export MongoDB --database_name marc_data to CSV --fix 'marc_map(245a,title); retain(title)'

Some databases support a query syntax to select the records to be exported. For instance, in the example above we extracted the year field from the MARC import. This can be used to export only the records of a particular year:

$ catmandu export MongoDB --database_name marc_data --query '{"year": "1971"}'


It is often handy to store the configuration options of importers, exporters and stores in a file. This allows you to write shorter, easier commands. To do this, create a file ‘catmandu.yml’ in your working directory with content like:

importer:
  ghent:
    package: OAI
    options:
      url:
      set: public
store:
  ghentdb:
    package: MongoDB
    options:
      database_name: oai_data

When this file is available, an OAI-PMH harvest could be done with the shortened command:

$ catmandu convert ghent

To store the ghent OAI-PMH import into the MongoDB database, one could write:

$ catmandu import ghent to ghentdb

To extract the data from the database, one can write:

$ catmandu export ghentdb


See the Command line client Cheat Sheet for more examples of command line commands.

4 Concepts

To make better use of Catmandu, it helps to first understand its core concepts:

Items are the basic unit of data processing in Catmandu. Items can be read, stored, and accessed in many formats. An item can be a MARC record, an RDF triple or one row in an Excel file.

Importers are used to read items. There are importers for MARC, JSON, YAML, CSV, Excel, and many other input formats. One can also import from remote sources such as SPARQL, Atom and OAI-PMH endpoints.

Exporters are used to transform items back into JSON, YAML, CSV, Excel or any format you like.

Stores are databases in which you store your data. With databases such as MongoDB and ElasticSearch it becomes really easy to store complicated, deeply nested items.

Fixes transform items: they reshape the data into any format you like. See Fix language and Fix packages for details.

4.1 Items

An item is the basic unit of data processing in Catmandu. Items are data structures built from key-value pairs (aka objects), lists (aka arrays), strings, numbers, and null values. All items can be expressed in JSON and YAML, among other formats.

Internally, all data processing in Catmandu uses a generic data format not unlike JSON. Whether one imports MARC, XML, Excel, OAI-PMH, SPARQL, data from a database or any other format, everything can be expressed as JSON.

For example:

  • JSON/YAML - when importing a large JSON/YAML collection as an array, every element of the array is one Catmandu item.
  • Text - for text import every line of text is one Catmandu item.
  • MARC - when importing MARC data, every record in a MARC file is one Catmandu item.
  • XLS,CSV - for tabular formats such as Excel, CSV and TSV, each row in a table is one Catmandu item
  • RDF - for linked data formats such as RDF/XML, RDF/nTriples, RDF/Turtle each triple is one Catmandu item
  • SPARQL - for a result set of a SPARQL or LDF query, every result (with the variable bindings) is one Catmandu item
  • MongoDB,ElasticSearch,Solr,DBI - for databases every record in the database is one Catmandu item
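The one-row-one-item rule for tabular input can be illustrated with Python's standard csv module. This is only an analogy for how a CSV importer splits its input into items, not Catmandu code, and the column names and values are made up for the example.

```python
import csv
import io

# Hypothetical CSV input: a header row plus two data rows.
data = "title,year\nMy Little Pony,1983\nThe Seagull,1896\n"

# Each data row becomes one dict, the analogue of one Catmandu item;
# the header row supplies the keys.
items = list(csv.DictReader(io.StringIO(data)))
for item in items:
    print(item)
```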

To transform items with the Fix language, one points to fields in items with a JSONPath-like expression (Catmandu actually uses an extension of JSONPath). The fixes provided to a catmandu command operate on each individual item.

For instance, the command below will upcase the publisher field for every item (row) in the data.xls file:

$ catmandu convert XLS --fix 'upcase(publisher)' < data.xls

This command will select only the JSON items that contain ‘Tsjechov’ in a nested authors field:

$ catmandu convert JSON --fix 'select any_match(authors.*,"Tsjechov.*")' < data.json

This command will delete all the uppercase A characters from a Text file:

$ catmandu convert Text to Text --fix 'replace_all(text,"A","")' < data.txt

To see the internal representation of a MARC file in Catmandu, transform it, for instance, to YAML:

$ catmandu convert MARC to YAML < data.mrc

One will see that each MARC record is stored in the record field of the item as an array of arrays, one array per MARC field.
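Judging from the JSON output of catmandu convert MARC (for example ["650"," ","0","a","Cross-platform software development."]), each inner array has the shape [tag, ind1, ind2, code1, value1, code2, value2, ...]. The Python sketch below is a hypothetical, much simplified stand-in for marc_map that collects subfield values from that structure. The sample record is illustrative, and splitting the 245 field into subfields a and c is an assumption. Values from repeated fields are concatenated without a separator, which matches the subject output shown in the Fixes section.

```python
def marc_map_sketch(record, tag, codes=None, join=""):
    """Collect subfield values of `tag` from a Catmandu-style MARC record.

    record: list of fields shaped [tag, ind1, ind2, code, value, ...].
    codes:  string of subfield codes to keep (None keeps every subfield).
    Values from all matching fields are joined with `join`.
    Returns None when nothing matches.
    """
    values = []
    for field in record:
        if field[0] != tag:
            continue
        subfields = field[3:]
        # Walk the (code, value) pairs that follow the two indicators.
        for code, value in zip(subfields[0::2], subfields[1::2]):
            if codes is None or code in codes:
                values.append(value)
    return join.join(values) if values else None


record = [
    ["LDR", None, None, "_", "00755cam  22002414a 4500"],
    ["245", "1", "0", "a", "Programming the Perl DBI /",
     "c", "Alligator Descartes and Tim Bunce."],
    ["650", " ", "0", "a", "Perl (Computer program language)"],
    ["650", " ", "0", "a", "Database management."],
]
print(marc_map_sketch(record, "245"))
print(marc_map_sketch(record, "650", "a"))
```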

4.2 Importers

Importers are Catmandu packages to read a specific data format. Catmandu provides importers for MARC, JSON, YAML, CSV, Excel, and many other input formats. One can also import from remote sources for instance via protocols such as SPARQL and OAI-PMH.

The name of a Catmandu importer should be provided as the first argument to the convert command.

Read JSON input:

$ catmandu convert JSON

Read YAML input:

$ catmandu convert YAML

Read MARC input:

$ catmandu convert MARC

Importers accept options to configure them. For instance, the MARC importer can read USMARC, MicroLIF, MARCMake, MARC-in-JSON, MARCXML and Aleph Sequential input if the correct type option is provided.

Read MARC-XML input:

$ catmandu convert MARC --type XML < marc.xml

Read Aleph sequential input:

$ catmandu convert MARC --type ALEPHEQ < marc.txt

Read more about the configuration options of importers by reading their manual pages:

$ catmandu help import JSON
$ catmandu help import YAML

4.3 Exporters

Exporters are Catmandu packages to export data in a specific format. See Importers for the opposite action.

Some exporters, such as JSON and YAML, can handle any type of input. It doesn’t matter how the input is structured: it is always possible to create a JSON or YAML file.

Exporters are given after the to argument of the convert command:

$ catmandu convert OAI --url to JSON
$ catmandu convert MARC to JSON
$ catmandu convert XLS to JSON

For most exporters, however, the input data needs to be structured in a specific format. For instance, tabular formats such as Excel, CSV and TSV don’t allow nested fields. In the example below, Catmandu tries to convert a list into a simple value, which fails:

$ echo '{"colors":["red","green","blue"]}' | catmandu convert JSON to CSV

The result is an ARRAY output, indicating that the colors field is nested. To fix this, a transformation needs to be provided:

$ echo '{"colors":["red","green","blue"]}' | catmandu convert JSON to CSV --fix 'join_field(colors,",")'
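The effect of join_field(colors,",") can be imitated in Python. The list is replaced by a single joined string, and a flat CSV row can hold that string. This sketch only mirrors the simple case above. The CSV layout comes from Python's csv module, not necessarily from Catmandu's CSV exporter.

```python
import csv
import io
import json

item = json.loads('{"colors":["red","green","blue"]}')

# Equivalent of join_field(colors,","): the list becomes one string.
item["colors"] = ",".join(item["colors"])

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(item))
writer.writeheader()
writer.writerow(item)  # the value contains commas, so it gets quoted
print(out.getvalue())
```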

Likewise, MARC output requires input in the Catmandu MARC format, RDF exports need the aREF format, and so on.

Exporters also accept options to configure the various kinds of exports. For instance, JSON can be exported as an array or in a line-by-line format:

$ catmandu convert MARC to JSON --array 1 < data.mrc
$ catmandu convert MARC to JSON --line_delimited 1 < data.mrc
$ catmandu convert MARC to JSON --pretty 1 < data.mrc
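The difference between the array and the line-delimited layout can be shown with Python's json module. This only illustrates the two output shapes with made-up items. It does not reproduce the exact bytes written by Catmandu's JSON exporter.

```python
import json

items = [{"_id": "1", "title": "A"}, {"_id": "2", "title": "B"}]

# Array layout: the whole stream is a single JSON array.
as_array = json.dumps(items)

# Line-delimited layout: one complete JSON object per line, so a
# consumer can parse the output line by line.
as_lines = "\n".join(json.dumps(item) for item in items)

print(as_array)
print(as_lines)
```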

The Catmandu::Template package can be used to generate any type of structured output given an input using the Template Toolkit language.

For instance, a JSON record with a list of colors can be created with an echo command on Linux:

$ echo '{"colors":["red","green","blue"]}'

To transform this JSON into XML, the Template exporter can be used with a template file as a command line argument:

$ echo '{"colors":["red","green","blue"]}' | catmandu convert JSON to Template --template `pwd`/

and a template file with content like:

[% FOREACH c IN colors %]
  <color>[% c %]</color>
[% END %]

will produce:


Consult the manual pages of catmandu to see the output options of the different Exporters:

$ catmandu help export JSON
$ catmandu help export YAML
$ catmandu help export CSV

4.4 Stores

Stores are Catmandu packages to store Catmandu Items in a database. These databases need to be installed separately from Catmandu. Some databases, such as MongoDB, ElasticSearch and CouchDB, work out of the box with hardly any configuration. For other databases, such as Solr, MySQL, Postgres and Oracle, extra configuration steps are needed to define the database schemas.

Catmandu stores such as MongoDB, ElasticSearch and CouchDB can accept any type of input. They are perfect tools to store the output of data conversions.

Without defining any database schema, JSON, YAML, MARC, Excel, CSV, OAI-PMH or any other Catmandu-supported format can be stored.

$ catmandu import JSON to MongoDB --database_name test < data.json
$ catmandu import YAML to MongoDB --database_name test < data.yml
$ catmandu import MARC to MongoDB --database_name test < data.mrc
$ catmandu import XLS to MongoDB --database_name test  < data.xls

Many Catmandu stores can be queried with their native query language:

$ catmandu export MongoDB --database_name test --query '{"my.deep.field":"abc"}'

To delete data from a store the delete command can be used.

# Delete everything
$ catmandu delete MongoDB --database_name test  
# Delete record with _id = 1234 and _id = 1235
$ catmandu delete MongoDB --database_name test --id 1234 --id 1235

Use the count command to show the size of a database.

$ catmandu count MongoDB --database_name test  

One important use case for Catmandu is indexing data in search engines such as Solr. To do this, Solr needs to be configured for the fields you want to make searchable. Your data collection can then be indexed in Solr by mapping the fields in your data to the fields available in Solr.

$ catmandu import MARC to Solr --fix marc2solr.fix < data.mrc

where marc2solr.fix is a Fix script containing all the fixes required to transform your input data into the Solr format:

# marc2solr.fix

In reality the Fix script will contain many mappings and data transformations to clean data. See Example Fix Script for a long example of such a data cleaning in action.

4.5 Fixes

Fixes are used for easy data transformations by non-programmers. Using a small Fix language, non-programmers can manipulate Catmandu Items.

To introduce the capabilities of Fix, an example will be provided below to extract data from a MARC input.

First, make sure that Catmandu::MARC is installed on your system.

 $ sudo cpanm Catmandu::MARC

We will use the Catmandu command line client to extract data from an example USMARC file that can be downloaded via this link: camel.usmarc.

With the convert command one can read items with a MARC Importer and convert them into a new format. By default, convert outputs JSON:

$ catmandu convert MARC < camel.usmarc
{"record":[["LDR",null,null,"_","00755cam  22002414a 4500"],["001",null,null...
["650"," ","0","a","Cross-platform software development."]],"_id":"fol05882032 "}

You can make this conversion explicit:

$ catmandu convert MARC to JSON < camel.usmarc

To transform this MARC data, we will first create a Fix file that contains all the Fix commands we will use. Create a text file ‘fixes.txt’ on your system with this input:

remove_field('record');

and execute the following command:

$ catmandu convert MARC --fix fixes.txt < camel.usmarc
{"_id":"fol05731351 "}
{"_id":"fol05754809 "}
{"_id":"fol05843555 "}
{"_id":"fol05843579 "}

We have removed the field ‘record’ (containing the MARC data) from the JSON record. This is what the ‘remove_field’ Fix does: it removes one field from a JSON record. We will use remove_field(‘record’) to make our output a bit more terse and easier to read.

With the ‘marc_map’ Fix from the Catmandu::MARC package we can extract MARC (sub)fields from the record. Add these to the fixes.txt file:

marc_map('245','title');
remove_field('record');

When we run our previous catmandu command we get the following output:

$ catmandu convert MARC --fix fixes.txt to JSON --line_delimited 1 < camel.usmarc
{"_id":"fol05731351 ","title":"ActivePerl with ASP and ADO /Tobias Martinsson."}
{"_id":"fol05754809 ","title":"Programming the Perl DBI /Alligator Descartes and Tim Bunce."}
{"_id":"fol05843555 ","title":"Perl :programmer's reference /Martin C. Brown."}

We know that in the 650-a field of MARC we can find subjects. Let’s add them to the fixes.txt:

marc_map('245','title');
marc_map('650a','subject');
remove_field('record');

and run the command again:

$ catmandu convert MARC --fix fixes.txt to JSON --line_delimited 1 < camel.usmarc
{"subject":"Perl (Computer program language)","_id":"fol05731351 ","title":"ActivePerl with ASP and ADO /Tobias Martinsson."}
{"subject":"Perl (Computer program language)Database management.","_id":"fol05754809 ","title":"Programming the Perl DBI /Alligator Descartes and Tim Bunce."}
{"subject":"Perl (Computer program language)","_id":"fol05843555 ","title":"Perl :programmer's reference /Martin C. Brown."}

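Positions 7 to 10 of the MARC 008 field contain the publication year. We can also add these to the ‘fixes.txt’ file:

marc_map('245','title');
marc_map('650a','subject');
marc_map('008/7-10','year');
remove_field('record');

and run the command: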

$ catmandu convert MARC --fix fixes.txt to JSON --line_delimited 1 < camel.usmarc
{"subject":"Perl (Computer program language)","_id":"fol05731351 ","title":"ActivePerl with ASP and ADO /Tobias Martinsson.","year":"2000"}
{"subject":"Perl (Computer program language)Database management.","_id":"fol05754809 ","title":"Programming the Perl DBI /Alligator Descartes and Tim Bunce.","year":"2000"}
{"subject":"Perl (Computer program language)","_id":"fol05843555 ","title":"Perl :programmer's reference /Martin C. Brown.","year":"1999"}
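In the ‘008/7-10’ path the numbers are character positions in the 008 control field. They are counted from 0 and include both ends, so four characters are taken. The Python sketch below shows the same slice using a hypothetical 008 value. The helper name char_range is illustrative only.

```python
def char_range(value, start, end):
    """Return characters start..end of value, both ends inclusive,
    like the /7-10 part of a MARC path."""
    return value[start:end + 1]


# Hypothetical 008 control field: positions 0-5 are the entry date,
# position 6 the type of date and positions 7-10 hold Date 1.
field_008 = "000107s2000    cau           000 0 eng  "
print(char_range(field_008, 7, 10))  # 2000
```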

You don’t need to write fixes into a file to use them. E.g. if we want some statistics on the publication years in the camel.usmarc file, we can do something like:

$ catmandu convert MARC --fix "marc_map('008/7-10','year'); retain_field('year')" to CSV < camel.usmarc

With marc_map we extracted the year from the 008 field. With retain_field we deleted everything in the output except for the field ‘year’. We used the CSV Exporter to present the results in an easy format.

5 Fix language

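Catmandu comes with a small domain-specific language for the manipulation of data items, called Fix. The Fix language consists of paths, functions, selectors, conditionals and binds, which are described in the sections below.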

5.1 Paths


Almost any transformation on a Catmandu item contains a path to the part of the item that needs to be changed. To upcase the title field in an item, the Fix upcase needs to be used:

upcase(title)

A field can be nested in key-value-pairs (objects). To access the field deep in a key-value-pair, the dot-notation should be used:


If a part of an item contains a list of fields, then the index notation should be used. Use index 0 to point to the first item in a list, index 1 to point to the second item, index 2 to the third, and so on.

upcase(  # upcase the title of the 3rd item in the list

For example, given this YAML input:

title: My Little Pony
   - red
   - green
   - blue
       c: Hoi!

The value ‘My Little Pony’ can be accessed using the path:


The value ‘green’ can be accessed using the path:


The value ‘Hoi!’ can be accessed using the path:


5.1.1 Wildcards

Wildcards are used to point to relative positions or many positions in a list.

To point to the first item in a list (e.g. the value ‘red’ in the example above) the wildcard $first can be used:


To point to the last item in a list (e.g. the value ‘blue’ in the example above) the wildcard $last can be used:


In some cases, one needs to point to a position before the first item in a list. For instance, to add a new field before the color ‘red’ in our example above, the wildcard ‘$prepend’ should be used:


This wildcard can be used in functions like set_field:


To add a new field at the end of a list (after the color ‘blue’), the wildcard ‘$append’ should be used:


As in:


The star notation is used to point to all the items in a list:


To upcase all the colors use:


When lists are nested inside lists, wildcards can also be nested:


The path above can be used when the my field contains a list of items, each with a colors field that again contains a list of values. E.g.:

 - colors:
     - red
     - blue
 - colors:
     - yellow
     - green
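The list wildcards described above can be modelled on a plain Python list. The helper set_in_list below is hypothetical and meant only for illustration. Catmandu resolves these wildcards itself inside set_field and other fixes, and in this sketch $first and $last do nothing on an empty list. The nested part assumes a path of the form my.*.colors.*, because the exact path is not shown above.

```python
def set_in_list(values, key, new_value):
    """Model of set_field on a list element addressed by a Fix wildcard.

    key is a numeric index (as a string), "$first", "$last",
    "$prepend" (insert before the first item), "$append" (add after
    the last item) or "*" (every existing item).
    """
    if key == "$prepend":
        values.insert(0, new_value)
    elif key == "$append":
        values.append(new_value)
    elif key == "*":
        values[:] = [new_value] * len(values)
    elif key in ("$first", "$last"):
        if values:  # no-op on an empty list in this sketch
            values[0 if key == "$first" else -1] = new_value
    else:
        index = int(key)
        if 0 <= index < len(values):
            values[index] = new_value
    return values


colors = ["red", "green", "blue"]
set_in_list(colors, "$prepend", "white")
set_in_list(colors, "$append", "black")
print(colors)  # ['white', 'red', 'green', 'blue', 'black']

# Nested stars (my.*.colors.*): every colors list inside every
# item of my is visited, here to upcase all colors.
my = [{"colors": ["red", "blue"]}, {"colors": ["yellow", "green"]}]
for entry in my:
    entry["colors"] = [c.upper() for c in entry["colors"]]
print(my)
```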

5.1.2 MARC, MAB, PICA paths

For some data formats it can be quite difficult to extract data by the exact position of a field. In data formats such as MARC, one is usually not interested in the field at the 17th position that contains a subfield at position 3. MARC contains tags and subfields, which can be at any position in the MARC record.

Specialized Fix functions for MARC, MAB and PICA make it easier to access data by changing the Path syntax. For instance, to copy the 245a field in a MARC record to the title field one can write:

marc_map("245a",title)

In the context of a marc_map Fix the “245a” Path is a MARC Path that points to a part of the MARC record. These MARC Paths only work in MARC Fixes (marc_map, marc_add, marc_set, marc_remove). It is not possible to use these paths in other Catmandu fix functions:

marc_map("245a",title)            # This will work
copy_field("246a","other_title")  # This will NOT work

Consult the documentation of the different specialised packages for the Path syntax that can be used.

5.2 Functions

Fix functions manipulate fields in every item of a Catmandu Importer. For instance, using the command below the title field will be upcased for every item in the input list of JSON items.

$ catmandu convert JSON --fix 'upcase(title)' < data.json

Fix functions can have zero or more arguments separated by commas:

vacuum()              # Clean all empty fields in a record
upcase(title)         # Upcase the title value
append(title,"-123")  # Add -123 at the end of the title value 

The arguments to a Fix function can be a Fix path or a literal string. Literal strings can be quoted with double or single quotes.

append(title,'foo bar')

In case of single quotes all the characters between quotes will be interpreted verbatim. When using double quotes, the values in quotes can be interpreted by some Fix functions.

replace_all(title,"My (.*) Pony","Our $1 Fish")   # Replace 'My Little Pony' by 'Our Little Fish'
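Python's re.sub performs the same kind of regular-expression replacement. Python writes the back-reference as \1 where the Fix uses $1. This only illustrates the regex behaviour, not how Catmandu implements replace_all.

```python
import re

title = "My Little Pony"
# Fix: replace_all(title,"My (.*) Pony","Our $1 Fish")
new_title = re.sub(r"My (.*) Pony", r"Our \1 Fish", title)
print(new_title)  # Our Little Fish
```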

Some Fix functions accept zero or more options, which need to be specified as name: value pairs:

sort_field(tags, reverse:1)               # Sort the tags field in reverse order
lookup("title","dict.csv", sep_char:'|',default:'NONE')  # Lookup a title in a CSV file

Unless specified otherwise (such as in Binds), Fix functions are executed in the order given by the Fix script:

upcase(authors.*)
append(authors.*,"abc")
replace_all(authors.*,"a","AB")

In the example above all transformations on the field authors will be executed in the order given. For example when the field authors contains this list:

  - John
  - Mary
  - Dave

The first fix will transform this list into:

  - JOHN
  - MARY
  - DAVE

The second fix will append “abc” to all authors:

  - JOHNabc
  - MARYabc
  - DAVEabc

The third fix will replace every “a” with “AB”:

  - JOHNABbc
  - MARYABbc
  - DAVEABbc
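The three steps above (upcase, append “abc”, then replace “a” by “AB”) can be replayed in Python to check the intermediate lists. Each step runs over the whole list before the next one starts, which models the default Fix ordering described here. This is a sketch, not Catmandu code.

```python
import re

authors = ["John", "Mary", "Dave"]

# Each step is applied to every author before the next step begins.
authors = [a.upper() for a in authors]             # upcase
authors = [a + "abc" for a in authors]             # append "abc"
authors = [re.sub("a", "AB", a) for a in authors]  # lowercase "a" -> "AB"

print(authors)  # ['JOHNABbc', 'MARYABbc', 'DAVEABbc']
```

Only the lowercase “a” from the appended “abc” is replaced. The uppercase A in MARY and DAVE does not match.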

In some cases the ordering of transformations of items in a list matters. For instance, you may want to first apply a sequence of transformations to all first items in a list, then a sequence of transformations to all second items, and so on. To change this ordering of Fix functions, Binds need to be used.

For a nearly complete list of functions currently available in Catmandu, take a look at the Fixes Cheat Sheet.

5.3 Selectors

With Fix selectors one can choose which Catmandu items end up in the output stream. Use a selector to throw away the records you are not interested in. For instance, to filter out all the records in an input, use the reject() selector:

$ catmandu convert MARC to YAML --fix "reject()" < data.mrc

The command above will generate no output: every record is rejected. The opposite of reject() is the select() selector which can be used to select all the Catmandu items you want to keep in an output:

$ catmandu convert MARC to YAML --fix "select()" < data.mrc

The command above will return all the MARC items in the input file.

Selectors are of little use when used in isolation. Most of the time they are combined with Conditionals. To select only the MARC records that have “Tsjechov” in the 100a field one can write:

$ catmandu convert MARC to YAML --fix "select marc_match(100a,'.*Tsjechov.*')" < data.mrc

There are two alternative ways to combine a selector with a conditional. In the guard syntax, the conditional is written after the selector:

reject exists(error.field)
reject all_match(publisher,'xyz')
select any_match(years,2005)

In the if/then/else syntax the conditional is written explicitly:

if exists(error.field)
  reject()
end

if all_match(publisher,'xyz')
  reject()
end

if any_match(years,2005)
  # keep the item
else
  reject()
end
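Conceptually, a selector works like a filter on the stream of items. The Python sketch below uses made-up items. It drops items that have a nested error.field, which is what reject exists(error.field) does, and then keeps only items with at least one year matching 2005, which is what select any_match(years,2005) does. Unanchored regex matching (re.search) is an assumption of this model, and the helper names are hypothetical.

```python
import re

items = [
    {"_id": 1, "years": ["2004", "2005"]},
    {"_id": 2, "years": ["1999"], "error": {"field": "bad"}},
    {"_id": 3, "years": ["2010"]},
]


def has_error_field(item):
    # Model of exists(error.field).
    return isinstance(item.get("error"), dict) and "field" in item["error"]


def any_match_years(item, pattern):
    # Model of any_match(years,pattern): at least one value matches.
    return any(re.search(pattern, str(y)) for y in item.get("years", []))


# reject exists(error.field)
kept = [i for i in items if not has_error_field(i)]
# select any_match(years,2005)
kept = [i for i in kept if any_match_years(i, "2005")]
print([i["_id"] for i in kept])  # [1]
```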

5.4 Conditionals

A Conditional is executed depending on a boolean condition that can be true or false. For instance, to skip a Catmandu item when the field error exists, one would write the conditional exists:

if exists(error)
  reject()
end

A conditional consists of an if or unless statement with a condition (a Fix function that can be true or false), a body of zero or more Fix functions, an optional else clause and a closing end:

if exists(error)
   # Write here all the Fix functions when the field 'error' exists
end

unless exists(error)
  # Write here all the Fix functions when the field 'error' doesn't exist
end

if exists(error)
   # If error exists then do this
else
   # otherwise do this
end

Below are some basic conditions that are implemented in Catmandu. Check the manual pages of the individual Catmandu extensions for more elaborate Conditionals.

all_equal(path,value)
True, when the path exists and is exactly equal to a value. When the path points to a list, then all the list members need to be equal to the value. False otherwise.

if all_equal(year,"2018")
  # your Fix functions here
end

if all_equal(animals.*,"cat")
  # your Fix functions here
end

any_equal(path,value)
True, when the path exists and is exactly equal to a value. When the path points to a list, then at least one of the list members needs to be equal to the value. False otherwise.

if any_equal(year,"2018")
  # your Fix functions here
end

if any_equal(animals.*,"cat")
  set_field(animal_types,"some feline")
end
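A minimal plain-Perl sketch of the difference between all_equal and any_equal, assuming a missing path is represented by `undef` and a path ending in `.*` by an array reference. The subs only mirror the Conditionals' names; they are hypothetical helpers, not Catmandu code.

```perl
use strict;
use warnings;

# A value may be a single scalar or an array reference (a path ending in .*)
sub values_of { my ($v) = @_; return ref $v eq 'ARRAY' ? @$v : ($v) }

# all_equal: every value must be exactly equal to $expected
sub all_equal {
    my ($v, $expected) = @_;
    return 0 unless defined $v;                      # missing path: false
    for my $x (values_of($v)) {
        return 0 unless defined $x && $x eq $expected;
    }
    return 1;
}

# any_equal: at least one value must be exactly equal to $expected
sub any_equal {
    my ($v, $expected) = @_;
    return 0 unless defined $v;                      # missing path: false
    for my $x (values_of($v)) {
        return 1 if defined $x && $x eq $expected;
    }
    return 0;
}

my $animals = ['cat', 'dog', 'cat'];
printf "all_equal: %d, any_equal: %d\n",
    all_equal($animals, 'cat'), any_equal($animals, 'cat');   # all_equal: 0, any_equal: 1
```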

all_match(path,regex)
True, when the path exists and the value matches the regular expression regex. When the path points to a list, then all the values have to match the regular expression. False otherwise.

if all_match(year,"^19.*$")
  set_field(period,"20th century")
end

if all_match(publishers.*,"Elsevier.*")
  # your Fix functions here
end

any_match(path,regex)
True, when the path exists and the value matches the regular expression regex. When the path points to a list, then at least one of the values has to match the regular expression. False otherwise.

if any_match(year,"^19.*$")
  set_field(period,"20th century")
end

if any_match(publishers.*,"Elsevier.*")
  # your Fix functions here
end
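The same distinction applies to regular expressions. The plain-Perl sketch below, which is an illustration and not Catmandu's code, treats the second argument as a Perl regex and a missing path as `undef`.

```perl
use strict;
use warnings;

# Collect the values at a path: a scalar or an array reference
sub values_at { my ($v) = @_; return ref $v eq 'ARRAY' ? @$v : ($v) }

# all_match: every value must match the regular expression
sub all_match {
    my ($v, $re) = @_;
    return 0 unless defined $v;
    my $failures = grep { !defined $_ || $_ !~ /$re/ } values_at($v);
    return $failures ? 0 : 1;
}

# any_match: at least one value must match the regular expression
sub any_match {
    my ($v, $re) = @_;
    return 0 unless defined $v;
    my $hits = grep { defined $_ && /$re/ } values_at($v);
    return $hits ? 1 : 0;
}

my $publishers = ['Elsevier B.V.', 'Springer'];
print all_match($publishers, 'Elsevier.*') ? "all\n" : "not all\n";   # not all
print any_match($publishers, 'Elsevier.*') ? "some\n" : "none\n";     # some
print all_match('1984', '^19.*$') ? "20th century\n" : "other\n";     # 20th century
```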

exists(path)
True, when the path exists in the Catmandu item. False otherwise.

if exists(my.deep.field)
  # your Fix functions here
end

if exists(my.list.0)
  # your Fix functions here
end
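How a dotted path such as my.deep.field or my.list.0 reaches into an item can be sketched in plain Perl: named segments select hash keys and numeric segments select array positions. This `path_exists` helper is hypothetical and only approximates Catmandu's own path handling.

```perl
use strict;
use warnings;

# Walk a dotted path through nested hashes and arrays
sub path_exists {
    my ($data, $path) = @_;
    my $node = $data;
    for my $key (split /\./, $path) {
        if (ref $node eq 'HASH') {
            return 0 unless exists $node->{$key};
            $node = $node->{$key};
        }
        elsif (ref $node eq 'ARRAY') {
            # numeric segment must be a valid index into the array
            return 0 unless $key =~ /^\d+$/ && $key < @$node;
            $node = $node->[$key];
        }
        else {
            return 0;    # cannot descend into a plain scalar
        }
    }
    return 1;
}

my $item = { my => { deep => { field => 'hello' }, list => ['first'] } };
print path_exists($item, 'my.deep.field'), "\n";   # 1
print path_exists($item, 'my.list.0'), "\n";       # 1
print path_exists($item, 'my.list.1'), "\n";       # 0
```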

greater_than(path,number)
True, when the path exists and the value is greater than a number. When the path points to a list, then all the members need to be greater than the number. False otherwise.

less_than(path,number)
True, when the path exists and the value is less than a number. When the path points to a list, then all the members need to be less than the number. False otherwise.
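A plain-Perl sketch of these two numeric comparisons, including the rule that every list member must pass. The helper names are hypothetical and the code is not taken from Catmandu.

```perl
use strict;
use warnings;

# True only when the value exists and every member passes $cmp
sub compare_all {
    my ($v, $cmp) = @_;
    return 0 unless defined $v;
    my @vals = ref $v eq 'ARRAY' ? @$v : ($v);
    for my $x (@vals) { return 0 unless $cmp->($x) }
    return 1;
}

sub greater_than { my ($v, $n) = @_; return compare_all($v, sub { $_[0] > $n }) }
sub less_than    { my ($v, $n) = @_; return compare_all($v, sub { $_[0] < $n }) }

print greater_than(2005, 2000), "\n";           # 1
print greater_than([2005, 1999], 2000), "\n";   # 0
print less_than([1950, 1960], 1970), "\n";      # 1
```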

in(path1,path2)
True, when the values of the first path1 are contained in the values at the second path2. False otherwise.

For instance, to check if two paths contain the same values, type:

if in(my.title,your.title)
  # your Fix functions here
end

To check if a value in one path is contained in a list at another path, type:

if in(,your.authors.*)
  # your Fix functions here
end

is_true(path)
True, if the value at path can be evaluated to a boolean true. False otherwise.

is_false(path)
True, if the value at path can be evaluated to a boolean false. False otherwise.
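A small plain-Perl sketch of these two checks, assuming the wording of the Fixes cheat sheet: true means 1 or "true", false means 0 or "false". How Catmandu treats JSON boolean objects or other values is not modelled here.

```perl
use strict;
use warnings;

# Assumption: only 1/"true" count as true and 0/"false" as false
sub is_true {
    my ($x) = @_;
    return 1 if defined $x && ($x eq '1' || $x eq 'true');
    return 0;
}

sub is_false {
    my ($x) = @_;
    return 1 if defined $x && ($x eq '0' || $x eq 'false');
    return 0;
}

for my $v (1, 'true', 0, 'false', 'maybe') {
    printf "%-5s is_true=%d is_false=%d\n", $v, is_true($v), is_false($v);
}
```

Note that a value such as "maybe" is neither true nor false in this sketch.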

5.5 Binds

Binds change the execution context of a Fix script. In normal operation, all Fix functions are executed from the first to the last. For example, given the YAML input:

colors:
  - red
  - green
  - blue

every Fix function will be executed, one by one, on all the colors:

upcase(colors.*)
append(colors.*," is a nice color")

The first Fix, upcase, will uppercase all the colors; the second, append, will add " is a nice color" to all the colors; the last, copy_field, will copy all the colors to a new field.

But what should you do when you want the three Fix functions to operate on each color separately? First upcase on the first color, append on the first color, copy_field on the first color, then again upcase on the second color, append on the second color, etc.

For this type of operation a Bind is needed using the do notation:

do list(path:colors.*, var:c)
  upcase(c)
  append(c," is a nice color")
end

In the example above, the list Bind was introduced. It changes the execution context of the Bind body: instead of operating on the Catmandu item as a whole, the Fix functions are executed for each element in the list, with the variable c bound to the current element.
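The difference in execution order can be made visible with a plain-Perl sketch that logs each step. The two anonymous subs stand in for upcase and append; this is an illustration of the ordering only, not how Catmandu executes Fix scripts.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the upcase and append Fix functions
my @fixes = (
    [ upcase => sub { uc $_[0] } ],
    [ append => sub { $_[0] . ' is a nice color' } ],
);

# Normal operation: each Fix runs over all the colors before the next Fix starts
my @log_normal;
my @colors = qw(red green);
for my $fix (@fixes) {
    my ($name, $code) = @$fix;
    for my $c (@colors) { push @log_normal, "$name($c)"; $c = $code->($c) }
}

# list Bind: each color passes through all the Fix functions before the next color
my @log_bind;
my @colors2 = qw(red green);
for my $c (@colors2) {
    for my $fix (@fixes) {
        my ($name, $code) = @$fix;
        push @log_bind, "$name($c)";
        $c = $code->($c);
    }
}

print join(' ', @log_normal), "\n";   # upcase(red) upcase(green) append(RED) append(GREEN)
print join(' ', @log_bind), "\n";     # upcase(red) append(RED) upcase(green) append(GREEN)
```

Both loops end with the same colors; only the order in which the steps run differs, which matters once a step has side effects or can fail.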

Each Bind changes the execution context in some way. For instance, Fix functions could execute queries on a database, or fetch data from the internet. These operations can fail when the database is down or the website can't be reached. What should happen in that case in a Fix script? Should the execution be stopped? Or should these errors be ignored?

download_from_internet() # <--- this one fails
process_results()

What should happen in the example above? Should the results be processed when download_from_internet fails? Using the maybe Bind, one can skip the remaining Fix functions once one of them fails:

do maybe()
  download_from_internet()
  process_results() # <--- this is skipped when download_from_internet fails
end
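The idea behind the maybe Bind can be sketched in plain Perl: run the steps in order and stop as soon as one dies or returns undef. The step subs are hypothetical, and in this sketch the record is returned with the changes made so far; Catmandu's own behaviour in that respect may differ.

```perl
use strict;
use warnings;

# Run each step in order; stop at the first error or undef result
sub run_maybe {
    my ($data, @steps) = @_;
    for my $step (@steps) {
        my $result = eval { $step->($data) };   # undef when the step dies
        return $data unless defined $result;    # skip the remaining steps
        $data = $result;
    }
    return $data;
}

# Hypothetical stand-ins for download_from_internet() and process_results()
my $download_fails  = sub { die "website unreachable\n" };
my $process_results = sub { my ($d) = @_; $d->{processed} = 1; return $d };

my $out = run_maybe({ id => 42 }, $download_fails, $process_results);
print exists $out->{processed} ? "processed\n" : "skipped\n";   # skipped
```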

Binds are also used when creating Fix executables: Fix scripts that can be run directly from the command line. In the example below we’ll write a Fix script that downloads data from an OAI-PMH repository and prints all the record identifiers:

#!/usr/bin/env catmandu run
do importer(OAI,url: "")
  retain(_id)
  add_to_exporter(.,YAML)
end

If this script is stored on a file system as myscript.fix and made executable:

$ chmod 755 myscript.fix

then you can run this script like any other Unix command:

$ ./myscript.fix

5.6 Comments
Comments can be added to Fix scripts to enhance the readability of your transformations. Everything from a hash sign (#) to the end of the line is ignored by Catmandu:

# This is a comment
  # This is also a comment
add_field(foo,bar)  # This is a comment at the end of a line, add_field will be executed
# remove_field(foo) this line is a comment, remove_field(foo) will not be executed by the script

6 Cheat sheets

6.1 Command line client Cheat Sheet

This cheat sheet summarizes the command line client capabilities.

$ catmandu help
$ catmandu help convert

6.1.1 Convert

Convert one data format to another, optionally providing a Fix script to transform the data

$ catmandu convert MARC to JSON < records.mrc
$ catmandu convert MARC to YAML < records.mrc
$ catmandu convert MARC to JSON --pretty 1 < records.mrc
$ catmandu convert MARC to JSON --fix 'marc_map("245","title");remove_field("record")' < records.mrc
$ catmandu convert MARC to CSV --fix myfixes.fix < records.mrc
$ cat myfixes.fix
$ catmandu convert MARC to CSV --fix myfixes2.fix --var source="Springer" < records.mrc
$ cat myfixes2.fix
$ catmandu convert OAI --url --set allFtxt to JSON
$ catmandu convert OAI --url --set allFtxt to JSON --fix 'retain_field("title")'
$ catmandu convert SRU --base --query dna  
$ catmandu convert ArXiv --query 'all:electron'
$ catmandu convert PubMed --term 'hochstenbach'
$ cat
[%- FOREACH f IN record %]
[% _id %] [% f.shift %][% f.shift %][% f.shift %][% f.join(":") %]
[%- END %]
$ catmandu convert MARC to Template --template `pwd`/ < records.mrc 

6.1.2 Import/Export

Store data in a (NoSQL) database and export it again

$ catmandu import JSON to MongoDB --database_name mydb --bag data < records.json
$ catmandu import MARC to MongoDB --database_name mydb --bag data < records.mrc
$ catmandu import MARC to ElasticSearch --index_name mydb --bag data < records.mrc
$ catmandu import MARC to ElasticSearch --index_name mydb --bag data --fix 'marc_map("245a","title")' < records.mrc

$ catmandu export MongoDB --database_name mydb --bag data to JSON
$ catmandu export MongoDB --database_name mydb --bag data to JSON --fix 'retain_field("_id")'
$ catmandu export Solr --url http://localhost:8983/solr to JSON
$ catmandu export ElasticSearch --index_name mydb to JSON

6.1.3 Copy

Copy data from one database to another

$ catmandu copy MongoDB --database_name items --bag book to ElasticSearch --index_name items --bag book

6.1.4 Count

Count the number of items in a store

$ catmandu count ElasticSearch --index-name shop --bag products --query 'brand:Acme'

6.1.5 Delete

Delete data from a store

# delete items with matching _id
$ catmandu delete ElasticSearch --index-name items --bag book --id 1234 --id 2345

# delete items matching the query
$ catmandu delete ElasticSearch --index-name items --bag book --query 'title:"My Rabbit"'

# delete all items
$ catmandu delete ElasticSearch --index-name items --bag book

6.1.6 Configuration

$ cat catmandu.yml
store:
  test1:
    package: MongoDB
    options:
      database_name: mydb
  test2:
    package: ElasticSearch
    options:
      index_name: mydb
  test3:
    package: Solr
    options:
      url: http://localhost:8983/solr

$ catmandu import JSON to test1 < records.json # Mongo
$ catmandu import MARC to test2 < records.mrc  # ElasticSearch
$ catmandu import YAML to test3 < records.yaml # Solr
$ catmandu export test1 to JSON                # Mongo
$ catmandu export test2 to JSON                # ElasticSearch
$ catmandu export test3 to JSON                # Solr
$ cat fixes.txt
$ catmandu import MARC to test2 --fix fixes.txt < records.mrc

6.2 Fixes Cheat Sheet

This cheat sheet summarizes the fix language.

# Fixes clean your data. As input you get a Perl HASH. Each fix function is a command
# to transform the Perl HASH. Some fixes such as marc_map contain logic to transform
# complex data structures such as MARC.
set_field("","patrick")             # { my => { name => 'patrick'} }

set_array("foo")                           # Create an empty array foo => []
set_array("foo","a","b","c")               # Create an array with three values foo => ['a','b','c']
set_hash("foo")                            # Create an empty hash foo => {}
set_hash("foo",a: b,c: d)                  # Create a hash with two values foo => { a => 'b' , c => 'd' }

array("foo")                               # Create an array from a hash :
                                           # foo => {"name":"value"} => [ "name" , "value" ]
hash("foo")                                # Create a hash from an array
                                           # foo => [ "name" , "value" ] => {"name":"value"}

assoc(fields, pairs.*.key, pairs.*.val)    # Associate two values as a hash key and value
                                           # {pairs => [{key => 'year', val => 2009}, {key => 'subject', val => 'Perl'}]}
                                           # {fields => {subject => 'Perl', year => 2009}, pairs => [...]}

upcase("title")                            # marc -> MARC
downcase("title")                          # MARC -> marc
capitalize("my.deeply.nested.field.0")     # marc -> Marc
trim("field_with_spaces")                  # "  marc  " -> marc
substring("title",0,1)                     # marc -> m
prepend("title","die ")                    # marc -> die marc
append("title"," must die")                # marc -> marc must die

# {author => "tom jones"}  -> {author => "senoj mot"}
reverse(author)
# {numbers => [1,14,2]} -> {numbers => [2,14,1]}
reverse(numbers)

# replace the value with a formatted (sprintf-like) version
# e.g. numbers:
#         - 41
#         - 15
format(numbers,"%-10.10d %-5.5d") # numbers => "0000000041 00015"
# e.g. hash:
#        name: Albert
format(hash,"%-10s: %s") # hash: "name      : Albert"
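Assuming format() follows Perl's sprintf conventions, as the "sprintf-like" comment suggests, the two examples can be checked with plain sprintf: the array values and the hash key/value pair are passed as the sprintf arguments.

```perl
use strict;
use warnings;

# The array values fill the two %d placeholders
my @numbers = (41, 15);
my $numbers = sprintf('%-10.10d %-5.5d', @numbers);   # precision zero-pads the digits

# The single key/value pair of the hash fills the two %s placeholders
my %hash = (name => 'Albert');
my $hash = sprintf('%-10s: %s', %hash);               # key left-justified in 10 columns

print "$numbers\n";   # 0000000041 00015
print "$hash\n";      # name      : Albert
```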

#  parses a text into an array or hash of values
# date: "2015-03-07"
parse_text(date, '(\d\d\d\d)-(\d\d)-(\d\d)')
# date: 
#    - 2015
#    - 03
#    - 07 
# If your data record is:
#   a: eeny
#   b: meeny
#   c: miny
#   d: moe
paste(my.string,a,b,c,d)                 # my.string: eeny meeny miny moe
# Use a join character
paste(my.string,a,b,c,d,join_char:", ")  # my.string: eeny, meeny, miny, moe
# Paste literal strings with a tilde sign
paste(my.string,~Hi,a,~how are you?)     # my.string: Hi eeny how are you?

# date: "2015-03-07"
parse_text(date, '(?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d)')
# date:
#   year: "2015"
#   month: "03" 
#   day: "07"
# date: "abcd"
parse_text(date, '(\d\d\d\d)-(\d\d)-(\d\d)')
# date: "abcd"
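parse_text() applies a regular expression to a value. The same patterns in plain Perl show the three outcomes above: positional captures give a list, named captures give a hash, and a failed match produces nothing, so the value stays as it was. This is an illustration, not Catmandu's implementation.

```perl
use strict;
use warnings;

my $date = '2015-03-07';

# Positional captures: a list of the matched groups
my @parts = $date =~ /(\d\d\d\d)-(\d\d)-(\d\d)/;

# Named captures: %+ holds the name => value pairs of the last match
my %named;
if ($date =~ /(?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d)/) {
    %named = %+;
}

# No match: the capture list is empty, so the original value is kept
my $other  = 'abcd';
my @none   = $other =~ /(\d\d\d\d)-(\d\d)-(\d\d)/;
my $status = @none ? 'matched' : 'no match, value left unchanged';

print join(',', @parts), "\n";                      # 2015,03,07
print "$named{year}/$named{month}/$named{day}\n";   # 2015/03/07
print "$status\n";
```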

lookup("title","dict.csv", sep_char:'|')  # lookup 'marc' in dict.csv and replace the value
lookup("title","dict.csv", default:test)  # lookup 'marc' in dict.csv and replace the value or set it to 'test'
lookup("title","dict.csv", delete:1)    # lookup 'marc' in dict.csv and replace the value or delete the field if nothing is found
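The three lookup() variants can be sketched in plain Perl, with an in-memory hash standing in for dict.csv. The dictionary entry and the `lookup_value` helper are made up for illustration; the sketch only models the replace, default and delete behaviour described in the comments.

```perl
use strict;
use warnings;

# Hypothetical dictionary standing in for dict.csv
my %dict = (marc => 'Machine-Readable Cataloging');

sub lookup_value {
    my ($record, $field, %opt) = @_;
    my $value = $record->{$field};
    return $record unless defined $value;
    if (exists $dict{$value}) {
        $record->{$field} = $dict{$value};     # found: replace the value
    }
    elsif (defined $opt{default}) {
        $record->{$field} = $opt{default};     # not found: use the default
    }
    elsif ($opt{delete}) {
        delete $record->{$field};              # not found: remove the field
    }
    return $record;                            # unknown values without options stay as-is
}

my $r1 = lookup_value({ title => 'marc' }, 'title');
my $r2 = lookup_value({ title => 'mods' }, 'title', default => 'test');
my $r3 = lookup_value({ title => 'mods' }, 'title', delete => 1);
```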

lookup_in_store('title', 'MongoDB', database_name:lookups)  # lookup the (id)-value of title in 'lookups' and
                                           # replace it with the data found
lookup_in_store('title', 'MongoDB', default:'default value' , delete:1) 

# Replace the data in with an external file or url
import(, JSON, file: "", data_path: data.*)

add_to_store('authors.*', 'MongoDB', bag:authors, database_name:catalog)  # add matching values to a store as a side effect

add_to_exporter(data,CSV,header:1,file:/tmp/data.csv) # send the 'data' path to an alternative exporter
add_to_exporter(.,CSV,header:1,file:/tmp/data.csv)    # send the complete record to an alternative exporter

count("myarray")                           # count number of elements in an array or hash
sum("numbers")                             # replace an array with the sum of its values
sort_field("tags")                         # sort the values of an array
sort_field("tags", uniq:1)                 # sort the values plus keep unique values
sort_field("tags", reverse:1)              # reverse sort
sort_field("tags", numeric:1)              # sort numerical values
uniq(tags)                                 # strip duplicate values from an array
filter("tags","[Cc]at")                    # filter array values tags = ["Cats","Dogs"] => ["Cats"]
flatten(deep)                              # {deep => [1, [2, 3], 4, [5, [6, 7]]]} => {deep => [1, 2, 3, 4, 5, 6, 7]}
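To see what some of these array functions compute, here are plain-Perl equivalents applied to made-up data. They are illustrations only; for example, the exact sort order Catmandu uses for sort_field may differ from Perl's default string sort.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my @tags    = qw(Dogs Cats Dogs Birds);
my @numbers = (10, 2, 33);

my $count   = scalar @tags;                   # count("tags")               -> 4
my $total   = sum(@numbers);                  # sum("numbers")              -> 45
my %seen;
my @uniq    = grep { !$seen{$_}++ } @tags;    # uniq(tags)                  -> Dogs Cats Birds
my @sorted  = sort { $a cmp $b } @uniq;       # sort_field("tags", uniq:1)  -> Birds Cats Dogs
my @reverse = reverse @sorted;                # sort_field(..., reverse:1)  -> Dogs Cats Birds
my @numeric = sort { $a <=> $b } @numbers;    # sort_field(..., numeric:1)  -> 2 10 33
my @cats    = grep { /[Cc]at/ } @tags;        # filter("tags","[Cc]at")     -> Cats

# flatten(deep): splice nested arrays, at any depth, into one flat list
sub flatten {
    my @out;
    for my $x (@_) { push @out, ref $x eq 'ARRAY' ? flatten(@$x) : $x }
    return @out;
}
my @flat = flatten(1, [2, 3], 4, [5, [6, 7]]);   # 1 2 3 4 5 6 7
```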

cmd("java MyClass")                        # Use an external program that can read JSON 
                                           # from stdin and write JSON to stdout
perlcode("")                    # Execute Perl code as fix function
sleep(1,SECOND)                            # Do nothing for one second

split_field("foo",":")                     # marc:must:die -> ['marc','must','die']
join_field("foo",":")                      # ['marc','must','die'] -> marc:must:die
retain("id","id2","id3")                   # delete any field except 'id', 'id2', 'id3'
replace_all("title","a","x")               # marc -> mxrc

# Most functions also work on arrays. E.g.
replace_all("author.*","a","x")            # [ 'marc','jan'] => ['mxrc','jxn']
# Use:
#   authors.$end (last entry)
#   authors.$start (first entry)
#   authors.$append (last + 1)
#   authors.$prepend (first - 1)
#   authors.* (all authors)
#   authors.2 (3rd author)
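A plain-Perl sketch of which element each position marker refers to, using a made-up list of authors. It only illustrates the positions; it is not how Catmandu resolves paths internally.

```perl
use strict;
use warnings;

my @authors = qw(marc jan nicolas patrick);

my $first = $authors[0];     # authors.$start  -> marc
my $last  = $authors[-1];    # authors.$end    -> patrick
my $third = $authors[2];     # authors.2       -> nicolas (indexes start at 0)
my @all   = @authors;        # authors.*       -> every author

push    @authors, 'anna';    # authors.$append:  add after the last entry
unshift @authors, 'bob';     # authors.$prepend: add before the first entry

print "@authors\n";          # bob marc jan nicolas patrick anna
```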

collapse()                                 # collapse deep nested hash to a flat hash
expand()                                   # expand flat hash to deep nested hash
clone()                                    # clone the perl hash and work on the clone
reject()                                   # Reject (skip) a record
reject [condition]                         # Reject a record on some condition:
                                           #   reject all_match(...)
                                           #   reject any_match(...)
                                           #   reject exists(...)
select()                                   # Select a record
select [condition]                         # Select only those records that match a condition (see reject)

to_json('my.field')                        # convert a value of a field to json
from_json('my.field')                      # replace the json field with the parsed value

nothing()                                  # do nothing (used in benchmarking)

# 'caf%C3%A9' => 'café'
# 'café' => 'caf%C3%A9'

# Add a new field 'foo' with a random value between 0 and 9
random(foo, 10)

# Delete all the empty fields
vacuum()

# Copy all 245 subfields into the my.title hash
marc_map('245','my.title')
# Copy the 245-$a$b$c subfields into the my.title hash in the order of the record
marc_map('245abc','my.title')
# Copy the 245-$c$b$a subfields into the my.title hash in the order of the mapping
marc_map('245cba','my.title' , pluck:1)
# Copy the 100 subfields into the my.authors array
marc_map('100','my.authors.$append')
# Add the 710 subfields into the my.authors array
marc_map('710','my.authors.$append')
# Copy the 600-$x subfields into the my.subjects array while packing each into a genre.text hash
marc_map('600x','my.subjects.$append.genre.text')
# Copy the 008 characters 35-35 into the my.language hash
marc_map('008/35-35','my.language')
# Copy all the 600 fields into a my.stringy hash joining them by '; '
marc_map('600','my.stringy', join:'; ')
# When 024 field exists create the my.has024 hash with value 'found'
marc_map('024','my.has024', value:found)
# Do the same examples now with the marc fields in 'record2'
marc_map('245','my.title', record:record2)
# Remove the 900 fields
marc_remove('900')
# Add a marc field (in Catmandu::MARC 0.110)
marc_add('999', ind1, ' ' , ind2, '1' , a, 'test123')
# Add a marc field populated with data from your record
marc_add('245', a , $.my.title.field, c , $
# Set a marc value of one (sub)field to a new value
marc_set('100[3]a','Farquhar family.')

# Map all 650 subjects into an array
marc_map('650','subject', join:'###')
split_field('subject','###')

# Include fixes from another file

# Send debug messages to a logger
log('hello world' , level => 'DEBUG')

# uppercase the value of field 'foo' if all members of 'oogly' have the value 'doogly'
if all_match('oogly.*', 'doogly')
  upcase('foo') # foo => 'BAR'
else
  downcase('foo') # foo => 'bar'
end

# inverted
unless all_match('oogly.*', 'doogly')
  upcase('foo') # foo => 'BAR'
end

# uppercase the value of field 'foo' if field 'oogly' has the value 'doogly'
if any_match('oogly', 'doogly')
  upcase('foo') # foo => 'BAR'
end

# inverted
unless any_match('oogly', 'doogly')
  upcase('foo') # foo => 'BAR'
end

# uppercase the value of field 'foo' if the field 'oogly' exists
if exists('oogly')
  upcase('foo') # foo => 'BAR'
end

# inverted
unless exists('oogly')
  upcase('foo') # foo => 'BAR'
end

# add a new field when the 'year' field is equal to 2018
if all_equal('year','2018')
  # your Fix functions here
end

# add a new field when at least one of the 'year'-s is equal to 2018
if any_equal('years.*','2018')
  # your Fix functions here
end

# compare things (needs Catmandu 0.92 or better)
if greater_than('year',2000)
  # your Fix functions here
end

if less_than('year',1970)
  # your Fix functions here
end

# execute fixes if one path is contained in another
# foo => 1 , bar => [3,2,1]  => in(foo,bar) -> true
if in(foo,bar)
  # your Fix functions here
end

# only execute fixes if all path values are the boolean true, 1 or "true"
if is_true(data.*.has_error)
  # your Fix functions here
end

# only execute fixes if all path values are the boolean false, 0 or "false"
if is_false(data.*.has_error)
  # your Fix functions here
end

# only execute the fixes if the path contains an array
if is_array(data)
  # your Fix functions here
end

# only execute the fixes if the path contains an object (a hash, nested field)
if is_object(data)
  # your Fix functions here
end

# only execute the fixes if the path contains a number
if is_number(data)
  append(data," : is a number")
end

# only execute the fixes if the path contains a string
if is_string(data)
  append(data," : is a string")
end

# only execute the fixes if the path contains 'null' values
if is_null(data)
  set_field(data,"I'm empty!")
end

# Evaluates true when a marc (sub)field matches a regular expression
if marc_match('245','My funny title')
  # your Fix functions here
end
if marc_match('LDR/6','c')
  # your Fix functions here
end

## Binds (needs Catmandu 0.92 or better)

# The identity binder doesn't embody any computational strategy. It simply
# applies the bound fix functions sequentially to its input without any
# modification.
do identity()
  # your Fix functions here
end

# Maybe, computes all the fix functions and ignores fixes once they throw errors
# or return undef.
do maybe()
  # your Fix functions here
  return_undef() # rest will be ignored
end

# List over all items in demo and add a foo => bar field
# { demo => [{},{},{}] } => { demo => [{foo=>bar},{foo=>bar},{foo=>bar}]}
do list(path: demo)
  add_field(foo,bar)
end

# Print statistical information on the processing speed of fixes to the standard error.
do benchmark(output:/dev/stderr)
  # your Fix functions here
end

# Find all ISBN in a stream
do hashmap(exporter: JSON, join:',')
  # Need an identity binder to group all operations that calculate key_value pairs
  do identity()
    # your Fix functions here
  end
end

# Count the number of ISBN occurrences in a stream
do hashmap(count: 1)
  # your Fix functions here
end

# Filter out an array (needs Catmandu 0.9302 or better)
#    data:
#       - name: patrick
#       - name: nicolas
# to:
#    data:
#       - name: patrick
do with(path:data)
  reject all_match(name,nicolas)
  # Or:
  # if all_match(name,nicolas)
  #  reject()
  # end
end

# run fixes that should run within a time limit
do timeout(time => 5, units => seconds)
  # your Fix functions here
end

# a binder that computes Fix-es for every element in record
do visitor()
   # upcase all the 'name' fields in the record
   if all_match(key,name)
     upcase(scalar)
   end
end

# a binder that runs fixes on records from an importer
do importer(OAI,url: "")
  # your Fix functions here
end

6.3 Example Fix Script

Here is an example Fix script, taken from a production system at Ghent University Library, that can be used for inspiration. This script is used to feed data from a MongoDB store of MARC records to a Blacklight Solr installation.

#- LLUDSS - Data cleaning fixes. Using MARC records as input
#- 2013



if exists('merge.related_desc')

if exists('merge.deleted')
    #- Document Type
    unless exists('type')
        lookup("type","/opt/lludss-import/etc/material_types.csv", default:"other")

    #- ISBN/ISSN
    marc_map('020a','isbn.$append', join:'==')
    marc_map('022a','issn.$append', join:'==')

    #- Title
    marc_map('245ab','title', join:' ')
    marc_map('246','json.title_remainder', join:' ')

    #- Author
    marc_map('100ab','author.$append', join:' ')
    marc_map('700ab','author.$append', join:' ')
    unless all_match('type','phd|master|bachelor')
        marc_map('720ab','author.$append', join:' ')

    #- Imprint
    if all_match('year','[u^?-]{4}')

    if greater_than('2018','year')

    if marc_match('008_/6-6','b')

    #- Edition

    #- Description

    #- Summary
    marc_map('505a','json.summary.$append', join:"\n")
    marc_map('520a','json.summary.$append', join:"\n")

    #- If we have a dissertation, then 502 is the summary, with 720 as the supervisor (promotor).
    #- This is then also automatically a UGent publication
    if all_match('type','phd|master')

        if exists('summary');


    unless exists('json.summary')
        if exists('')

    #- Boost
    unless exists('_boost')
        if exists('')

    #- Language
    if all_match('lang','\W+')

    #- Subject
    marc_map('6**^0123456789','subject.$append', join:' ')
    sort_field('subject', uniq:1)

    #- Library, Faculty, Location
    sort_field('library', uniq:1)
    sort_field('faculty', uniq:1)
    sort_field('location', uniq:1)

    #- Host publication

    #- Holding
    if exists('p_holding')
        replace_all('year',' .*','')
    if exists('e_holding')
        replace_all('year',' .*','')


    #- Year cleanup
    unless all_match('year','^-?([0-9]|[123456789][0-9]+)$')

    #- Wikipedia

    #- Cover Image
    if all_match('merge.source','rug01|pug01|ebk01')

    #- Cover card-catalog
    if  exists(cid)

    #- Fulltext

    #- Remove record without items or fulltext
    unless exists('items')
        unless exists('json.fulltext')

    if exists('json.fulltext')
    if exists('items')

    if all_match('merge.source','pug01')

    sort_field("only", uniq:1, reverse:0)

    #- ALL Field

    #- Identifier indexes rug01, ser01, ...

    #- Set
    #- Warning: Aleph doesn't do zulu-time...
    datetime_format('updated_at', time_zone:'Europe/Brussels', set_time_zone:'UTC', source_pattern: '%Y%m%d%H%M%S.%N', destination_pattern:'%Y-%m-%dT%H:%M:%SZ', delete:1)
    if exists('updated_at')
    sort_field('set', uniq:1)

    #- MARC Display
    marc_map('245','marc_display.$append.title', join:' ')
    marc_map('246','marc_display.$append.other-title', join:' ')
    marc_map('765','marc_display.$append.orig-title', join:' ')
    marc_map('210','marc_display.$append.abbrev-title', join:' ')
    marc_map('240','marc_display.$append.other-title', join:' ')
    marc_map('020','marc_display.$append.isbn', join:' ')
    marc_map('022','marc_display.$append.issn', join:' ')
    marc_map('028','marc_display.$append.publisher-no', join:' ')
    marc_map('048','marc_display.$append.voices-code', join:' ')
    marc_map('100','marc_display.$', join:' ')
    marc_map('110','marc_display.$append.corp-author', join:' ')
    marc_map('700','marc_display.$', join:' ')
    marc_map('720','marc_display.$append.other-name', join:' ')
    marc_map('111','marc_display.$append.conference', join:' ')
    marc_map('130','marc_display.$append.other-title', join:' ')
    marc_map('250','marc_display.$append.edition', join:' ')
    marc_map('255','marc_display.$append.scale', join:' ')
    marc_map('256','marc_display.$append.edition', join:' ')
    marc_map('260','marc_display.$append.publisher', join:' ')
    marc_map('261','marc_display.$append.publisher', join:' ')
    marc_map('263','marc_display.$append.publisher', join:' ')
    marc_map('300','marc_display.$append.description', join:' ')
    marc_map('310','marc_display.$append.frequency', join:' ')
    marc_map('321','marc_display.$append.prior-freq', join:' ')
    marc_map('340','marc_display.$append.description', join:' ')
    marc_map('362','marc_display.$', join:' ')
    marc_map('400','marc_display.$append.series', join:' ')
    marc_map('410','marc_display.$append.series', join:' ')
    marc_map('440','marc_display.$append.series', join:' ')
    marc_map('490','marc_display.$append.series', join:' ')
    marc_map('500','marc_display.$append.note', join:' ')
    marc_map('501','marc_display.$append.note', join:' ')
    marc_map('502','marc_display.$append.thesis', join:' ')
    marc_map('504','marc_display.$append.bibliography', join:' ')
    marc_map('505','marc_display.$append.content', join:' ')
    marc_map('508','marc_display.$append.credits', join:' ')
    marc_map('510','marc_display.$append.note', join:' ')
    marc_map('511','marc_display.$append.performers', join:' ')
    marc_map('515','marc_display.$append.note', join:' ')
    marc_map('518','marc_display.$append.note', join:' ')
    marc_map('520','marc_display.$append.summary', join:' ')
    marc_map('521','marc_display.$append.note', join:' ')
    marc_map('525','marc_display.$append.note', join:' ')
    marc_map('530','marc_display.$append.note', join:' ')
    marc_map('533','marc_display.$append.note', join:' ')
    marc_map('534','marc_display.$append.note', join:' ')
    marc_map('540','marc_display.$append.note', join:' ')
    marc_map('541','marc_display.$append.note', join:' ')
    marc_map('544','marc_display.$append.note', join:' ')
    marc_map('545','marc_display.$append.note', join:' ')
    marc_map('546','marc_display.$append.note', join:' ')
    marc_map('550','marc_display.$append.note', join:' ')
    marc_map('555','marc_display.$append.note', join:' ')
    marc_map('561','marc_display.$append.note', join:' ')
    marc_map('580','marc_display.$append.note', join:' ')
    marc_map('581','marc_display.$append.publication', join:' ')
    marc_map('583','marc_display.$append.note', join:' ')
    marc_map('586','marc_display.$append.note', join:' ')
    marc_map('591','marc_display.$append.note', join:' ')
    marc_map('598','marc_display.$append.classification', join:' ')
    marc_map('080','marc_display.$append.udc-no', join:' ')
    marc_map('082','marc_display.$append.dewey-no', join:' ')
    marc_map('084','marc_display.$append.other-call-no', join:' ')
    marc_map('600','marc_display.$append.subject', join:' ')
    marc_map('610','marc_display.$append.subject', join:' ')
    marc_map('611','marc_display.$append.subject', join:' ')
    marc_map('630','marc_display.$append.subject', join:' ')
    marc_map('650','marc_display.$append.subject', join:' ')
    marc_map('651','marc_display.$append.subject', join:' ')
    marc_map('653','marc_display.$append.subject', join:' ')
    marc_map('655','marc_display.$append.subject', join:' ')
    marc_map('662','marc_display.$append.subject', join:' ')
    marc_map('690','marc_display.$append.subject', join:' ')
    marc_map('692','marc_display.$append.subject', join:' ')
    marc_map('693','marc_display.$append.subject', join:' ')
    marc_map('710','marc_display.$append.corp-author', join:' ')
    marc_map('711','marc_display.$append.conference', join:' ')
    marc_map('730','marc_display.$append.other-title', join:' ')
    marc_map('749','marc_display.$append.title-local', join:' ')
    marc_map('752','marc_display.$append.other-info', join:' ')
    marc_map('753','marc_display.$append.other-info', join:' ')
    marc_map('772','marc_display.$append.parent-rec-ent', join:' ')
    marc_map('776','marc_display.$append.add-phys-form-e', join:' ')
    marc_map('777','marc_display.$append.issu-with-entry', join:' ')
    marc_map('780','marc_display.$append.preceding-entry', join:' ')
    marc_map('785','marc_display.$append.succeed-entry', join:' ')
    marc_map('LKR','marc_display.$append.note', join:' ')
    marc_map('024','marc_display.$append.object-id', join:' ')
    marc_map('856','marc_display.$append.e-location', join:' ')
    #-    marc_map('852jhaz','marc_display.$append.location', join:' | ')
    #-    marc_map('Z303haz','marc_display.$append.location', join:' | ')

    #- Europeana Magic

    #- MARCXML




6.4 Cookbook

Install Catmandu OAI processing on your computer

Make sure you have cpanm (hint: $ cpan App::cpanminus) installed.

$ cpanm Catmandu::OAI

Read Dublin Core records from an OAI repository from the command line

  1. Go to:
  2. Find a repository of choice
  3. Read the base URL of the repository from the ‘OAI-PMH’ field
  4. Run the catmandu convert command in a terminal with the URL found in the ‘OAI-PMH’ field


$ catmandu convert OAI --url

Read Dublin Core records from an OAI repository in your Perl code

use Catmandu;

Catmandu->importer('OAI',url => '')->each(sub {
   my $record = shift;
   # $record is a hash reference; print its identifier
   print "$record->{_id}\n";
});

Convert Dublin Core records from an OAI repository into YAML from the command line

$ catmandu convert OAI --url to YAML

Convert Dublin Core records from an OAI repository into YAML in your Perl code

use Catmandu -all;

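my $importer = importer('OAI',url => '');
my $exporter = exporter('YAML');

# write every imported record and flush the output
$exporter->add_many($importer);
$exporter->commit;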

Extract all identifiers from an OAI repository from the command line

$ catmandu convert OAI --url --fix 'retain_field("_id")'

or, if you prefer a CSV file:

$ catmandu convert OAI --url to CSV --fix 'retain_field("_id")'

Extract all identifiers from an OAI repository into CSV in your Perl code

use Catmandu;

my $importer = Catmandu->importer('OAI',url => '');
my $fixer    = Catmandu->fixer('retain_field("_id")');
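my $exporter = Catmandu->exporter('CSV');

# apply the fix to every record, then write the results
$exporter->add_many($fixer->fix($importer));
$exporter->commit;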



Show the speed of importing records from the command line

Hint: use the -v option

$ catmandu convert -v OAI --url to CSV --fix 'retain_field("_id")' > /dev/null

Here we send the output to /dev/null so that only the verbose messages are shown.

Show the speed of importing records from your Perl program

use Catmandu;

my $importer = Catmandu->importer('OAI',url => '');
my $fixer    = Catmandu->fixer('retain_field("_id")');
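my $exporter = Catmandu->exporter('CSV');

# benchmark() wraps the importer and reports the processing speed on STDERR
$exporter->add_many($fixer->fix($importer->benchmark));
$exporter->commit;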



See some debug messages

Make sure you have Log::Log4perl installed (hint: $ cpan Log::Any::Adapter::Log4perl).

In your main program do:

use Catmandu;
use Log::Any::Adapter;
use Log::Log4perl;

Log::Any::Adapter->set('Log4perl');
Log::Log4perl::init('./log4perl.conf');
# The lines above should be enough to activate logging for Catmandu.
# Include the lines below to activate logging for your main program.
my $logger = Catmandu->log;

$logger->info("Starting main program");

...your code...

with log4perl.conf like:

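# Send a copy of all logging messages to STDERR
log4perl.rootLogger=DEBUG,STDERR

# Logging specific for your main program

# Logging specific for one part of Catmandu

# Where to send the STDERR output
log4perl.appender.STDERR=Log::Log4perl::Appender::Screen
log4perl.appender.STDERR.stderr=1
log4perl.appender.STDERR.layout=Log::Log4perl::Layout::PatternLayout
log4perl.appender.STDERR.layout.ConversionPattern=%d [%P] - %p %l time=%r : %m%n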

You will now see Catmandu log messages (e.g. for Fixes).

If you want to add logging functionality to your own Perl modules, you have two options:

  1. Your package is a Catmandu::Importer or Catmandu::Exporter. In this case you are lucky because you have a logger as part of your instance:

    $self->log->debug('blablabla'); # where $self is an Importer, Fix or Exporter instance

  2. You need to create the logger yourself.

    package Foo::Bar;

    use Moo;

    with 'Catmandu::Logger';

    sub bar { my $self = shift; $self->log->debug('tadaah'); }

    1;

If you want to see only the logging messages of your package, then use a line like this in your log4perl.conf:


or, if you want to see all the log messages for Foo packages:


How to create a new Catmandu::Store

A Catmandu::Store is used to store items. Stores can have one or more compartments in which to store the items. Each such compartment is a Catmandu::Bag. You can compare a Store with a database and a Bag with a table in a database. Like tables, Bags have names. When no name is provided for a Bag, then ‘data’ is used.

To implement a Catmandu store you need to create at least two packages:

  1. A ‘Catmandu::Store’, defining the general parameters, possible connection parameters and actions for the whole store.
  2. A ‘Catmandu::Bag’, which is used to list, add, fetch and delete items from a Bag.

As example, this is a skeleton for a ‘Foo’ Catmandu::Store which requires at least one ‘foo’ connection parameter:

package Catmandu::Store::Foo;
use Moo;

with 'Catmandu::Store';

has 'foo' => (is => 'ro' , required => 1);

1;

For this Catmandu::Store::Foo we can define a module ‘Catmandu::Store::Foo::Bag’ to implement the Bag functions. Notice how in the generator the bag can access the Catmandu::Store instance:

package Catmandu::Store::Foo::Bag;
use Moo;

with 'Catmandu::Bag';

sub generator {
    my $self = shift;
    sub {
        # This subroutine is used to loop over all items
        # in a store and should return an item HASH for
        # every call (and undef when there are no more items;
        # this skeleton never does, so it yields items forever)
        return {
             name => $self->name,
             foo  => $self->store->foo,
        };
    };
}

sub get {
    my ($self, $id) = @_;
    # return an item HASH given an $id
    return {};
}

sub add {
    my ($self, $data) = @_;
    # add/update an item HASH to the bag and return the item with an _id field set
    return $data;
}

sub delete {
    my ($self, $id) = @_;
    # delete an item from the bag given an $id
}

sub delete_all {
    my ($self) = @_;
    # delete all items
    $self->each(sub {
        $self->delete($_[0]->{_id});
    });
}

1;
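The five methods of the skeleton follow a simple contract. The sketch below models it in plain Perl so the logic can be tried without Moo or Catmandu installed; InMemoryBag is a hypothetical name and this is not how Catmandu::Bag itself is implemented. Its generator returns undef once all items have been produced, which is how an iterator signals the end; the Foo skeleton above never does, so it keeps producing items.

```perl
use strict;
use warnings;

# Hypothetical in-memory bag in plain Perl (no Moo, no Catmandu::Bag),
# following the same generator/get/add/delete/delete_all contract.
package InMemoryBag;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} // 'data', items => {}, next_id => 1 }, $class;
}

# Return an iterator: one item HASH per call, undef when exhausted.
sub generator {
    my $self = shift;
    my @ids  = sort keys %{ $self->{items} };    # snapshot of the current ids
    return sub {
        my $id = shift @ids;
        return undef unless defined $id;
        return $self->{items}{$id};
    };
}

sub get {
    my ($self, $id) = @_;
    return $self->{items}{$id};
}

# Add or update an item and return it with an _id field set.
sub add {
    my ($self, $data) = @_;
    $data->{_id} //= $self->{next_id}++;
    $self->{items}{ $data->{_id} } = $data;
    return $data;
}

sub delete {
    my ($self, $id) = @_;
    CORE::delete $self->{items}{$id};    # the built-in, not this method
    return;
}

# Like the skeleton: walk over all items and delete them one by one.
sub delete_all {
    my $self = shift;
    my $next = $self->generator;
    while (defined(my $item = $next->())) {
        $self->delete($item->{_id});
    }
    return;
}

package main;

my $bag = InMemoryBag->new;
$bag->add({ test => 123 });                # gets _id 1
$bag->add({ _id => 'abc', test => 456 });  # keeps its own _id

my @ids;
my $next = $bag->generator;
while (defined(my $item = $next->())) {
    push @ids, $item->{_id};
}
print "@ids\n";    # prints "1 abc"
```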


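With this skeleton Store you have enough code to run basic tests. Save these packages in a lib directory, following the usual Perl layout: Catmandu::Store::Foo goes into lib/Catmandu/Store/ and Catmandu::Store::Foo::Bag into lib/Catmandu/Store/Foo/.

Then test your implementation with a catmandu command:

$ catmandu -I lib export Foo --foo bar

Because the skeleton generator never signals the end of the items, the output repeats until you interrupt it:

{"foo":"bar","name":"data"}
{"foo":"bar","name":"data"}
{"foo":"bar","name":"data"}
. . .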

Or create a script to access your new Store via Perl:

#!/usr/bin/env perl
use strict;
use warnings;
use lib qw(./lib);
use Catmandu;

my $store = Catmandu->store('Foo', foo => 'bar');

# add an item to the default bag ('data')
$store->bag->add({ test => 123 });


This section provides an in-depth overview of how to extend Catmandu using its API.

7.1 Fix packages

Create a simple Fix

The easiest way to create a new ‘Fix’ is by creating a Perl package in the Catmandu::Fix namespace that has a ‘fix’ instance method. For example:

package Catmandu::Fix::foo;

use Moo;

sub fix {
    my ($self, $data) = @_;

    # modify your data here, for instance...
    $data->{foo} = 'bar';

    # return the (modified) record
    $data;
}

1;

When this package is available in your Perl library path under Catmandu/Fix/, it can be used as the fix function foo(). To try it out, save the file in lib/Catmandu/Fix/ in your local directory and execute:

$ echo '{}' | catmandu -I lib convert JSON --fix "foo()"
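Conceptually, a fix receives one record as a hash reference, changes it and returns it. The plain-Perl sketch below mimics what the command above does to the record {}, without Catmandu: My::Fix::foo is a hypothetical stand-in for Catmandu::Fix::foo written without Moo, and apply_fixes is a made-up helper, not Catmandu's own fix runner. JSON::PP ships with Perl 5.14 and later.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical stand-in for Catmandu::Fix::foo, written without Moo:
# fix() takes a record (HASH reference), modifies it and returns it.
package My::Fix::foo;

sub new { my $class = shift; return bless {}, $class }

sub fix {
    my ($self, $data) = @_;
    $data->{foo} = 'bar';
    return $data;
}

package main;

# Hypothetical helper: run a list of fix objects over one record, in order.
sub apply_fixes {
    my ($data, @fixes) = @_;
    $data = $_->fix($data) for @fixes;
    return $data;
}

my $json   = JSON::PP->new->canonical;
my $record = $json->decode('{}');    # the same input as: echo '{}'
my $out    = $json->encode(apply_fixes($record, My::Fix::foo->new));
print "$out\n";    # prints {"foo":"bar"}
```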

Fix creation with helper packages

The following instructions are incomplete; see the POD of Catmandu::Fix.

If you want to pass arguments to your fix, you can use Moo and Catmandu::Fix::Has to read in required and optional parameters.

package Catmandu::Fix::foo;

use Moo;
use Catmandu::Fix::Has;

has greeting => (fix_arg => 1);  # required first argument
has message  => (fix_arg => 1);  # required second argument
has eol      => (fix_opt => 1, default => sub { '!' });  # optional argument, default '!'

sub fix {
    my ($self, $data) = @_;

    print STDERR $self->greeting . ", " . $self->message . $self->eol . "\n";

    # Fix your data here...

    $data;
}

1;

Now you can pass arguments to your Fix, which prints its greeting on STDERR:

$ echo '{}' | catmandu -I lib convert JSON --fix 'foo(Hello,World)'
Hello, World!
$ echo '{}' | catmandu -I lib convert JSON --fix 'foo(Hello,World, eol: ?)'
Hello, World?
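To see what the fix_arg and fix_opt declarations give you, here is roughly the same behaviour written by hand in plain Perl, assuming the Fix is constructed with its positional arguments first and its named options after them. My::Fix::greet and its text method are hypothetical; Catmandu::Fix::Has generates the real constructor for you, so this only illustrates the resulting behaviour, not its implementation.

```perl
use strict;
use warnings;

# Hypothetical hand-written equivalent of the fix_arg/fix_opt attributes above
# (plain Perl, no Moo, no Catmandu::Fix::Has).
package My::Fix::greet;

sub new {
    my ($class, $greeting, $message, %opts) = @_;
    # the two positional arguments are required
    die "usage: greet(greeting, message, [eol: STR])\n"
        unless defined $greeting && defined $message;
    return bless {
        greeting => $greeting,
        message  => $message,
        eol      => exists $opts{eol} ? $opts{eol} : '!',   # optional, default '!'
    }, $class;
}

# The text the fix reports.
sub text {
    my $self = shift;
    return $self->{greeting} . ', ' . $self->{message} . $self->{eol};
}

sub fix {
    my ($self, $data) = @_;
    print STDERR $self->text, "\n";
    return $data;
}

package main;

my $default = My::Fix::greet->new('Hello', 'World');
my $custom  = My::Fix::greet->new('Hello', 'World', eol => '?');
my $record  = $custom->fix({});    # prints "Hello, World?" to STDERR
print $default->text, "\n";        # prints "Hello, World!"
```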

See also Catmandu::Fix::SimpleGetValue.

Extended introduction

For an extended introduction into creating Fix packages read the two blog posts at:

8 Contribution

This guide has been written to help anyone interested in contributing to the development of Catmandu. Please read it before contributing to Catmandu or related projects, to avoid wasted effort and maximize the chances of your contributions being used.

8.1 Ways to contribute

There are many ways to contribute to the project. Catmandu is a young yet active project and any kind of help is very much appreciated!

8.1.1 Publicity

You don’t have to start by hacking the code, spreading the word is very valuable as well!

If you have a blog, feel free to write about Catmandu.

Of course, it doesn’t have to be limited to blogs or Twitter. Feel free to spread the word in whatever way you see fit and drop us a line on the Catmandu user mailing list noted below.

Also, if you’re using and enjoying Catmandu, rating us on, explaining what you like about Catmandu is another very valuable contribution that helps other new users find us!

8.1.2 Mailing list

Subscribing to the mailing list and providing assistance to new users is incredibly valuable.

8.1.3 Documentation

We value documentation very much, but it’s difficult to keep it up-to-date. If you find a typo or an error in the documentation please do let us know - ideally by submitting a patch (pull request) with your fix or suggestion (see Patch Submission).

8.1.4 Code

You can contribute to Catmandu’s core code or extend its functionality with new Importers, Exporters, Stores, Fix packages, Validators, Binds, or Plugins. Have a look at the list of missing modules for existing ideas and resources for new Catmandu modules. Also feel free to add new ideas and links there.

For more detailed guidelines, see:

8.2 Quality Supervision and Reporting Bugs

We can measure our quality using the CPAN testers platform:

A good way to help the project is to find a failing build log on the CPAN testers:

If you find a failing test report or another kind of bug, feel free to report it as a GitHub issue. Please make sure the bug you’re reporting has not already been reported.


8.3.1 Website

The official website is here:

A WordPress blog with hints is available at:

8.3.2 Mailing Lists

A mailing list is available here:

8.3.3 Repositories

The official repository is hosted on GitHub at

Official developers have write access to this repository; contributors are invited to fork the dev branch (!) and submit a pull request, as described in Patch Submission.

8.3.4 Core Maintainers

  • LibreCat/Catmandu - @nics
  • LibreCat/Catmandu-AWS - @phochste
  • LibreCat/Catmandu-AlephX - @nicolasfranck
  • LibreCat/Catmandu-ArXiv - @pietsch, @vpeil
  • LibreCat/Catmandu-Atom - @phochste
  • LibreCat/Catmandu-BibTeX - @pietsch, @vpeil
  • LibreCat/Catmandu-Cmd-fuse - @nics
  • LibreCat/Catmandu-Cmd-repl - @pietsch
  • LibreCat/Catmandu-CrossRef - @pietsch, @vpeil
  • LibreCat/Catmandu-DBI - @nicolasfranck
  • LibreCat/Catmandu-DSpace - @nicolasfranck
  • LibreCat/Catmandu-EuropePMC - @vpeil
  • LibreCat/Catmandu-Exporter-ODS - @snorri
  • LibreCat/Catmandu-Exporter-RTF - @petrakohorst
  • LibreCat/Catmandu-Exporter-Template - @vpeil
  • LibreCat/Catmandu-FedoraCommons - @phochste
  • LibreCat/Catmandu-Fix-XML - @nichtich
  • LibreCat/Catmandu-Fix-cmd - @nichtich
  • LibreCat/Catmandu-Importer-CPAN - @nichtich, @phochste
  • LibreCat/Catmandu-Importer-Parltrack - @jonas
  • LibreCat/Catmandu-Inspire - @vpeil
  • LibreCat/Catmandu-LDAP - @nics
  • LibreCat/Catmandu-MARC - @phochste
  • LibreCat/Catmandu-MediaMosa - @nicolasfranck
  • LibreCat/Catmandu-OAI - @pietsch, @phochste
  • LibreCat/Catmandu-ORCID - @pietsch
  • LibreCat/Catmandu-PLoS - @pietsch, @vpeil
  • LibreCat/Catmandu-Plack-REST - @phochste
  • LibreCat/Catmandu-PubMed - @pietsch, @vpeil
  • LibreCat/Catmandu-RDF - @nichtich
  • LibreCat/Catmandu-SRU - @pietsch
  • LibreCat/Catmandu-Serializer-messagepack - @nicolasfranck
  • LibreCat/Catmandu-Serializer-storable - @nics
  • LibreCat/Catmandu-Store-CouchDB - @nics
  • LibreCat/Catmandu-Store-Elasticsearch - @nics
  • LibreCat/Catmandu-Store-Lucy - @nics
  • LibreCat/Catmandu-Store-MongoDB - @nics
  • LibreCat/Catmandu-Store-Solr - @nicolasfranck, @nics
  • LibreCat/Catmandu-Twitter - @pietsch
  • LibreCat/Catmandu-XLS - @jorol, @nics
  • LibreCat/Catmandu-Z3950 - @pietsch
  • LibreCat/Dancer-Plugin-Auth-RBAC-Credentials-Catmandu - @nicolasfranck
  • LibreCat/Dancer-Plugin-Catmandu-OAI - @nicolasfranck
  • LibreCat/Dancer-Plugin-Catmandu-SRU - @nics, @phochste
  • LibreCat/Dancer-Session-Catmandu - @nics
  • LibreCat/LibreCat-Sitemap - @phochste
  • LibreCat/MODS-Record - @phochste
  • LibreCat/Plack-Session-Store-Catmandu - @nics
  • LibreCat/Task-Catmandu - @nics
  • LibreCat/WWW-ORCID - @nics

8.4 Acknowledgement

This guide was based on .

8.5 Development Setup

The following guidelines describe how to set up a development environment for contribution of code.

8.5.1 Set up a development environment

If you want to submit a patch for Catmandu, you need git and very likely also milla (from Dist::Milla). We also recommend perlbrew (see below) to test and develop Catmandu on a recent version of Perl, and cpanm to quickly and comfortably install Perl modules under perlbrew.

In the following sections we provide tips for the installation of some of these tools together with Catmandu. Please also see the documentation that comes with these tools for more info.

Perlbrew tips (Optional)

Install perlbrew, for example with:

cpanm App::perlbrew

Check which perls are available

perlbrew available

At the time of writing it looks like this


Then install a Perl version inside perlbrew. I recommend giving the installation a name (--as option) and skipping the test suite (-n option) to speed it up; -j 3 runs three parallel build jobs.

perlbrew install -n perl-5.16.3 --as catmandu_dev -j 3

Wait a while, and it should be done. Switch to your new Perl with:

perlbrew switch catmandu_dev

Now you are using the fresh Perl, you can check it with:

which perl

Install cpanm on your brewed version of perl.

perlbrew install-cpanm

8.5.2 Install dependencies (required)

this section needs to be rewritten to reflect the change to Dist::Milla

8.5.3 Get Catmandu sources

Get the Catmandu sources from GitHub (for a more complete git workflow see below):

Clone your fork to have a local copy using the following command:

$ git clone

The installation is then straightforward:

$ cd Catmandu
$ perl Build.PL
$ ./Build
$ ./Build test
$ ./Build install

You can now start with hacking Catmandu and patch submission!

8.6 Coding guidelines

The following guidelines are not strict rules, but they should be considered best practice for contributions.

8.7 Compatibility

Catmandu should install on all Perl versions since 5.10.1, on any platform for which Perl exists. We focus mainly on GNU/Linux (any distribution).

You should avoid regressions as much as possible and keep backwards compatibility in mind when refactoring. Stable releases should not break functionality and new releases should provide an upgrade path and upgrade tips such as warning the user about deprecated functionality.

8.8 Code documentation

Document your module with:

  • a meaningful abstract
  • a SYNOPSIS with usage example
  • a short DESCRIPTION giving an introduction, including explicit links to other modules (e.g. roles)
  • a CONFIGURATION section listing all constructor arguments
  • a METHODS section listing all public methods. Methods derived from other modules should not be included but the modules should be mentioned explicitly.
  • a SEE ALSO section listing related modules

Names of other modules should be linked (e.g. L<Catmandu::Importer>).
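A minimal POD outline with these sections might look like the following sketch. Catmandu::Importer::Foo and its url option are hypothetical placeholders; replace them with your module's name and its real constructor arguments.

```pod
=for comment
Hypothetical module: Foo and the url option are placeholders.

=head1 NAME

Catmandu::Importer::Foo - import records from a Foo service

=head1 SYNOPSIS

    # On the command line
    $ catmandu convert Foo --url http://example.org/foo to JSON

    # In Perl
    use Catmandu;

    my $importer = Catmandu->importer('Foo', url => 'http://example.org/foo');
    $importer->each(sub {
        my $item = shift;
        # ...
    });

=head1 DESCRIPTION

This L<Catmandu::Importer> reads records from a Foo service. Every
importer is also a L<Catmandu::Iterable>.

=head1 CONFIGURATION

=over

=item url

Base URL of the Foo service. Required.

=back

=head1 METHODS

All methods are inherited from L<Catmandu::Importer>; this module adds none.

=head1 SEE ALSO

L<Catmandu>, L<Catmandu::Importer>, L<Catmandu::Iterable>

=cut
```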

8.9 Patch Submission

The Catmandu development team uses GitHub to collaborate. We greatly appreciate contributions submitted via GitHub, as it makes tracking these contributions and applying them much, much easier. This gives your contribution a much better chance of being integrated into Catmandu quickly!

To help us achieve high-quality, stable releases, the git-flow workflow is used to handle pull requests. This means contributors must work on their dev branch rather than on their master. (Master should be touched only by the core dev team when preparing a release to CPAN; all ongoing development happens in branches which are merged into the dev branch.)

Here is the workflow for submitting a patch:

  1. Fork the repository (click “Fork”)

  2. Clone your fork to have a local copy using the following command:

    $ git clone git://$myname/Catmandu.git
  3. As a contributor, you should always work on the dev branch of your clone (master is used only for building releases).

    $ git remote add upstream
    $ git fetch upstream
    $ git checkout -b dev upstream/dev

This will create a local branch in your clone named dev that tracks the official dev branch. That way, if you have more or fewer commits than the upstream repo, you’ll be immediately notified by git.

  4. Isolate all your commits in a topic branch. This makes reviewing much easier for the core team and allows you to continue working on your clone without worrying about different commits mixing together.

To do that, first create a local branch to build your pull request:

    # you should be in dev branch here
    git checkout -b pr/$name

Now you have created a local branch named pr/$name, where $name is a name of your choice (it should describe the purpose of the pull request you’re preparing).

  5. In that branch, make all the commits you need (the more the better) and when done, push the branch to your fork:

    # … commits …
    git push origin pr/$name

You are now ready to send a pull request.

  6. Send a pull request via the GitHub interface. Make sure your pull request is based on the pr/$name branch you’ve just pushed, so that it incorporates the appropriate commits only.

It’s also a good idea to summarize your work in a report sent to the users mailing list (see Mailing Lists), in order to make sure the team is aware of it.

When the core team reviews your pull request, it will either accept (and then merge into dev) or refuse your request.

If it’s refused, try to understand the reasons explained by the team for the denial. Most of the time, communicating with the core team is enough to understand what the mistake was. Above all, please don’t be offended.

  7. If your pull request is merged into dev, then all you have to do is remove your local and remote pr/$name branch:

    git checkout dev
    git branch -D pr/$name
    git push origin :pr/$name
  8. And then, of course, you need to sync your local dev branch with the upstream:

    git pull upstream dev
    git push origin dev

You’re now ready to start working on a new pull request!