LibreCat/Catmandu Events

LibreCat members offer training and workshops on a regular basis.


Upcoming Events

Workshop: Catmandu - Extract, Load, Transform
InetBib-ODOK-Tagung, February 21-23 2018
Vienna, Austria

by Johann Rolschewski (Berlin State Library), Vitali Peil (Bielefeld University Library)

Description

For a wide range of digital services such as discovery systems, electronic resource management, research data management or digitisation, libraries today have to obtain, process and provide metadata in various formats, e.g. Excel, KBART, MAB2, MARC 21, MODS and PICA, via diverse interfaces (OAI, SPARQL, SRU, Z39.50, etc.). These recurring data management processes are also known as "Extract, Transform, Load" (ETL) [https://de.wikipedia.org/wiki/ETL-Prozess]: extracting data from various sources, transforming it into a suitable schema and loading it into a target system. Catmandu [http://librecat.org/Catmandu/], a 'data processing toolkit', supports the standardisation of these processes with software modules for, among other things, common library-specific data formats, database systems and interfaces. The three-hour "Hands-On Lab" offers an introduction to the functionality of Catmandu with practical exercises. One focus is on working efficiently with the command line interface [http://librecat.org/Catmandu/#command-line-client], which gives quick access to the toolkit's many functions. Participants learn to run ETL processes with Catmandu on their own and thus to benefit from the fix language [http://librecat.org/Catmandu/#fix-language] for transforming and normalising metadata. The event is aimed at data managers, developers and systems librarians. Prerequisites are basic knowledge of the command line and of library-specific data formats. For the exercises, participants need their own laptop with the software "VirtualBox" [https://www.virtualbox.org/wiki/Downloads] installed. A "virtual machine" (VM) with all required software modules will be provided for download by the presenter.


Past Events

Hands-On Lab: Catmandu - Extract, Load, Transform
106. Bibliothekartag
Frankfurt am Main, Germany, June 1 2017

by Johann Rolschewski (Berlin State Library)

Description

For a wide range of digital services such as discovery systems, electronic resource management, research data management or digitisation, libraries today have to obtain, process and provide metadata in various formats, e.g. Excel, KBART, MAB2, MARC 21, MODS and PICA, via diverse interfaces (OAI, SPARQL, SRU, Z39.50, etc.). These recurring data management processes are also known as "Extract, Transform, Load" (ETL) [https://de.wikipedia.org/wiki/ETL-Prozess]: extracting data from various sources, transforming it into a suitable schema and loading it into a target system. Catmandu [http://librecat.org/Catmandu/], a 'data processing toolkit', supports the standardisation of these processes with software modules for, among other things, common library-specific data formats, database systems and interfaces. The three-hour "Hands-On Lab" offers an introduction to the functionality of Catmandu with practical exercises. One focus is on working efficiently with the command line interface [http://librecat.org/Catmandu/#command-line-client], which gives quick access to the toolkit's many functions. Participants learn to run ETL processes with Catmandu on their own and thus to benefit from the fix language [http://librecat.org/Catmandu/#fix-language] for transforming and normalising metadata. The event is aimed at data managers, developers and systems librarians. Prerequisites are basic knowledge of the command line and of library-specific data formats. For the exercises, participants need their own laptop with the software "VirtualBox" [https://www.virtualbox.org/wiki/Downloads] installed. A "virtual machine" (VM) with all required software modules will be provided for download by the presenter.


SWIB16 - Workshop: Catmandu & Linked Data Fragments
Bonn, Germany, November 28-30 2016

by Patrick Hochstenbach (Ghent University Library) / Carsten Klee (Berlin State Library) / Johann Rolschewski (Berlin State Library)

Description

"Catmandu" (http://librecat.org/index.html) is a command line tool to access and convert data from your digital library, research services or any other open data sets. The "linked data fragments" (LDF) project (http://linkeddatafragments.org/) developed lightweight tools to publish data on the web using the Resource Description Framework (RDF). In combination, both projects offer an easy way to transform your data to RDF and provide access via a graphical user interface (GUI) and an application programming interface (API). We will present all required tools at the workshop. The participants will be guided to transform data to RDF, to host it with an LDF server and to run SPARQL [https://www.w3.org/TR/rdf-sparql-query/] queries against it. The participants should install a virtual machine (VM) as a development environment on their laptops; see https://librecatproject.wordpress.com/2014/12/01/day-1-getting-catmandu/ for further information. Audience: Systems librarians, metadata librarians, data managers. Expertise: Participants should be familiar with command line interfaces (CLI) and the basics of RDF.
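
As a rough illustration of what "transforming data to RDF" means, the following Python sketch serialises a single record as N-Triples. It is not Catmandu (a Perl toolkit) and not the workshop material: the record, the subject IRI under example.org and the helper names are hypothetical, and the choice of Dublin Core terms as predicates is an assumption made for the example.

```python
# Minimal sketch: turn a flat record into N-Triples lines.
# The record, subject IRI and field-to-predicate mapping are hypothetical.

# Assumed mapping from record fields to Dublin Core terms.
PREDICATES = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_iri, record, predicates=PREDICATES):
    """Return one N-Triples line per mapped field; unmapped fields are skipped."""
    lines = []
    for field, value in record.items():
        predicate = predicates.get(field)
        if predicate is None:
            continue
        literal = escape_literal(str(value))
        lines.append(f'<{subject_iri}> <{predicate}> "{literal}" .')
    return lines


if __name__ == "__main__":
    example = {"title": 'A "quoted" title', "creator": "Jane Doe", "internal_id": 42}
    for line in record_to_ntriples("http://example.org/record/1", example):
        print(line)
```

A file of such lines could then be loaded into any RDF store or server that accepts N-Triples.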


ELAG 2016 - Bootcamp: In the Beginning ... Was the Command Line
Copenhagen, Denmark, June 5 2016

by Johann Rolschewski, Berlin State Library / Patrick Hochstenbach, Ghent University Library

Description

Command Line Interfaces (CLI) and tools were the primary utilities for interaction with computer systems and programs until the introduction of Graphical User Interfaces (GUI). For many tasks they still outperform GUI programs: you can process very large files, and you can redirect the output of one command line tool into another, chaining them together to solve complex or repetitive tasks. This workshop will focus on beginner and intermediate uses of the CLI: organizing files and directories, processing data, and interacting with Web Application Programming Interfaces (API). Besides the standard UNIX utilities we will use tools like `catmandu` (data processing toolkit), `csvkit` (utilities for converting to and working with CSV), `jq` (lightweight and flexible command-line JSON processor), `XMLStarlet` (command line XML toolkit) and `YAZ` (toolkit for Z39.50/SRW/SRU protocols and MARC records).


ELAG 2016 - LibreCat: Transforming an Institutional Repository
Copenhagen, Denmark, June 6-9 2016

by Petra Kohorst & Vitali Peil, Bielefeld University Library

Description

By developing completely new repository software, Bielefeld University Library gave its current system the last EXIT. The main reasons for abandoning the old system were its inherently complex data structures, its lack of performance and its highly time-consuming maintenance. Nevertheless, Bielefeld University Library powered (and still powers), as far as we know, the first institutional repository in Germany for research data and publications. The reasons for developing a completely new system were the general need for agile development, a new and more clearly structured architecture, better performance, and proving that one can build real-world applications with Catmandu (https://github.com/LibreCat/Catmandu). In fact, Catmandu is used for all ETL processes within this application.

Besides the technical part, this talk will address the high costs of EXITing a known environment and ENTERing a new one. Migrating systems almost always reveals some unpleasant surprises, but it is a good opportunity to clean up your DATA and transform it for your future needs. This affects not only the current users but also the library staff, not to mention the developers, who need to transcend boundaries by rethinking every part of the software.


KIM Workshop 2016 - Catmandu & Linked Data Fragments
Mannheim, Germany, April 4-5 2016

by Carsten Klee & Johann Rolschewski, Berlin State Library

Description
See https://dini.de/veranstaltungen/workshops/kim2016/

Catmandu Hackathon
Berlin, Germany, April 11-12 2016

by Johann Rolschewski, Berlin State Library

Description

Dates:
Monday 11 April 12:00 - 17:00
Tuesday 12 April 09:00 - 13:00
To participate, send an email to Johann Rolschewski (Johann.Rolschewski@sbb.spk-berlin.de). Limited seats available.


Code4Lib 2016 - Workshop: Catmandu - a (meta)data toolkit
Philadelphia, USA, March 7 2016

by Patrick Hochstenbach & Nicolas Steenlant, Ghent University Library

Description
See http://2016.code4lib.org/workshops/Catmandu-a-metadata-toolkit

Research Data, Librarians and Libraries: Use-cases and Approaches/LibreCat Developer Workshop 2015
Lund, Sweden, December 2–3 2015

by Lund University Library / LibreCat

Description

Lund University Library invites you to an afternoon where the common theme will be research data and how different libraries approach the new tasks that come with the increasing interest in making research data available and reusable. All presentations will be held in English.

Myriam Mertens, research data manager at Ghent University, will talk about "Research Data at Ghent University".

Najko Jahn, project and innovation manager at Bielefeld University Library, will talk about implementing research data management services at Bielefeld.

Jörgen Eriksson, librarian at Lund University Library, will talk about open data/trusted data.

Maria Johnsson, librarian at Lund University Library, will talk about the findings of a project on research data management at the University Library last year, and about the actions the University Library is taking in response to the results.

Anthony Leroy, application developer at Université Libre de Bruxelles, will present SAFE-PLN in his talk "SAFE PLN in a Nutshell".

After the talks and on the following day there will be a workshop for participants in LibreCat where developers and librarians will discuss the current development efforts and new areas of cooperation.

Time and Place


December 2 - Wednesday
13:00-16:00 Presentations
Location: Bromansalen, Lund University Library, Helgonabacken, Lund, Sweden
16:00-18:00 LibreCat Workshop, part 1
Location: Lund University Library

December 3 - Thursday
09:00-12:00 LibreCat Workshop, part 2
Location: Lund University Library
12:00-13:00 Lunch
13:00-16:00 LibreCat Workshop, part 3
Location: Lund University Library

Registration

Please email <snorri dot briem at ub dot lu dot se> to register.


SWIB15 - Workshop: Catmandu - a (meta)data toolkit
Hamburg, Germany, November 23-25 2015

by Johann Rolschewski, Berlin State Library / Vitali Peil, Bielefeld University Library / Patrick Hochstenbach, Ghent University Library

Description

See http://swib.org/swib15/programme.html for the conference website.


TPDL 2015 - Tutorial: Catmandu - a (meta)data toolkit
Poznań, Poland, September 14 2015

by Nicolas Steenlant / Patrick Hochstenbach, Ghent University Library, Belgium

Description

See http://tpdl2015.info/tutorials-list/tutorial-catmandu-metadata-toolkit/ for the conference website.

Slides. The exercises will follow.


ELAG 2015 - Bootcamp: Catmandu - a (meta)data toolkit
Stockholm, Sweden, June 8 2015

by Johann Rolschewski, Berlin State Library / Vitali Peil, Bielefeld University Library

Description

See http://elag2015.org/program/catmandu-a-metadata-toolkit/ for the conference website.


Memento Hackathon 2015
Ghent University March 9–10 2015

by Ghent University Library / LibreCat / SAFE-PLN

Description

The Memento Hackathon is an event to learn about web archiving and long-term preservation strategies. For these two days, Ghent University has invited Dr. Herbert Van de Sompel, Los Alamos National Laboratory, to speak about his Memento web archiving project: http://timetravel.mementoweb.org/. The trainer for this hackathon will be Harihar Shankar, Los Alamos National Laboratory, an international expert on application development and the Memento framework.

The Hackathon will bring together digital library enthusiasts, programmers and web archiving specialists to discuss long-term preservation strategies and get hands-on experience with Memento.

We created a GitHub organization to share code, documentation and information. Join https://github.com/MementoHackathon2015.

This hackathon is organized in collaboration with Ghent University Library, LibreCat and SAFE-PLN. We kindly thank Ghent University Library for their sponsorship.

Where is the Hackathon?

Ghent University Library
Rozier 9000
GENT
Belgium

When is the Hackathon?

March 9–10, 2015. These days include networking, hacking and a workshop. The slides for this hackathon are available here.

What do you need to sign up?

Interest in library programming, long-term preservation and web archiving. Library technologists are welcome and there is no entrance fee for the event. Please send an email to <patrick dot hochstenbach at ugent dot be> to register.


Catmandu workshop
VLIR-UOS Antwerpen December 11 2014

by Nicolas Steenlant / Patrick Hochstenbach, Ghent University Library, Belgium

Conference Website
http://www.vliruos.be/en/project-funding/calls-for-applications/calldetail/call-for-participation-in-workshops-on-library-and-information-management_6145/
Description

The goal of the workshops is to bring together the expertise and experience of librarians and information specialists in Flanders and the South. During the workshops the participants will define the tools and the best practices in 'information discovery' and 'information literacy' approaches. This will result in toolkits for the participants and for other universities and institutes in the South, which will be developed as an Internet resource.


Catmandu – Importing, transforming, storing and indexing data should be easy
SWIB2014 Bonn December 1–3 2014

by Johann Rolschewski (Staatsbibliothek zu Berlin, Germany) / Jakob Voß (Verbundzentrale des GBV (VZG), Germany)

Conference Website
http://swib.org/swib14/programme.php
Description

Catmandu provides a suite of software modules to ease the import, storage, retrieval, export and transformation of metadata records. Combine Catmandu modules with web application frameworks such as PSGI/Plack, document stores such as MongoDB and full text indexes such as Elasticsearch to create a rapid development environment for digital library services.

After a short introduction to Catmandu and its features, we will present the domain specific language (DSL) and command line interface (CLI). Participants will be guided to transform (their) data records to a common metadata model, to store/index them in Elasticsearch or MongoDB and to export them as Linked Data.

Prior Experience: We will be using a simplified DSL. Participants should be familiar with command line interfaces (CLI). Any programming experience is welcome but not required. A brief tutorial on Catmandu programming can be found here.

Requirements: Laptop with VirtualBox installed. Organisers will provide a VirtualBox image (Linux guest system) beforehand. Participants should bring their own data (CSV, JSON, MAB2, MARC, PICA+, RDF or YAML).


Catmandu – the data toolkit
Österreichischer Perl Workshop 2014 Salzburg October 10–13 2014

Workshop Website
APW 2014

Talk by Johann Rolschewski: "Catmandu – the data toolkit"

Catmandu – the data toolkit
16. Deutscher Perl-Workshop 2014 Hannover March 26–28 2014

Workshop Website
GPW 2014

Talk by Johann Rolschewski: "Catmandu – the data toolkit"


Catmandu – Hackathon
Bielefeld University May 8–9 2014

Hackathon Website
https://github.com/LibreCat/librecat.github.io/wiki/8-9-May-2014-Catmandu-Hackathon
Description

In May 2014 we will meet at Bielefeld University for a Catmandu hackathon. We will present a short state of the project and explain some of the new features that are available. The goal of this meeting is to work together on some open issues for the 1.0 release of the code.

Catmandu – Library Oriented Extract, Transform, Load tools to publish Linked Open Data
SWIB2013, Workshop November 25–27 2013, Hamburg

by Nicolas Steenlant / Patrick Hochstenbach (Ghent University, Belgium) / Najko Jahn (Bielefeld University, Germany)

Conference Website
http://swib.org/swib13/programme.php
Description

When creating any data-oriented application, the main task is to import data from various sources, map the fields to a common data model and put it all into a database or search engine. In data warehousing, these processes are called ETL (Extract, Transform, Load). Catmandu provides a suite of Perl modules to ease the import, storage, retrieval, export and transformation of metadata records.

After a short introduction to the rationale behind Catmandu and a presentation of sample applications at the Universities of Lund, Ghent and Bielefeld, participants will be guided to transform MARC records to Linked Data. The steps include transforming MARC into a JSON model of choice, storing/indexing the model in ElasticSearch, and exporting/mapping the model as Linked Data.

Prior experience: We will be using a simplified ETL language. Any programming experience is welcome but not required. A brief tutorial on Catmandu programming can be found here. Requirements: Laptop with VirtualBox installed. Organisers will prepare a VirtualBox image (Linux guest system) beforehand to be worked with during the workshop.

Slides


An Open Access workflow to Extract, Transform, Map, and Publish dynamic metadata
ELAG 2013, Workshop May 29–30 2013, Gent

by Miel Vander Sande, Pieter Colpaert, Erik Mannens (Inspired and assisted by: Patrick Hochstenbach, Dries Moreels)

Conference Website
http://elag2013.org/ws3-an-open-access-workflow-to-extract-transform-map-and-publish-dynamic-metadata/
Description

LibreCat is an open collaboration to provide freely available tools for library and research services. It allows a librarian to define a “menu” which can be repeated for dataset extraction, transformation and loading. The DataTank is an open source, RESTful data adapter for publishing Open Data sets. By daisy-chaining LibreCat and The DataTank’s Input project, we can now also map these data to an ontology and publish the data in a RESTful interface.

The workshop will go deeper into the latter: an ontology will be chosen, a mapping file will be created and a recipe will be scheduled. The data ingested into the triple store (a database for semantically enriched data) will then be published through a RESTful interface.

Intended Audience

Librarians who want to lift the data they manage towards linked open data.

Experience

Knowledge of Catmandu/LibreCat is a plus.


Catmandu: boost your data processing with library oriented ETL
ELAG 2013 Bootcamp May 28 2013, Gent

by Nicolas Steenlant

Conference Website
http://elag2013.org/bc3-catmandu-boost-your-repository-services-with-library-oriented-etl-processing/
Description

To create any data oriented application, one of your recurring tasks will be to import data from various sources, map the fields to a common data model and put it all into a database or search engine.

Stores such as MongoDB or ElasticSearch provide a developer-friendly API, but you keep writing a lot of boilerplate or throwaway code. We tried to abstract this problem into a set of Perl tools called Catmandu, which can work with library data such as MARC, Dublin Core and EndNote, protocols such as OAI-PMH and SRU, and repositories such as DSpace and Fedora.

In data warehouses these processes are called ETL (Extract, Transform, Load). Many (often heavyweight) tools exist for ETL processing but none address typical library data models and services.

In this bootcamp we will provide an introduction to these tools. We will show how easy it is to import data and transform it with the help of a small DSL. Storing and indexing become one-liners.
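
The import, map and store steps described above can be sketched in a few lines. The example below is a minimal illustration in Python, not Catmandu itself (which is a set of Perl tools): the sample records, the column names and the `FIELD_MAP` mapping are all hypothetical, and an in-memory SQLite database stands in for a store such as MongoDB or ElasticSearch.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real workflow would read a file or query an API.
SOURCE_CSV = (
    "Title,Creator,Year\n"
    "Example Title A,Jane Doe,2013\n"
    "Example Title B,John Doe,2011\n"
)

# Hypothetical mapping from source column names to a common data model.
FIELD_MAP = {"Title": "title", "Creator": "author", "Year": "year"}


def extract(text):
    """Extract: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(record):
    """Transform: rename the fields and normalise the year to an integer."""
    mapped = {FIELD_MAP[key]: value.strip() for key, value in record.items()}
    mapped["year"] = int(mapped["year"])
    return mapped


def load(records, connection):
    """Load: insert the transformed records into a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS records (title TEXT, author TEXT, year INTEGER)"
    )
    connection.executemany(
        "INSERT INTO records (title, author, year) VALUES (:title, :author, :year)",
        records,
    )
    connection.commit()


def run_etl(text=SOURCE_CSV):
    """Run extract, transform and load against an in-memory database."""
    connection = sqlite3.connect(":memory:")
    load((transform(record) for record in extract(text)), connection)
    return connection


if __name__ == "__main__":
    conn = run_etl()
    for row in conn.execute("SELECT title, author, year FROM records ORDER BY year"):
        print(row)
```

Keeping the three steps as separate functions mirrors the ETL split: a different source or target only requires swapping `extract` or `load`, while the field mapping stays in one place.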

Audience

Developers, sysadmins

Expertise

Importing metadata from various sources, transforming this data into a JSON model of choice, storing/indexing in a (search) engine of choice, providing a REST-based API.

Programming experience

Scripting language of choice: Perl, Python, Ruby, PHP.

Required

Laptop with GNU/Linux or OSX or a Virtual Machine.


PubLister/LibreCat Software Developer Workshop
29 and 30 November 2012 – Bielefeld University

by LibreCat and PubLister team

Conference Website
https://github.com/LibreCat/Catmandu/wiki/Bielefeld-Workshop-November-29-30-2012
Description

We would like to invite you to the joint PubLister/LibreCat Software Developer Workshop on 29 and 30 November 2012 (1–7pm; 9am–2pm CET) at Bielefeld University.

Building on last year's PubLister Symposium and Workshop http://pub.uni-bielefeld.de/workshop/, the University Libraries of Lund, Gent and Bielefeld share the common vision of

  1. Creating a high-level system of building blocks that can be reused when creating repository-like applications: Project Catmandu
  2. Creating a next-generation repository service based on these building blocks: Project LibreCat.

http://librecat.org/

After showcasing exemplary implementations at the three Universities, the workshop will get you started building repository applications with Catmandu. The sources needed are distributed both via CPAN http://search.cpan.org/search?m=all&q=catmandu and GitHub https://github.com/LibreCat. A brief tutorial can be found here:

http://librecat.org/tutorial/index.html

Funding Acknowledgement

The workshop takes place at the Center of Excellence – Cognitive Interaction Technology (CITEC) at Bielefeld University. We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG).


Bridging the Gap between Personal Publication Lists and Institutional Services – PubLister Symposium and Workshop
23 and 24 March 2011 – Bielefeld University

by PubLister team

Conference Website

http://pub.uni-bielefeld.de/workshop/

Description

The event introduces the history, rationales and the future of the repository software SBCAT, which has been developed at Lund University Libraries and Ghent University Library. Adopted and customised at Bielefeld University Library, PUB has been the official research database of Bielefeld University since November 3rd 2010.

PUB provides Bielefeld University faculty with a single entry-point to register personal publications in order to promote their research within and across the university, e.g. on the Directory of Staff and Departments, personal webpages and those of departments. Up to now, more than 12,500 records have been registered. For projects and research groups, registration and flexible web-presentations will be launched soon.

At the Symposium, researchers describe from a first-hand perspective their needs for maintaining publication records and documents. With the CRC PrePrint Server, the PubLister project has served the Bielefeld Collaborative Research Centre SFB 673 Alignment in Communication to disseminate its publication-based research output to the funder during a successful mid-term evaluation. Recently, the SFB Preprint Server has been adopted by another SFB based at Bielefeld University.

With PhilLister, the PubLister project has developed a lightweight service that allows flexible storage and dissemination of philosophical papers on personal and departmental webpages.

In joint collaboration with the Cognitive Interaction Technology – Center of Excellence (CITEC), strategies are developed to share and synchronize publication and research information with solutions already existing at a central scientific institute of Bielefeld University (i.e. Drupal Biblio).

Integrating diverse data sources has been one of the main challenges during the project. With regard to this, the section "Data" firstly analyses vocabularies needed for enriching publication records with research information available at an academic institution such as data about persons, organisation, projects, and events. An approach to identify authors and organisations on the basis of the literature databases "Scopus" and "ISI Web of Science" as part of the "German Competence Centre for Bibliometrics" will be introduced.

The second day starts with the presentation of alternative repository-based approaches. For instance, the deployment of EPrints for the University of Regensburg Publication Server will be presented, as will PUMA – Academic Publication Management, developed at the University of Kassel and based on Bibsonomy technology.

Afterwards, three workshops allow for more detailed discussion and sharing of experiences. The workshop "PUB Live Demo" aims to discuss the software developed and in use. The workshop "Workflows" deals with supporting strategies for populating institutional repositories and supporting researchers. "Applied Metadata Programming and Modelling" invites developers interested in the broader context of academic publications.

Funding Acknowledgement

We acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG).