An Integration Layer for Rijksmuseum Collection Data

By Chris Dijkshoorn

The Rijksmuseum’s Collection IT department is responsible for the collection data infrastructure and manages the applications that support museum processes involving all kinds of collection objects. The department aims to improve access to data about the collection by integrating data from different applications and enabling the contextualisation of artworks. The Collection IT team consists of two application managers, a data engineer, a manager and, occasionally, in-sourced staff. It is one of the five subdepartments of the Research Services department within the Collections sector.

The team has been working on the design of a new, loosely coupled collection data architecture together with ValueBlue, a consulting firm specialised in integration architecture. The overall architecture is in line with the principles of the Dutch Digital Heritage Reference Architecture (DERA) and with the goal of making data Findable, Accessible, Interoperable and Reusable (FAIR). The integration layer is a key part of the new architecture: a combination of digital communication protocols, data models and a metadata repository.

Why the Rijksmuseum needs an integration layer

There are three main reasons:

Improve the integration of collection data from different systems
Offer a wide range of data services that enable telling stories about the collection
Stabilise the connections between systems by standardising data interchange

At the Rijksmuseum, we use different systems to capture information and data about our collection. For example, the Collection IT department manages the open source library system Koha, the collection management system Adlib and links to the digital asset management system MediaBin. Steps are being taken to make other domains digitally available as well, such as research data, object documentation and archival collections. The result is a diverse landscape of systems whose data is often complementary.

To offer a better user experience, data from multiple systems needs to be integrated. For example, one system contains information about artworks by Rembrandt and another about books on Rembrandt. While we provide online access to data from both systems via the discovery system Rijksstudio and the online library catalog, there is no integrated approach yet. Data is accessed through different interfaces and exported in different formats, and there is no way to tell that “Rembrandt van Rijn” in our collection management system is the same person as “Rembrandt, 1606-1669” in our library system.

Integrating data is a good start, but to provide value to users the data also needs to be published openly online in several forms, so that users can explore objects and create stories about them themselves. That this is a shared vision at the Rijksmuseum was beautifully illustrated in the talk “Creating narratives from diverse data” by Erma Hermens, Rijksmuseum Chair for Studio Practice and Technical Art History at the University of Amsterdam, at the 2019 conference Sharing Is Caring X Amsterdam. The goal of the Collection IT department is to create stable, sustainable and open data services that enable others to tell rich stories.

Ill. 1: Computercentrum Amro-bank (1), Michael Pellanders, 1982. Photograph, 50,5 × 34 cm. Rijksmuseum Amsterdam, NG-1986-6-21. PURL: http://hdl.handle.net/10934/RM0001.COLLECT.347085

RijksData was launched last year and provides an overview of the data services the Rijksmuseum currently offers. The listed APIs have been available since 2013, run in the infrastructure of the website and were made specifically to provide data to the Rijksstudio interface. This makes it harder for other developers to build meaningful applications on top of them. We aim to gradually expand the number of data services and move them into the museum’s digital infrastructure. Our involvement in the Linked Art initiative provides us with a platform to pursue standardisation between institutions. This allows us to offer a wide array of services that use standardised data models and protocols.

Our internal infrastructure also benefits from standardised communication between systems, which improves stability and allows transparent migration to new systems. Many of the systems that the Rijksmuseum uses are tightly coupled, so instability in one system affects others, and replacing a system requires unpicking the multitude of connections made to other systems. Functionality to integrate data and interact with it in meaningful ways will benefit the organisation as well as the public.

Design of the integration layer

The principles of FAIR, Linked Open Data and DERA inspired the design of the Rijksmuseum integration layer. The design comprises eight functional levels with repeatable patterns for individual domains (e.g. collection, library, archive), which I discuss in more depth below:

The authoritative sources are systems that manage data for a specific domain. Examples are the library system, the document management system and the collection management system. As soon as new systems or other domains become available, they can be added to the architecture by reusing existing patterns. A deliberate design choice is to not make the integration layer a source on its own: new data needs to be entered in authoritative sources. If none is suitable, a new authoritative source will need to be introduced.

The metadata extraction and binary data extraction layer consists of adapters on top of the authoritative sources that extract data. These components have to be tailor-made for each authoritative source. When possible, a push mechanism is used, so that when data is updated in an authoritative source, the change is synchronised with the integration layer. The extracted metadata is forwarded to the next layer: metadata standardisation.
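The push mechanism described above can be illustrated with a small observer-style sketch. This is not the museum's implementation: the class name SourceAdapter, the method names and the record shape are hypothetical, and a real adapter would depend on what each source system can emit.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical record shape; each real source system has its own data model.
Record = dict[str, str]
Handler = Callable[[str, Record], None]


@dataclass
class SourceAdapter:
    """Illustrative adapter on top of one authoritative source."""

    source_name: str
    handlers: list[Handler] = field(default_factory=list)

    def subscribe(self, handler: Handler) -> None:
        # The next layer (metadata standardisation) registers itself here.
        self.handlers.append(handler)

    def on_source_update(self, record: Record) -> None:
        # Called when the source reports a change: this is the "push".
        for handler in self.handlers:
            handler(self.source_name, record)


# Stand-in for the standardisation layer: it simply collects what it receives.
received: list[tuple[str, Record]] = []

adapter = SourceAdapter("collection-management")
adapter.subscribe(lambda source, record: received.append((source, record)))
adapter.on_source_update(
    {"object_number": "NG-1986-6-21", "title": "Computercentrum Amro-bank (1)"}
)
```

With a push design the integration layer does not need to poll each source for changes; where a source cannot push, the same adapter interface could be driven by a scheduled export instead.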

The metadata standardisation layer converts metadata structured according to the data models of the authoritative sources into standardised data models. The output will be formatted according to the Resource Description Framework (RDF). We aim to capture relevant details by structuring the data using models developed for the domain in question, for example Linked Art for collection metadata, which the Rijksmuseum team is co-developing.
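To give an impression of what such a conversion involves, the sketch below turns a flat source record into RDF statements serialised as N-Triples. It is deliberately simplified and rests on assumptions: the base URI is a placeholder, the field names are hypothetical, and Dublin Core terms are used only because they are compact. An actual Linked Art mapping is considerably richer than this.

```python
DCTERMS = "http://purl.org/dc/terms/"
BASE = "https://example.org/object/"  # placeholder, not a Rijksmuseum URI

# Hypothetical mapping from source field names to RDF predicates.
FIELD_TO_PREDICATE = {
    "title": DCTERMS + "title",
    "creator": DCTERMS + "creator",
}


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples string literals require."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(record: dict[str, str]) -> list[str]:
    """Convert one source record into N-Triples lines."""
    subject = f"<{BASE}{record['object_number']}>"
    lines = []
    for field_name, predicate in FIELD_TO_PREDICATE.items():
        if field_name in record:
            literal = escape_literal(record[field_name])
            lines.append(f'{subject} <{predicate}> "{literal}" .')
    return lines


triples = to_ntriples(
    {
        "object_number": "NG-1986-6-21",
        "title": "Computercentrum Amro-bank (1)",
        "creator": "Michael Pellanders",
    }
)
```

Fields without a mapping are silently skipped here; a production conversion would more likely log them so that no source information is lost unnoticed.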

The metadata validation layer checks whether the standardised data adheres to a set of formulated constraints. To do so, we aim to use the Shapes Constraint Language (SHACL). When data is badly formatted or elements are missing, a report is saved in the metadata storage.
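The idea behind this kind of validation can be sketched without a SHACL processor. The example below is plain Python that only imitates a "required property" constraint; real SHACL shapes are themselves written in RDF and evaluated by a SHACL engine. The names RequiredProperty and validate, and the report fields, are hypothetical.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]  # (subject, predicate, object)

DCTERMS = "http://purl.org/dc/terms/"


@dataclass(frozen=True)
class RequiredProperty:
    """Simplified stand-in for a minimum-count-of-one constraint."""

    predicate: str
    message: str


def validate(
    subject: str, triples: list[Triple], constraints: list[RequiredProperty]
) -> list[dict[str, str]]:
    """Return one report entry per constraint that the subject violates."""
    # A predicate counts as present only if it has a non-empty value.
    present = {p for s, p, o in triples if s == subject and o.strip()}
    return [
        {"focus": subject, "predicate": c.predicate, "message": c.message}
        for c in constraints
        if c.predicate not in present
    ]


constraints = [
    RequiredProperty(DCTERMS + "title", "Every object needs a title."),
    RequiredProperty(DCTERMS + "creator", "Every object needs a creator."),
]

obj = "https://example.org/object/1"  # placeholder URI
data = [(obj, DCTERMS + "title", "Computercentrum Amro-bank (1)")]

report = validate(obj, data, constraints)
```

In this sketch the report is a list of dictionaries; in the architecture described here it would be stored alongside the metadata so that later layers can surface it.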

The metadata storage layer is a triple store that contains all relevant metadata from the authoritative sources. Integration of data is enabled by the data format (RDF) and the use of shared persistent identifiers.
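Why shared identifiers make integration possible can be shown with a toy in-memory store. This is a sketch, not a real triple store: the TripleStore class, the URIs under example.org and the choice of predicates are all assumptions made for illustration. The two name forms are the ones mentioned earlier in this article.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)

DCTERMS = "http://purl.org/dc/terms/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
REMBRANDT = "https://example.org/person/rembrandt"  # hypothetical shared identifier


class TripleStore:
    """Minimal in-memory stand-in for a triple store."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add_all(self, triples: list[Triple]) -> None:
        self._triples.update(triples)

    def subjects(self, predicate: str, obj: str) -> list[str]:
        return sorted(s for s, p, o in self._triples if p == predicate and o == obj)

    def objects(self, subject: str, predicate: str) -> list[str]:
        return sorted(o for s, p, o in self._triples if s == subject and p == predicate)


# Statements from two different sources that refer to the same person URI.
collection_triples = [
    ("https://example.org/object/1", DCTERMS + "creator", REMBRANDT),
    (REMBRANDT, RDFS_LABEL, "Rembrandt van Rijn"),
]
library_triples = [
    ("https://example.org/book/1", DCTERMS + "subject", REMBRANDT),
    (REMBRANDT, RDFS_LABEL, "Rembrandt, 1606-1669"),
]

store = TripleStore()
store.add_all(collection_triples)
store.add_all(library_triples)

artworks = store.subjects(DCTERMS + "creator", REMBRANDT)
books = store.subjects(DCTERMS + "subject", REMBRANDT)
labels = store.objects(REMBRANDT, RDFS_LABEL)
```

Because both sources use the same identifier for the person, artworks and books end up connected without any string matching on names; the differing name forms simply become two labels of one resource.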

The metadata analysis layer includes daemons: processes that continuously monitor the integration layer. An example of a daemon is a data quality monitor that shows the validation results stored in the triple store in a user interface, informing information specialists about data conflicts that must be resolved in the authoritative sources.
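As a rough sketch of what a data quality monitor might compute before presenting anything in a user interface, the function below counts validation reports per authoritative source. The report structure and the "source" field are hypothetical, and the sketch shows a single pass rather than a continuously running process.

```python
from collections import Counter


def summarise(reports: list[dict[str, str]]) -> list[tuple[str, int]]:
    """Count validation reports per source, most affected source first."""
    counts = Counter(report["source"] for report in reports)
    return counts.most_common()


# Hypothetical reports as they might be read from the metadata storage.
reports = [
    {"source": "library", "message": "Missing title"},
    {"source": "collection", "message": "Missing creator"},
    {"source": "library", "message": "Missing creator"},
]

summary = summarise(reports)
```

A summary like this tells information specialists where to start, while the individual reports point to the records that need correcting in the authoritative source.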

The data sharing layer will be the public-facing part of the integration layer, offering a wide range of different data services. These data services include APIs and downloads of data dumps. New services will be listed on data.rijksmuseum.nl.

The identifier resolution layer provides persistent identifier resolution. The Rijksmuseum now uses Handle identifiers for collection objects, but the intention is to move to generic Rijksmuseum identifiers for all relevant domains.
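The principle of persistent identifier resolution is that the identifier stays the same while the location it points to may change. A minimal sketch, assuming a simple lookup table: the class name, the target URLs and the use of HTTP status 303 are illustrative choices, not a description of how the Handle system or the planned Rijksmuseum resolver works.

```python
from typing import Optional


class IdentifierResolver:
    """Maps persistent identifiers to the current location of a resource."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, identifier: str, location: str) -> None:
        # Registering again replaces the location; the identifier is unchanged.
        self._locations[identifier] = location

    def resolve(self, identifier: str) -> tuple[int, Optional[str]]:
        """Return an HTTP-like status code and the redirect target, if any."""
        location = self._locations.get(identifier)
        if location is None:
            return (404, None)
        return (303, location)


resolver = IdentifierResolver()
pid = "10934/RM0001.COLLECT.347085"  # handle taken from the illustration caption

# Placeholder locations before and after a hypothetical system migration.
resolver.register(pid, "https://old.example.org/objects/347085")
before = resolver.resolve(pid)

resolver.register(pid, "https://new.example.org/objects/347085")
after = resolver.resolve(pid)
```

Links that cite the identifier keep working after the migration, because only the resolver's table has to be updated.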

Conclusion

Implementing the integration layer is a challenging project that requires the Collection IT department to extend its focus from application management to software development. It involves integrating data from different systems with different data models, which requires tailored adapters and conversions. A three-year roadmap has been formulated for the implementation, starting in 2020. The goal, however, is to implement functional services quickly and to provide value to users at an early stage of development.

The success of the integration layer depends on a collaborative effort. The team has to rely on the data in the source systems, which is created through processes all over the museum. The project also involves deploying the loosely coupled services in a cloud infrastructure managed by the museum IT department, which is responsible for hardware, systems and network management. To demonstrate the usefulness of the integration layer, collaboration with the Digital Marketing team, which is responsible for the design of the Rijksmuseum website, is crucial as well: they will be able to highlight the features of the integrated data by telling rich stories online. Through this co-operation, the project will support the Rijksmuseum’s wish to tell data-driven, polyphonic stories with its collections for as many people as possible.