Supporting the Linked Data Life Cycle Using an Integrated Tool Stack
Abstract. The core of a Linked Data application is the processing of the knowledge expressed as Linked Data. Therefore the creation, management, curation and publication of Linked Data are critical aspects for an application's success. The LOD2 project provides components for all of these aspects. These components have been collected and placed under one distribution umbrella: the LOD2 stack. In this chapter we introduce this component stack. We show how to get access to it, which component covers which aspect of the Linked Data life cycle, and how using the stack eases access to Linked Data management tools. Furthermore, we elaborate how the stack can be used to support a knowledge domain, illustrated here with statistical data.
Introduction
Publishing Linked Data requires management processes that ensure its quality. The management process passes through several stages; in the Linked Data life cycle the main stages are ordered in their typical application order. The starting point is very often the extraction stage, in which data from the source format is turned into RDF. The extracted RDF data must be stored in an appropriate storage medium, making it available for further processing. At this point the data is ready to be queried and can be manually updated to correct small mistakes. Within the linking stage the data is enriched by interconnecting it with external data sources. These data linkages create new opportunities: for example, the data can now be classified according to the external data, and information that is spread over two entities can be fused together. All these data manipulations can be monitored with quality metrics. When the desired data quality is reached, the data can be made public and be explored by end-user applications.
Of course the world is ever changing, and the data will reflect this. Therefore, there is support for the evolution of the data from one structure into another.
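To make the stages described above more concrete, the following is a minimal sketch of extraction, storage, linking, querying and publication using only the Python standard library. It is not how the LOD2 stack components work internally. The namespaces, the sample CSV content, the external link table and the function names (extract, link, query, to_ntriples) are all assumptions made up for illustration.

```python
import csv
import io

# All URIs and sample values below are invented for illustration.
BASE = "http://example.org/region/"
LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"

SOURCE_CSV = "id,name\nalpha,Alpha\nbeta,Beta\n"

# Hypothetical mapping from local identifiers to an external data set.
EXTERNAL_LINKS = {
    "alpha": "http://example.com/resource/Alpha",
    "beta": "http://example.com/resource/Beta",
}


def extract(csv_text):
    """Extraction: turn each CSV row into an RDF triple (N-Triples terms).

    Literal escaping is deliberately ignored in this sketch.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + row["id"] + ">"
        triples.append((subject, LABEL, '"' + row["name"] + '"'))
    return triples


def link(store, links):
    """Linking: add owl:sameAs triples that point to external resources."""
    linked = set(store)
    for local_id, external_uri in links.items():
        subject = "<" + BASE + local_id + ">"
        linked.add((subject, SAME_AS, "<" + external_uri + ">"))
    return linked


def query(store, predicate):
    """Querying: return all triples that use the given predicate."""
    return sorted(t for t in store if t[1] == predicate)


def to_ntriples(store):
    """Publication: serialize the store as sorted N-Triples lines."""
    return [" ".join(t) + " ." for t in sorted(store)]


store = set(extract(SOURCE_CSV))     # storage: an in-memory set of triples
store = link(store, EXTERNAL_LINKS)  # linking stage
for line in to_ntriples(store):
    print(line)
```

In practice each of these functions would be replaced by a dedicated component, with a triple store taking the place of the in-memory set. The sketch only shows how the output of one stage becomes the input of the next.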
For all these stages, research institutes and companies around the world have created tools. At the start of the LOD2 project these tools were scattered around the Linked Data community. Specialists in the area shared lists of components in varying degrees of completeness. The LOD2 project had the ambition to start a platform in which all Linked Data components were collected. This common distribution platform was called the LOD2 stack, and will continue to exist after the LOD2 project has finished as the Linked Data stack[1]. Components in the stack are easy to install and directly usable. Moreover, they come with preconfigured setups that make the interplay between them easier. These additional features cannot be offered by the individual component owners but require central coordination.
In the first part of this chapter, the LOD2 stack is described in more detail. The second part is dedicated to the specialization of the LOD2 stack for statistical data. Indeed, the LOD2 stack in its own right is not dedicated to a particular use case. For particular kinds of data, such as statistical data, the components of the stack can be further specialized and pre-configured to offer much more dedicated end-user support.
- [1] stack.linkeddata.org