Interlinking: The Creation of 5-Star Business Data

5-Star business data[1] refers to Linked Open Data. The five stars are:

1. the data is available on the web under an open-data license,

2. the data is available in a machine-readable form,

3. the machine-readable data is in a non-proprietary format (e.g. CSV),

4. the data is machine readable and non-proprietary, and uses open standards to point to things,

5. all of the above, linked with other data to provide context.
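Because each star builds on the previous ones, the rating of a dataset can be read as the number of consecutive criteria it meets. The following is a minimal sketch of that reading; the `DatasetProfile` class, its field names and the `star_rating` function are hypothetical names introduced here for illustration and are not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical description of a dataset, one flag per star."""
    open_license: bool            # star 1: on the web under an open-data license
    machine_readable: bool        # star 2: machine-readable form
    non_proprietary_format: bool  # star 3: non-proprietary format (e.g. CSV)
    uses_open_standards: bool     # star 4: open standards used to point to things
    linked_to_other_data: bool    # star 5: linked with other data for context


def star_rating(profile: DatasetProfile) -> int:
    """Count the criteria met in order, stopping at the first one that fails."""
    criteria = [
        profile.open_license,
        profile.machine_readable,
        profile.non_proprietary_format,
        profile.uses_open_standards,
        profile.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV export that does not yet use open standards
# to identify things stops at three stars, even if it carries links.
csv_export = DatasetProfile(True, True, True, False, True)
print(star_rating(csv_export))
```

The early `break` encodes the cumulative nature of the scheme: a later criterion does not count while an earlier one is unmet.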

Getting the full benefits of linked data, namely the discovery of relevant new data and interlinking with it, requires the 5th star. That does not mean, however, that no benefits are derived from the Linked Data approach before that point is reached. A good starting point can be business registers such as OpenCorporates[2] or UK Companies House[3], which contain metadata descriptions of other companies. The discovery of more related business data can further be facilitated with Linked Data browsers and search engines like SigmaEE[4]. However, implementing the interlinking between different data sources is not always straightforward. The discovery of join points and the automated creation of explicit RDF links between the data can be supported with tools included in the Interlinking/Fusion stage of the LOD2 life cycle.

Fig. 10. 5 star data

The process referred to as interlinking is the main idea behind the Web of Data; it leads to the discovery of new knowledge and to its combination in unforeseen ways. Tools such as Silk[5] offer a variety of metrics, transformation functions and aggregation operators to determine the similarity of the compared RDF properties or resources. Silk operates directly on SPARQL endpoints or RDF files and offers a convenient user interface, the Silk Workbench.
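The interplay of transformation functions, similarity metrics and aggregation operators can be illustrated with a small, self-contained sketch. This is not Silk's API or its link specification language; it is plain Python using only the standard library, and the record fields (`uri`, `name`, `postcode`), the weights, the threshold and the example URIs are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "plc"}  # assumed, not exhaustive


def normalize(name):
    """Transformation: lowercase, drop punctuation and common legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def name_similarity(a, b):
    """Metric: string similarity in [0, 1] on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def exact_match(a, b):
    """Metric: 1.0 if both values are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def link_score(src, tgt, weights=(0.7, 0.3)):
    """Aggregation: weighted average of the two metrics."""
    return (weights[0] * name_similarity(src["name"], tgt["name"])
            + weights[1] * exact_match(src["postcode"], tgt["postcode"]))


def generate_links(sources, targets, threshold=0.9):
    """Emit an owl:sameAs N-Triples line for every pair above the threshold."""
    links = []
    for s in sources:
        for t in targets:
            if link_score(s, t) >= threshold:
                links.append(f"<{s['uri']}> <{OWL_SAME_AS}> <{t['uri']}> .")
    return links


# Hypothetical records from an enterprise dataset and a business register.
enterprise = [{"uri": "http://example.org/company/acme",
               "name": "ACME Widgets Ltd.", "postcode": "AB1 2CD"}]
register = [{"uri": "http://example.com/id/1",
             "name": "Acme Widgets Limited", "postcode": "AB1 2CD"},
            {"uri": "http://example.com/id/2",
             "name": "Zenith Holdings PLC", "postcode": "ZZ9 9ZZ"}]

for line in generate_links(enterprise, register):
    print(line)
```

The nested loop compares every source record with every target record, which is only workable for small inputs; a production tool would restrict the candidate pairs before scoring them.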

Vocabulary Mapping

Sometimes an enterprise may need to develop a proprietary ontology when applying Linked Data principles. Mapping the terms that were used for publishing the triples to terms in existing vocabularies facilitates the use of the enterprise data by third-party applications. A tool that supports this kind of mapping is R2R[6].

R2R searches the Web for mappings and applies the discovered mappings to translate Web data to the application's target vocabulary. It currently provides a graphical user interface with which the user can select input data from a SPARQL endpoint or from RDF dumps, create the mappings, and write the results back to endpoints or RDF files.
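The core idea of translating data into a target vocabulary can be sketched in a few lines. The example below is not R2R's mapping language or API; it is a plain Python illustration in which triples are simple tuples, the `http://example.org/vocab#` terms stand in for a hypothetical proprietary ontology, and the FOAF properties are chosen here only as an example target vocabulary.

```python
# Hypothetical mapping from proprietary properties to an existing vocabulary.
PROPERTY_MAP = {
    "http://example.org/vocab#companyName": "http://xmlns.com/foaf/0.1/name",
    "http://example.org/vocab#website": "http://xmlns.com/foaf/0.1/homepage",
}


def translate(triples, property_map):
    """Rewrite the predicate of each triple when a mapping exists.

    Triples whose predicate has no mapping are passed through unchanged.
    """
    return [(s, property_map.get(p, p), o) for s, p, o in triples]


# Triples as (subject, predicate, object) tuples; all values are illustrative.
source_triples = [
    ("http://example.org/company/acme",
     "http://example.org/vocab#companyName", "ACME Widgets Ltd."),
    ("http://example.org/company/acme",
     "http://example.org/vocab#internalCode", "A-17"),
]

for triple in translate(source_triples, PROPERTY_MAP):
    print(triple)
```

This sketch covers only one-to-one property renaming; mappings that restructure data or transform values need a richer mapping language than a dictionary lookup.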

Conclusion

In this chapter, we discussed the best practices to deploy in an enterprise application to ensure a semantic dataflow that fully complies with the LOD paradigm. We also saw that deploying LOD tools and procedures does not necessarily require starting the IT design from scratch; they can be deployed on top of existing applications. This guarantees low-cost deployment and integration.

Open Access. This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

  • [1] 5stardata.info/
  • [2] https://opencorporates.com/
  • [3] companieshouse.gov.uk/
  • [4] sig.ma
  • [5] lod2.eu/Project/Silk.html
  • [6] wifo5-03.informatik.uni-mannheim.de/bizer/r2r/
 