Modelling for the Specific Domain

In order to transform the existing model of an enterprise into a more interoperable schema, best practices recommend the use of common vocabularies. Reusing terms from existing vocabularies is easier for the publisher and contributes significantly to the re-use of enterprise data and to seamless information exchange.

As a first step, the inherent structure of the legacy data has to be analysed. If no explicit hierarchy exists, one can often be created based on expert knowledge of the data. If such an organisation of the data is not possible, only a flat list of concepts, essentially a glossary, can be constructed. Depending on the complexity of the data and how the entities are related, different data schemas can be used to express them.

Migration of Legacy Vocabularies

The migration of an existing vocabulary to an RDF schema varies in complexity from case to case, but some steps are common to most situations. Transforming enterprise data to RDF requires attention to the following points:

Translating between the source model and the RDF model is a complex task with many alternative mappings. To keep it manageable, use the simplest mapping that preserves the intended semantics.
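A minimal sketch of such a simple mapping, loosely modelled on the idea behind the W3C Direct Mapping of relational data to RDF: each row of a source table becomes a resource, and each column becomes a property. The base namespace `http://data.example.com/`, the table and column names and the `row_to_triples` helper are hypothetical, and values are emitted as untyped literals for brevity.

```python
from urllib.parse import quote

# Hypothetical base namespace; a real conversion would use one the enterprise controls.
BASE = "http://data.example.com/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def nt_literal(value: str) -> str:
    """Serialise a string as an N-Triples plain literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_triples(table: str, key_column: str, row: dict) -> list:
    """Map one row of a source table to (subject, predicate, object) triples.

    The row becomes a resource typed with a class named after the table,
    and every non-key column becomes a property with a literal value.
    """
    t = quote(table, safe="")
    subject = f"<{BASE}{t}/{quote(row[key_column], safe='')}>"
    triples = [(subject, RDF_TYPE, f"<{BASE}{t}>")]
    for column, value in row.items():
        if column == key_column:
            continue  # the key is already encoded in the subject URI
        predicate = f"<{BASE}{t}#{quote(column, safe='')}>"
        triples.append((subject, predicate, nt_literal(value)))
    return triples


if __name__ == "__main__":
    for s, p, o in row_to_triples("Customer", "id", {"id": "42", "name": "ACME Ltd"}):
        print(s, p, o, ".")
```

The later points in this list (URI minting, naming, datatypes, documentation) refine exactly the choices this naive mapping makes by default.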

The basic entity of RDF is a resource, and every resource needs a unique identifier, in this case a URI. If the data itself does not provide identifiers that can be converted to URIs, a strategy has to be developed for minting URIs for all the resources that are to be generated (see Sect. 5.4).
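One possible URI strategy, sketched under the assumption of a hypothetical namespace `http://data.example.com/id/`: use a source key when one exists, and otherwise derive a name-based (version 5) UUID from stable attributes, so that re-running the conversion yields the same URI for the same record. Which attributes are stable enough is a decision for the data owner; `mint_uri` and `slugify` are illustrative helpers, not part of any particular tool.

```python
import re
import uuid
from typing import Optional, Sequence

# Hypothetical namespace under the enterprise's control.
BASE = "http://data.example.com/id/"


def slugify(text: str) -> str:
    """Lower-case text and collapse runs of non-alphanumeric characters into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mint_uri(entity_type: str, key: Optional[str] = None,
             attributes: Sequence[str] = ()) -> str:
    """Mint a URI of the form BASE<type>/<local-id>.

    A usable source key is slugified; otherwise a name-based UUID is derived
    from the given attributes, which is deterministic across conversion runs.
    """
    local_id = slugify(key) if key else ""
    if not local_id:
        local_id = str(uuid.uuid5(uuid.NAMESPACE_URL, "|".join(attributes)))
    return f"{BASE}{slugify(entity_type)}/{local_id}"


if __name__ == "__main__":
    print(mint_uri("Product Group", key="Office Supplies"))
    print(mint_uri("Customer", attributes=("ACME Ltd", "Berlin")))
```

This `slugify` silently drops non-ASCII letters, so a production strategy would need an explicit rule for them (transliteration or percent-encoding).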

Preserve original naming as much as possible. Keeping the original names of entities makes the conversion clearer and traceable. Where the same property name occurs in several source entities, prefix it with the name of the source entity to make it unique.
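The renaming rule can be sketched as follows. The entity and column names are hypothetical, and the lower-camel-case convention (`Customer.name` becomes `customerName`) is an assumption; any consistent, traceable scheme would serve. Names that occur in only one entity keep their original spelling.

```python
from collections import Counter


def property_names(entities: dict[str, list[str]]) -> dict[tuple[str, str], str]:
    """Return a mapping (entity, column) -> RDF property local name.

    Column names unique across all entities are kept unchanged; a name shared
    by several entities is prefixed with the lower-camel-cased entity name.
    """
    # Count in how many entities each column name occurs.
    counts = Counter(col for cols in entities.values() for col in set(cols))
    result = {}
    for entity, cols in entities.items():
        for col in cols:
            if counts[col] > 1:
                result[(entity, col)] = (entity[0].lower() + entity[1:]
                                         + col[0].upper() + col[1:])
            else:
                result[(entity, col)] = col
    return result


if __name__ == "__main__":
    schema = {"Customer": ["name", "vatId"], "Supplier": ["name", "rating"]}
    for (entity, col), prop in property_names(schema).items():
        print(f"{entity}.{col} -> {prop}")
```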

Use XML Schema for data typing. Simple built-in XML Schema datatypes such as xsd:date and xsd:integer are useful for supplying schemas with information on property ranges.
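A sketch of how source column types might be mapped to XSD datatypes, both to declare `rdfs:range` in the schema and to type the literal values in the data. The SQL-style type names in `SQL_TO_XSD` and the example property URI are hypothetical, and the code assumes values already have a valid XSD lexical form (e.g. dates as `YYYY-MM-DD`).

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"

# Hypothetical mapping from source (SQL-style) column types to XSD datatypes.
SQL_TO_XSD = {
    "INTEGER": "integer",
    "DECIMAL": "decimal",
    "DATE": "date",
    "TIMESTAMP": "dateTime",
    "BOOLEAN": "boolean",
    "VARCHAR": "string",
}


def xsd_type(sql_type: str) -> str:
    """Return the full XSD datatype URI for a source column type."""
    return XSD + SQL_TO_XSD[sql_type.upper()]


def range_triple(property_uri: str, sql_type: str) -> str:
    """Declare the datatype of a property's values as its rdfs:range (schema level)."""
    return f"<{property_uri}> <{RDFS_RANGE}> <{xsd_type(sql_type)}> ."


def typed_literal(lexical: str, sql_type: str) -> str:
    """Attach the matching XSD datatype to a value (data level).

    Assumes the lexical form is already valid for the datatype.
    """
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"^^<{xsd_type(sql_type)}>'


if __name__ == "__main__":
    print(range_triple("http://data.example.com/Order#orderDate", "DATE"))
    print(typed_literal("2017-03-01", "DATE"))
```

Database timestamps usually need reformatting before they are valid `xsd:dateTime` values, since XSD uses a `T` between the date and the time.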

The meaning of a class or property can be made explicit by adding an “rdfs:comment”, preferably containing a definition from the original documentation. If the documentation is available online, “rdfs:seeAlso” or “rdfs:isDefinedBy” statements can be used to link to the original documentation and/or definition.
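For illustration, such annotations could be generated as N-Triples along these lines. The term URI, definition text and documentation URL are hypothetical; the `link` parameter of the illustrative `annotate` helper chooses between `rdfs:isDefinedBy` and `rdfs:seeAlso`.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def annotate(term_uri: str, definition: str, doc_url: str = "",
             link: str = "isDefinedBy", lang: str = "en") -> list[str]:
    """Attach a human-readable definition and, optionally, a documentation link to a term."""
    if link not in ("isDefinedBy", "seeAlso"):
        raise ValueError("link must be 'isDefinedBy' or 'seeAlso'")
    lines = [f'<{term_uri}> <{RDFS}comment> "{escape(definition)}"@{lang} .']
    if doc_url:
        lines.append(f"<{term_uri}> <{RDFS}{link}> <{doc_url}> .")
    return lines


if __name__ == "__main__":
    for line in annotate("http://data.example.com/Order#orderDate",
                         "Date on which the order was placed.",
                         "http://intranet.example.com/data-dictionary#orderDate"):
        print(line)
```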

Domain-specific data can be modelled with vocabularies such as Org[1] or GoodRelations[2]. New schemas should be developed only when existing vocabularies do not cover one's needs. Data sets that will be published on the web should be described with metadata vocabularies such as VoID, so that people can learn what the data is about from its description alone, without having to inspect the data itself.
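A minimal sketch of a VoID description generated from the converted triples: the data set is typed as `void:Dataset`, given a title, and annotated with its size (`void:triples`) and the vocabularies it uses (`void:vocabulary`). Deriving the vocabularies by cutting each predicate URI at its last `#` or `/` is a heuristic assumed here; the data set URI and sample triples are hypothetical.

```python
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def namespace_of(uri: str) -> str:
    """Heuristic: the namespace is everything up to and including the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def void_description(dataset_uri: str, title: str, triples: list,
                     sparql_endpoint: str = "") -> list[str]:
    """Describe a data set with VoID: type, title, size and vocabularies used.

    `triples` holds (subject, predicate, object) tuples with plain URI predicates.
    """
    d = f"<{dataset_uri}>"
    vocabularies = sorted({namespace_of(p) for _, p, _ in triples})
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    out = [
        f"{d} <{RDF_TYPE}> <{VOID}Dataset> .",
        f'{d} <{DCTERMS}title> "{safe_title}" .',
        f'{d} <{VOID}triples> "{len(triples)}"^^<{XSD_INTEGER}> .',
    ]
    out += [f"{d} <{VOID}vocabulary> <{ns}> ." for ns in vocabularies]
    if sparql_endpoint:
        out.append(f"{d} <{VOID}sparqlEndpoint> <{sparql_endpoint}> .")
    return out
```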

Where suitable vocabularies to describe the business data do not exist, one possibility is to develop a SKOS thesaurus instead of an RDFS model (e.g. for taxonomies, organisations or document types). This approach is easier to follow for organisations new to RDF. Tools such as PoolParty[3] support users in this task. The most recent international standard for thesaurus development is ISO 25964[4], which provides detailed guidelines and best practices that interested readers should consider.
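A simple hierarchy of business terms could be turned into a SKOS concept scheme along these lines. The scheme URI, the document-type labels and the slug-based concept URIs are hypothetical. A flat glossary is the special case in which every parent is `None`, so every concept becomes a top concept of the scheme.

```python
import re

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def slug(label: str) -> str:
    """Derive a URI-safe local name from a label (non-alphanumerics become '-')."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def skos_thesaurus(scheme_uri: str, parents: dict, lang: str = "en") -> list[str]:
    """Turn a {label: parent label or None} mapping into SKOS N-Triples lines."""
    scheme = f"<{scheme_uri}>"

    def concept(label: str) -> str:
        return f"<{scheme_uri}/{slug(label)}>"

    out = [f"{scheme} <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for label, parent in parents.items():
        c = concept(label)
        text = label.replace("\\", "\\\\").replace('"', '\\"')
        out += [
            f"{c} <{RDF_TYPE}> <{SKOS}Concept> .",
            f'{c} <{SKOS}prefLabel> "{text}"@{lang} .',
            f"{c} <{SKOS}inScheme> {scheme} .",
        ]
        if parent is None:  # no parent: a top concept of the scheme
            out += [f"{c} <{SKOS}topConceptOf> {scheme} .",
                    f"{scheme} <{SKOS}hasTopConcept> {c} ."]
        else:
            out.append(f"{c} <{SKOS}broader> {concept(parent)} .")
    return out
```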

Once the data is in this format, it can be loaded into a triple store such as Virtuoso and published internally or on the web.
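One way to load the generated N-Triples is an HTTP PUT following the SPARQL 1.1 Graph Store HTTP Protocol, which many triple stores implement. The sketch below only builds the request; the service URL and graph URI are assumptions, and the actual endpoint path, authentication and any bulk-loading alternatives for large files should be taken from the store's own documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def graph_store_put(service_url: str, graph_uri: str, ntriples: str) -> Request:
    """Build (without sending) a PUT request that replaces a named graph's content.

    Uses the SPARQL 1.1 Graph Store HTTP Protocol's indirect graph
    identification: the target graph IRI goes in the ?graph= query parameter.
    """
    query = urlencode({"graph": graph_uri})
    return Request(f"{service_url}?{query}",
                   data=ntriples.encode("utf-8"),
                   method="PUT",
                   headers={"Content-Type": "application/n-triples"})
```

Sending it is then a matter of `urllib.request.urlopen(req)`, plus whatever authentication the store requires.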

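  • [1] Org: the W3C Organization Ontology
  • [2] GoodRelations: a Web vocabulary for e-commerce
  • [3] PoolParty thesaurus management software
  • [4] ISO 25964: Information and documentation, Thesauri and interoperability with other vocabularies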