Modelling for the Specific Domain
To transform an enterprise's existing model into a more interoperable schema, best practice is to use common vocabularies. Reusing terms from existing vocabularies is easier for the publisher and contributes substantially to the re-use and seamless exchange of enterprise data.
As a first step, the inherent structure of the legacy data has to be analysed. If no specified hierarchy exists, it can often be created based on expert knowledge of the data. If such an organization of the data is not possible, then only a list of concepts, basically a glossary, can be constructed. Depending on the complexity of the data and how the entities are related, different data schemas can be used to express them.
Migration of Legacy Vocabularies
The migration of an existing vocabulary to an RDF schema varies in complexity from case to case, but some steps are common to most situations. The following guidelines apply when transforming enterprise data to RDF:
• Translating between the source model and the RDF model is a complex task with many alternative mappings. To reduce problems, the simplest solution that preserves the intended semantics should be used.
• The basic entity of RDF is a resource, and every resource must have a unique identifier, in this case a URI. If the data itself does not provide identifiers that can be converted to URIs, a strategy has to be developed for creating URIs for all the resources that are to be generated (see Sect. 5.4).
• Preserve original naming as much as possible. Preserving the original naming of entities results in clearer and traceable conversions. Prefix duplicate property names with the name of the source entity to make them unique.
• Use XML support for data-typing. Simple built-in XML Schema datatypes such as xsd:date and xsd:integer are useful to supply schemas with information on property ranges.
• The meaning of a class or property can be explicated by adding an “rdfs:comment”, preferably containing a definition from the original documentation. If documentation is available online, “rdfs:seeAlso” or “rdfs:isDefinedBy” statements can be used to link to the original documentation and/or definition.
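The guidelines above can be illustrated with a minimal sketch that converts one legacy record into N-Triples. Everything specific here is an assumption made for illustration: the example.com namespaces, the hypothetical Employee table and its columns, and the helper names mint_uri, literal and record_to_ntriples. It only shows URI minting, name preservation with entity prefixes, and XML Schema datatyping; a real migration would normally rely on an RDF library.

```python
from urllib.parse import quote

# Hypothetical namespaces; example.com stands in for the enterprise's own domain.
BASE = "http://example.com/resource/"
VOCAB = "http://example.com/vocab#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Legacy column names mapped to XML Schema datatypes (assumed for this example).
DATATYPES = {"hired": XSD + "date", "age": XSD + "integer"}


def mint_uri(entity, key):
    """Create a URI from the entity name and the legacy primary key."""
    return f"{BASE}{quote(entity)}/{quote(str(key), safe='')}"


def literal(value, datatype=None):
    """Serialise a value as an N-Triples literal, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


def record_to_ntriples(entity, key_column, record):
    """Convert one legacy record (a dict) into a list of N-Triples lines."""
    subject = mint_uri(entity, record[key_column])
    lines = [f"<{subject}> <{RDF_TYPE}> <{VOCAB}{entity}> ."]
    for column, value in record.items():
        if column == key_column:
            continue
        # Keep the original column name, prefixed with the source entity,
        # so that e.g. Employee.name and Department.name stay distinct.
        prop = f"{VOCAB}{entity.lower()}_{column}"
        lines.append(f"<{subject}> <{prop}> {literal(value, DATATYPES.get(column))} .")
    return lines


row = {"id": 42, "name": "Ada Lovelace", "hired": "2011-03-01"}
triples = record_to_ntriples("Employee", "id", row)
print("\n".join(triples))
```

Deriving the URI from the legacy primary key keeps the conversion repeatable: running it again on the same record yields the same identifier.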
Domain-specific data can be modelled with vocabularies such as Org or GoodRelations. New schemas should be developed only when existing vocabularies do not cover one's needs. Data sets that will be published on the web should be described with metadata vocabularies such as VoID, so that people can learn what the data is about from its description alone.
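As a rough illustration of such a metadata description, the sketch below assembles a small VoID record in Turtle. The dataset URI, title, description and triple count are invented for the example, and void_description is a hypothetical helper name; the VoID and Dublin Core terms used (void:Dataset, void:vocabulary, void:triples, dcterms:title, dcterms:description) are taken from those vocabularies.

```python
def void_description(dataset_uri, title, description, triples, vocabularies):
    """Build a Turtle document describing a data set with VoID.

    Assumes title and description contain no double quotes or newlines.
    """
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f'    dcterms:description "{description}" ;',
    ]
    # One void:vocabulary statement per vocabulary used in the data set.
    for vocab in vocabularies:
        lines.append(f"    void:vocabulary <{vocab}> ;")
    # A bare integer in Turtle is read as an xsd:integer literal.
    lines.append(f"    void:triples {int(triples)} .")
    return "\n".join(lines)


# Hypothetical data set that reuses the Org and GoodRelations vocabularies.
text = void_description(
    "http://example.com/dataset/products",
    "Product catalogue",
    "Products and organisational units of an example enterprise",
    1200,
    ["http://www.w3.org/ns/org#", "http://purl.org/goodrelations/v1#"],
)
print(text)
```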
Where suitable vocabularies to describe the business data do not exist, one possibility is to develop a SKOS thesaurus instead of an RDFS model (e.g. for taxonomies, organizations, document types). This approach is easier to follow for organisations new to RDF. Tools such as PoolParty support users in this task. The most recent international standard on thesaurus development is ISO 25964, which provides detailed guidelines and best practices that interested readers should consult.
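To make the thesaurus option more concrete, here is a minimal sketch that turns a simple term hierarchy into SKOS, serialised as Turtle. The scheme URI, the document-type terms, and the helper names slug and to_skos are assumptions for illustration only; the SKOS terms themselves (skos:ConceptScheme, skos:Concept, skos:prefLabel, skos:broader, skos:inScheme, skos:topConceptOf) come from the SKOS vocabulary.

```python
from urllib.parse import quote

# Hypothetical concept scheme URI for an in-house thesaurus.
SCHEME = "http://example.com/thesaurus/"


def slug(label):
    """Derive a URI-safe local name from a term label."""
    return quote(label.strip().lower().replace(" ", "-"))


def to_skos(hierarchy):
    """Serialise a {label: parent label or None} mapping as SKOS in Turtle.

    Assumes labels are English and contain no double quotes.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{SCHEME}> a skos:ConceptScheme .",
    ]
    for label, parent in hierarchy.items():
        lines.append("")
        lines.append(f"<{SCHEME}{slug(label)}> a skos:Concept ;")
        lines.append(f'    skos:prefLabel "{label}"@en ;')
        if parent is None:
            # Terms without a parent become top concepts of the scheme.
            lines.append(f"    skos:topConceptOf <{SCHEME}> .")
        else:
            lines.append(f"    skos:broader <{SCHEME}{slug(parent)}> ;")
            lines.append(f"    skos:inScheme <{SCHEME}> .")
    return "\n".join(lines)


# Hypothetical document types, as an expert might list them.
document_types = {"Document": None, "Contract": "Document", "Invoice": "Document"}
turtle = to_skos(document_types)
print(turtle)
```

A flat glossary maps onto the same structure: every term simply has no parent and becomes a top concept.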
Once the data is in RDF, it can be loaded into a triple store such as Virtuoso and published internally or on the web.
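One store-independent way to load triples is a SPARQL 1.1 Update request. The sketch below only builds such a request and does not send it. The endpoint URL and graph URI are assumptions, build_load_request is a hypothetical helper name, and authentication is omitted; whether a particular store accepts updates in this form depends on its configuration.

```python
import urllib.request


def build_load_request(endpoint, graph_uri, ntriples_lines):
    """Build (but do not send) a SPARQL 1.1 Update request that inserts triples.

    The triples are expected as N-Triples lines, which are also valid
    inside a SPARQL INSERT DATA block.
    """
    body = "INSERT DATA { GRAPH <%s> {\n%s\n} }" % (graph_uri, "\n".join(ntriples_lines))
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Hypothetical endpoint and graph; adjust both to the actual installation.
request = build_load_request(
    "http://localhost:8890/sparql",
    "http://example.com/graph/legacy",
    ['<http://example.com/resource/Employee/42> <http://example.com/vocab#employee_name> "Ada Lovelace" .'],
)
print(request.data.decode("utf-8"))
# Sending would be: urllib.request.urlopen(request)
```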