Data Sources Identification
Corporate information can be defined as the data that is used and shared by the different employees, departments and processes (IT or otherwise) of a company. Depending on the information security policy, corporate data can be accessed, processed and published via different business applications of the enterprise IT system. Note that this data may be spread across different locations (internal departments and entities, regional or cross-country subsidiaries, etc.).
When integrating LOD technologies into an existing enterprise IT system or application, the first recommendation is to audit the different business data sources used by the company. This audit should include the following elements:
• Classification of business data sources according to their importance to the operation of strategic business processes.
• Mapping of the data workflows between the identified data sources, to discover missing, redundant or incomplete information exchanges and the type of data involved (structured or unstructured).
• Mapping table between native business data formats and the corresponding standard formats (preferably W3C RDF-based formats), together with the impact of shifting from the native to the standard format.
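The three audit elements above can be captured in a small inventory. The following is a minimal sketch, not a prescribed tool: the `DataSource` fields, the importance scale and the entries of `FORMAT_MAPPING` are illustrative assumptions, and a real audit would replace them with the company's own classification and format catalogue.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    native_format: str   # e.g. "CSV", "SQL", "XLSX"
    importance: int      # assumed scale: 1 = critical to strategic processes, 3 = marginal
    structured: bool
    feeds: tuple = ()    # names of the sources this one sends data to


# Illustrative mapping table from native formats to candidate W3C standards.
FORMAT_MAPPING = {
    "CSV": "RDF via CSV on the Web",
    "SQL": "RDF via R2RML",
}


def audit(sources):
    """Rank sources by importance, attach a target format, and flag
    data flows that point at sources missing from the inventory."""
    known = {s.name for s in sources}
    report = []
    for s in sorted(sources, key=lambda s: s.importance):
        report.append({
            "name": s.name,
            "importance": s.importance,
            "structured": s.structured,
            "target_format": FORMAT_MAPPING.get(s.native_format, "no mapping yet"),
            "unknown_targets": [t for t in s.feeds if t not in known],
        })
    return report


if __name__ == "__main__":
    inventory = [
        DataSource("crm", "SQL", 1, True, ("billing",)),
        DataSource("hr_sheets", "XLSX", 3, True, ("payroll",)),
        DataSource("billing", "CSV", 2, True),
    ]
    for entry in audit(inventory):
        print(entry)
```

Entries reported with "no mapping yet" or with a non-empty `unknown_targets` list point to the gaps the audit is meant to surface: formats without a standard counterpart and data flows towards sources that have not been identified.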
This audit allows the data architects to better understand how the corporate applications function and helps them evaluate the cost of integrating LOD technology. Depending on the required effort and cost, the first best practice consists of migrating as many native formats as possible to standards, preferably RDF-based W3C standards. This considerably eases the publishing, annotation and interlinking of business data.
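As an illustration of such a migration, the sketch below turns rows of a CSV export into RDF serialized as N-Triples, using only the Python standard library. The `http://example.org/` namespace, the `product` entity name and the column-to-predicate convention are placeholders chosen for the example; every value is emitted as a plain string literal, whereas a production mapping would reuse existing vocabularies and typed literals.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder namespace, to be replaced by the company's own


def escape_literal(value):
    """Escape the characters that N-Triples does not allow raw inside a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(text, id_column, entity="product"):
    """Emit one triple per non-empty cell; the id column names the subject."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{entity}/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id itself, empty cells, and overflow cells without a header.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}schema/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    export = "id,name,price\n42,Widget,9.90\n"
    print(csv_to_ntriples(export, id_column="id"))
```

Once the data is in this form, each record has a stable URI that other data sets can link to, which is what makes the later annotation and interlinking steps straightforward.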
To comply with the openness criterion of the LOD paradigm, publishing data is a major recommendation in the “LODification” process of corporate data. To do so, a licensing scheme must be released to define how the opened data can be reused and exploited by third-party users, applications and services. In the company's interest, a compromise must be found that opens as much data as possible while keeping strategic enterprise data, such as know-how, confidential. Many reusable licensing schemes can be considered.
Last but not least, the open data licensing scheme must guarantee that the data can be reused by third-party applications with as few technical, financial and legal restrictions as possible. One way of achieving these goals is to provide rich metadata descriptions of the opened data using appropriate vocabularies, such as DCAT, VoID and Dublin Core. To make the opened and published data understandable and retrievable, the metadata description must provide key elements such as the copyright and associated license, the update frequency of the data, the publication formats, the data provenance, the data version, a textual description of the data set and, when necessary, a contact point for reporting inconsistencies or errors.
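The key elements listed above can be expressed with DCAT and Dublin Core terms. The sketch below builds such a description as a JSON-LD document with the Python standard library. The function name, its parameters and all sample values (titles, URLs, e-mail address) are hypothetical, and the choice of properties (for example `owl:versionInfo` for the version and a `dct:ProvenanceStatement` for provenance) is one reasonable option among several rather than a required profile.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}


def describe_dataset(title, description, license_url, frequency_uri,
                     version, provenance, contact_email, media_types):
    """Return a JSON-LD dictionary describing one published data set."""
    return {
        "@context": CONTEXT,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:license": {"@id": license_url},
        "dct:accrualPeriodicity": {"@id": frequency_uri},   # update frequency
        "owl:versionInfo": version,
        "dct:provenance": {
            "@type": "dct:ProvenanceStatement",
            "rdfs:label": provenance,
        },
        "dcat:contactPoint": {
            "@type": "vcard:Kind",
            "vcard:hasEmail": {"@id": f"mailto:{contact_email}"},
        },
        # One distribution per publication format.
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:mediaType": {"@id": m}}
            for m in media_types
        ],
    }


if __name__ == "__main__":
    metadata = describe_dataset(
        title="Product catalogue",
        description="Public subset of the corporate product catalogue.",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        frequency_uri="http://purl.org/cld/freq/monthly",
        version="1.2.0",
        provenance="Exported from the internal product database.",
        contact_email="opendata@example.org",
        media_types=["https://www.iana.org/assignments/media-types/text/turtle"],
    )
    print(json.dumps(metadata, indent=2))
```

Publishing this description alongside the data lets catalogues and third-party applications discover the license, update frequency and contact point without inspecting the data itself.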