Difficulties and possible solutions
Knowledge management in the domain of cultural heritage raises several difficulties, which we group into four main areas:
- - data acquisition;
- - knowledge modeling;
- - use;
- - interoperability.
Data acquisition is an important stage in documenting heritage data for all cultural heritage institutions: galleries, libraries, archives and museums (GLAM). These institutions continuously perform diverse data acquisition tasks to add new objects to their collections and to enrich the data on objects already in those collections. Research projects on these heritage objects produce an enormous quantity of data that must be organized to facilitate accessibility, management and interoperability. Traditional systems for acquiring heritage data rely on relational databases and on graphic interfaces in which data are entered through form fields. The data on each heritage object cover diverse aspects that go beyond the object itself and include its context, such as the people, locations, events and other objects linked to its history.
In general, the data on a cultural heritage object can be classified into two main categories [HOH 12]:
- - data concerning the identification of the object, like its identifier, title, creator, etc.;
- - data concerning its documentation, like its history, production, conservation, archaeological digs, etc. These data, focused on the events related to the object, are richer and more difficult to model than the identification data.
The use of semantic web technology for data acquisition is an interesting approach because it provides, from the beginning and without additional effort, a rich ontological description, a high level of interoperability, machine readability, better automatic processing and conformity with the standards of the semantic web. Furthermore, information needs in the field of cultural heritage are very complex because events, locations, people, and material and immaterial objects must all be described. The semantic web offers techniques and resources for dealing with this complexity.
There are, however, several issues to consider when semantic web technology is used for data acquisition. For example, domain experts (here, in cultural heritage) very often enter data through a form-based web interface, which makes a system easier to use than direct access to the complexity of the knowledge base [SCH 12].
Experts in the field should not have to be directly exposed to the complexity of ontologies and knowledge representation technologies [MAZ 12]. Users are more comfortable with the informal and implicit semantics used within the graphic interface, such as the labels attached to form fields [HOH 12], the grouping and order in which these labels are used, and the connections between forms.
The following characteristics describe the interaction between the users and knowledge management systems:
- - context helps the users understand the semantics intended by labels, for example, by associating certain labels with a particular subject, idea or semantic aspect [EPP 06];
- - there is no separation between concepts and instances [SAR 07];
- - there is a tendency to use concrete instances and concepts associated with clear images in the user’s mind [SCH 11, SAR 07];
- - there is a tendency to use complementary resources like URLs, images, colors, etc. [SIO 11];
- - users are not able to think in terms of the explicit semantics that we find in ontologies [EPP 06].
The semantic web, by contrast, is based on the externalization of implicit knowledge in an explicit and formal way, as the definition of an ontology makes clear [BOR 97]. Consequently, a direct link must be created between the implicit and explicit semantics. This link corresponds to a pathway inside the ontology [NUS 07]. An ontological pathway is a sequence of classes and properties organized in a certain way to represent certain semantics explicitly [AMA 15].
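The definition above can be made concrete with a minimal sketch that stores a pathway as an alternating sequence of classes and properties. The `Pathway` type is hypothetical, and the class and property names are an assumption based on common CIDOC CRM usage for linking a painting to its painter through a production event; they are not taken from the figures of this chapter.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pathway:
    """An ontological pathway: class, property, class, ..., class."""
    classes: Tuple[str, ...]     # n + 1 class names
    properties: Tuple[str, ...]  # n property names linking consecutive classes

    def __post_init__(self):
        # A well-formed pathway always has one more class than properties.
        if len(self.classes) != len(self.properties) + 1:
            raise ValueError("a pathway needs exactly one more class than properties")

    def steps(self):
        """Return the (domain class, property, range class) steps in order."""
        return [
            (self.classes[i], self.properties[i], self.classes[i + 1])
            for i in range(len(self.properties))
        ]

    def __str__(self):
        parts = [self.classes[0]]
        for prop, cls in zip(self.properties, self.classes[1:]):
            parts.append(f"-> {prop} -> {cls}")
        return " ".join(parts)


# Assumed names: E21_Person is used as the endpoint because it is a
# subclass of the actor class that "carried out by" points to.
painting_to_painter = Pathway(
    classes=("E22_Man-Made_Object", "E12_Production", "E21_Person"),
    properties=("P108i_was_produced_by", "P14_carried_out_by"),
)
print(painting_to_painter)
```

The intermediate event class (here a production) is what distinguishes such a pathway from a direct object-to-creator link.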
Returning to the example of the Mona Lisa, an ontological pathway that represents a painter (Leonardo da Vinci) who made a painting (the Mona Lisa) is expressed with the following classes and properties, in the following order:
Figure 6.7. An example of an ontological pathway
NOTE.- This ontological pathway offers much richer and more precise semantics than
The CIDOC CRM ontology is structured into several functional divisions that facilitate knowledge modeling and the creation of ontological pathways. The following example shows the ontological pathways from the functional division “Material and Technical Information” [CRO 11].
Figure 6.8. Example of functional divisions and ontological pathways
Certain research projects [FRI 12, VRA 14] allow users to insert data using form fields that are associated with explicit semantics through object-oriented ontologies and metadata schemata. These systems, however, are not aimed at data acquisition with event-oriented ontologies like CIDOC CRM, especially in a rich and complex field like cultural heritage, where each form field must be linked to an ontological pathway. To use these systems with ontologies like CIDOC CRM, several technical adaptations must be made [KIM 15].
[SCH 12] is a research project that allows ontological pathways to be defined by an administrator who has sufficient knowledge of semantic technologies and ontological modeling. In such an environment, each form field is associated with an ontological pathway predefined by the administrator. The inserted data become instances for the classes used in each ontological pathway.
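As an illustration of how a form field bound to a predefined pathway can turn an entry into instances, here is a minimal sketch. It is not the implementation of [SCH 12]: the `FIELD_PATHWAYS` table, the `instantiate` function, the `ex:` identifiers and the CIDOC CRM names chosen for the "Painter" field are all assumptions made for the example.

```python
# Hypothetical binding of a form field label to a pathway, written as
# (domain class, property, range class) steps.
FIELD_PATHWAYS = {
    "Painter": [
        ("E22_Man-Made_Object", "P108i_was_produced_by", "E12_Production"),
        ("E12_Production", "P14_carried_out_by", "E21_Person"),
    ],
}


def instantiate(pathway, subject, value):
    """Turn one form entry into triples along a predefined pathway.

    pathway: list of (domain class, property, range class) steps
    subject: identifier of the object being described
    value:   text typed into the form field
    """
    # The described object is an instance of the first class on the pathway.
    triples = [(subject, "rdf:type", pathway[0][0])]
    current = subject
    for i, (_domain, prop, range_cls) in enumerate(pathway, start=1):
        # Create one new instance per step, typed by the range class.
        node = f"{subject}/{range_cls.split('_')[0]}_{i}"
        triples.append((current, prop, node))
        triples.append((node, "rdf:type", range_cls))
        current = node
    # The typed value labels the last instance on the pathway.
    triples.append((current, "rdfs:label", value))
    return triples


triples = instantiate(FIELD_PATHWAYS["Painter"], "ex:mona_lisa", "Leonardo da Vinci")
for t in triples:
    print(t)
```

The user only sees the label "Painter"; the intermediate production event is created behind the form.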
In our research, we give users the chance to take part in knowledge externalization through a tool that we developed, called “Path Finder”, for adding and structuring cultural heritage data. The tool lets users represent their information needs dynamically.
Users interact with the tool without being directly exposed to the ontology, through an intuitive terminology that we automatically classified according to the CIDOC CRM ontology and that serves as the input to our graph traversal algorithm [AMA 15]. The tool automatically generates the ontological pathway that represents the semantics intended by the user and also creates the new form field associated with this pathway.
As we saw earlier in this chapter, users are more comfortable with the object-oriented approach. For this reason, we ask users to specify the object they are describing (the first concept on the pathway) and the characteristic of this object that interests them (the final concept on the pathway), using intuitive vocabularies (concrete concepts or even instances) that we draw from domain vocabularies and external knowledge bases.
For example, using a special form field, the user can search in real time (using Ajax technology) for the concept of “painting” (retrieved from WordNet) or even the instance “Mona Lisa” (retrieved from DBpedia), and our automatic classification algorithm [AMA 15], which we describe briefly in a separate section, classifies this choice under the appropriate class in the CIDOC CRM ontology. For the last concept, the user can choose “painter” or even “Leonardo da Vinci”. The pathway produced, however, includes only the corresponding classes from the CIDOC CRM ontology, so that it can be generalized to all instances that are semantically compatible.
Our tool calculates all the intermediary elements of the pathway (including concepts like events and relationships between its concepts). The following figure shows part of the tool’s graphic interface.
Figure 6.9. Graphic interface of the “Path Finder” tool
Technically speaking, our recursive algorithm proceeds from the current concept towards all the possible relationships (those where the current concept is the rdfs:domain), including those inherited from its parents, and then towards all the possible concepts of these relationships (those where the concept is the rdfs:range). We repeat the same procedure recursively until we find all the possible pathways that do not exceed a predefined length. Throughout the procedure, we filter out the pathways that enter a loop.
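The traversal described above can be sketched as follows. This is a simplified illustration, not the algorithm of [AMA 15]: the schema is a toy fragment loosely modelled on CIDOC CRM, the function names are hypothetical, and the target is matched by exact class name only.

```python
# Toy schema fragment (an assumption for illustration).
SUBCLASS_OF = {
    "E22_Man-Made_Object": "E24_Physical_Man-Made_Thing",
    "E12_Production": "E11_Modification",
    "E11_Modification": "E7_Activity",
}
# property name -> (rdfs:domain, rdfs:range)
PROPERTIES = {
    "P108i_was_produced_by": ("E24_Physical_Man-Made_Thing", "E12_Production"),
    "P31i_was_modified_by": ("E24_Physical_Man-Made_Thing", "E11_Modification"),
    "P14_carried_out_by": ("E7_Activity", "E39_Actor"),
    "P14i_performed": ("E39_Actor", "E7_Activity"),
}


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    found = []
    while cls is not None:
        found.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return found


def find_pathways(start, target, max_len):
    """List every loop-free pathway from start to target with at most
    max_len properties. A pathway alternates class, property, class."""
    results = []

    def walk(path, seen):
        if len(path) // 2 >= max_len:  # number of properties already used
            return
        usable_domains = set(ancestors(path[-1]))  # own and inherited
        for prop, (domain, rng) in PROPERTIES.items():
            if domain not in usable_domains:
                continue
            if rng in seen:  # would re-enter a class: reject the loop
                continue
            extended = path + [prop, rng]
            if rng == target:
                results.append(extended)
            else:
                walk(extended, seen | {rng})

    walk([start], {start})
    return results


pathways = find_pathways("E22_Man-Made_Object", "E39_Actor", max_len=2)
for p in pathways:
    print(" -> ".join(p))
```

With this toy schema, two pathways are found, one through a production event and one through a modification event, which is the kind of ambiguity that the intermediary concepts are then used to resolve.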
To disambiguate the pathways and select the one intended by the user, we use the intermediary concepts as a key for semantic disambiguation. As illustrated in the example below, we ask users whether they mean the “Production” or the “Modification” of the painting. This programmatically guided method of creating ontological pathways is both more intuitive and more efficient than the manual method.
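A minimal sketch of this disambiguation step follows, assuming that candidate pathways are stored as alternating lists of classes and properties. The function names and the two candidate pathways are hypothetical and only illustrate how intermediary concepts can be turned into a question for the user.

```python
def intermediate_classes(pathway):
    """Classes strictly between the first and the last concept.

    Classes sit at even positions of the alternating list, so we take
    every second element, skipping both endpoints."""
    return pathway[2:-1:2]


def disambiguation_options(pathways):
    """Map a readable label, built from the intermediary classes, to each
    candidate pathway. Pathways that share the same intermediary classes
    would need a finer key; this sketch ignores that case."""
    options = {}
    for pathway in pathways:
        names = [c.split("_", 1)[1].replace("_", " ")
                 for c in intermediate_classes(pathway)]
        options[" / ".join(names)] = pathway
    return options


# Two hypothetical candidates between a painting and an actor.
candidates = [
    ["E22_Man-Made_Object", "P108i_was_produced_by", "E12_Production",
     "P14_carried_out_by", "E39_Actor"],
    ["E22_Man-Made_Object", "P31i_was_modified_by", "E11_Modification",
     "P14_carried_out_by", "E39_Actor"],
]

options = disambiguation_options(candidates)
print("Do you mean:", " or ".join(sorted(options)), "?")
chosen = options["Production"]  # the user's answer selects one pathway
```

Only the labels are shown to the user; the selected pathway itself stays hidden behind the form field.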
The idea of ontological pathways is used in several data acquisition systems, such as in the BRICKS project [NUS 07], where there was a need for mapping between an object-oriented metadata schema and the CIDOC CRM ontology to ensure better semantic interoperability with other systems in the domain. In this project, mapping in CIDOC CRM was always an ontological pathway that includes certain classes and properties in a particular order corresponding to a specific element in an old schema. Furthermore, it was necessary to make several technical adaptations because the old schema was intended for a relational database, unlike the CIDOC CRM ontology that is better adapted to the semantic web and the RDF graph model.
The two kinds of users capable of providing semantic data for a heritage knowledge base are experts in the domain and hobbyists [SMI 12]. The domain experts are historians, curators, archivists, librarians, etc. The hobbyists are interested users like antique collectors. The kind of user must be taken into consideration when conceptualizing the heritage information system. We discuss this aspect in section 6.3.3.
It is important to distinguish real objects from their digital representations, such as photos, scanned versions, audio files, videos, etc. These files are normally stored on physical media such as computer hard drives. Furthermore, heritage objects can be collected, studied and analyzed by individuals and institutions, which produces a large amount of knowledge about these objects.
This knowledge may reside in the minds of individuals or even be documented in a structured, semi-structured or unstructured way in several resources like books, scientific publications, databases, documentations, manuals, websites, etc.
The extraction of structured information from a text can be carried out using various natural language processing (NLP) technologies and machine learning [AKK 11]. Furthermore, ontology-based information extraction uses ontologies to guide the various information-extraction tasks. These tasks include named-entity recognition (NER), the extraction of relationships between entities, and the extraction of domain vocabulary from the textual literature of the field.
The extracted vocabulary includes the representative terms from a specific domain. This stage is important for associating the instances and terms from a domain with the general concepts from the domain. [VLA 12] is a doctoral thesis that exploits natural language processing technologies and terminological resources in the domain of cultural heritage to automatically annotate words and phrases mentioned in the text with certain concepts.
The text chosen to test the system comes from the domain of archaeology, and the semantic annotation is based on the CIDOC CRM ontology and its CRM-EH extension for the domain of archaeology. Technically speaking, the annotation task uses manually predefined JAPE rules. JAPE (Java Annotation Patterns Engine) is the rule language for pattern-based annotation provided by the GATE software. These rules are combined with domain terms found in the EH (English Heritage) thesaurus as well as with mechanisms for negation detection and lexical disambiguation.
The system allows the text to be annotated with certain CIDOC CRM classes like E19_Physical_Object, E49_Time_Appellation, E53_Place and E57_Material, and with certain CRM-EH extension classes like EHE0007_Context (the location where the archaeological object was found), EHE0009_Context_Find (the object found) and EHE0030_Context_Find_Material (the material observed in the object found). This research is part of the STAR project, which aims to integrate heterogeneous information from various resources in the archaeological domain.
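The principle of thesaurus-driven annotation with negation detection can be illustrated with a short sketch. It is written in plain Python and does not use JAPE or GATE; the mini-gazetteer, its term-to-class assignments and the list of negation cues are assumptions made for the example, not entries of the EH thesaurus.

```python
import re

# Hypothetical mini-gazetteer: domain term -> ontology class.
GAZETTEER = {
    "pottery": "E19_Physical_Object",
    "coin": "E19_Physical_Object",
    "bronze": "E57_Material",
    "ditch": "EHE0007_Context",
}
# Very small, assumed list of negation cues.
NEGATION_CUES = ("no", "not", "without", "absence of")


def annotate(text):
    """Annotate known terms, sentence by sentence, and flag a term as
    negated when a negation cue precedes it in the same sentence."""
    annotations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for term, cls in GAZETTEER.items():
            # Match the term as a whole word, with an optional plural "s".
            match = re.search(rf"\b{re.escape(term)}s?\b", lowered)
            if match is None:
                continue
            before = lowered[:match.start()]
            negated = any(
                re.search(rf"\b{re.escape(cue)}\b", before)
                for cue in NEGATION_CUES
            )
            annotations.append({"term": term, "class": cls, "negated": negated})
    return annotations


result = annotate("The ditch contained a bronze coin. No pottery was found.")
for annotation in result:
    print(annotation)
```

A rule-based system handles far more linguistic variation than this lookup; the sketch only shows why negation must be checked before a term is recorded as a find.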
[HER 08] is another example of data acquisition, this time from structured and semi-structured sources like websites that include semantic descriptions, databases, Excel files and library data described with the MARC 21, DC and EAD schemata. This data acquisition task aims to populate the ontology automatically with instances from these sources in a traceable and reproducible way. Two stages are essential to achieve this goal:
- - the interpretation and extraction of data: XML rules describe the meaning of each element in the source schemata. These rules map each element either to certain CIDOC CRM ontology classes or to an ontological pathway, according to the semantics and functionality of the element. This stage identifies and extracts the data described by the source schemata;
- - the insertion of instances into the ontology: in this stage, the data extracted during the previous stage are inserted into the ontology as instances. Domain experts can verify, complete and modify the data inserted.
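The two stages above can be sketched as follows. The XML rule format, the function names, the `ex:` identifiers and the chosen CIDOC CRM pathways are all hypothetical; the rule syntax actually used in [HER 08] is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: each source element is mapped to a pathway
# written as class/property/class/...
RULES_XML = """
<rules schema="DC">
  <rule element="title">
    <path>E22_Man-Made_Object/P102_has_title/E35_Title</path>
  </rule>
  <rule element="creator">
    <path>E22_Man-Made_Object/P108i_was_produced_by/E12_Production/P14_carried_out_by/E39_Actor</path>
  </rule>
</rules>
"""


def load_rules(xml_text):
    """Parse the rules into {source element: pathway as a list}."""
    root = ET.fromstring(xml_text)
    return {
        rule.get("element"): rule.findtext("path").strip().split("/")
        for rule in root.iter("rule")
    }


def extract(record, rules):
    """Stage 1: keep the mapped elements and remember where each value
    came from, so that the result stays traceable."""
    return [
        {"source_element": element, "value": value, "pathway": rules[element]}
        for element, value in record.items()
        if element in rules
    ]


def insert_instances(subject, items):
    """Stage 2: create instances along each pathway as triples."""
    triples = []
    for item in items:
        path = item["pathway"]
        current = subject
        # Walk the alternating list one (property, range class) step at a time.
        for i in range(1, len(path), 2):
            prop, range_cls = path[i], path[i + 1]
            node = f"{subject}/{item['source_element']}/{i // 2 + 1}"
            triples.append((current, prop, node))
            triples.append((node, "rdf:type", range_cls))
            current = node
        triples.append((current, "rdfs:label", item["value"]))
    return triples


rules = load_rules(RULES_XML)
record = {"title": "Mona Lisa", "creator": "Leonardo da Vinci", "format": "oil on poplar"}
extracted = extract(record, rules)
triples = insert_instances("ex:obj1", extracted)
```

Because the rules are data rather than code, the same extraction can be rerun on the same source and yield the same instances, which is what makes the process reproducible; the unmapped "format" element is simply skipped.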