Data Graph Summary Model
In general, an RDF graph consists of datasets, which in turn contain a number of entities. These entities are organised into classes. Links can exist at any of these levels: between datasets, between classes of entities, or between the entities themselves. The data graph summary is a meta-graph that highlights the structure of a data graph (e.g. an RDF graph).
For the graph summary process, we need to represent the data graph using three conceptual layers: the dataset layer, the node collection layer and the entity layer. The entity layer represents the original data graph. The node collection layer captures the schema and structure of the data graph in a concise way by grouping similar entities into a parent node that we call a node collection. This grouping is required because it allows the graph summary to determine collection-specific information about those entities correctly. The dataset layer captures the link structure across datasets as well as the provenance of the information on the entity and node collection layers. Fig. 12 gives an example of the three-layer representation of a data graph. Note that the β symbol represents terminating or leaf entities, e.g., RDF literal values. The node collection layer represents a summary computed by grouping together entities that have the same classes. It is composed of node collections and linksets, where a linkset is a set of links with the same label between two node collections. For example, in the figure the links “author” between articles and people on the entity layer are mapped to two linksets “author” on the node collection layer.
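The grouping of entities into node collections and of links into linksets can be illustrated with a minimal sketch. It is not the implementation behind the graph summary: the toy triples, the `rdf:type` predicate name, the `summarise` function and the choice to treat every untyped node as a β leaf are all assumptions made for illustration, and the dataset layer is left out.

```python
from collections import defaultdict

TYPE = "rdf:type"  # assumed predicate name for class membership
LEAF = ("β",)      # collection used for leaf nodes such as literals

# Hypothetical toy entity layer as (subject, predicate, object) triples.
triples = [
    ("article1", TYPE, "Article"),
    ("article2", TYPE, "Article"),
    ("alice", TYPE, "Person"),
    ("bob", TYPE, "Person"),
    ("article1", "author", "alice"),
    ("article2", "author", "bob"),
    ("article1", "title", '"Graph Summaries"'),
]


def summarise(triples):
    """Group entities by class set and aggregate their links into linksets."""
    # Collect the set of classes declared for each entity.
    classes = defaultdict(set)
    for s, p, o in triples:
        if p == TYPE:
            classes[s].add(o)

    def collection_of(node):
        # Entities with the same classes share one node collection;
        # untyped nodes are treated as leaves (an assumption of this sketch).
        return tuple(sorted(classes[node])) if node in classes else LEAF

    # Node collection layer: collection identifier -> member entities.
    node_collections = defaultdict(set)
    for entity in classes:
        node_collections[collection_of(entity)].add(entity)

    # Linksets: (source collection, label, target collection) -> link count.
    linksets = defaultdict(int)
    for s, p, o in triples:
        if p != TYPE:
            linksets[(collection_of(s), p, collection_of(o))] += 1

    return dict(node_collections), dict(linksets)


node_collections, linksets = summarise(triples)
```

In this toy input, both “author” links connect the same pair of collections, so they collapse into a single linkset with a count of two; entities with different class sets would instead yield separate linksets carrying the same label.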
A data graph summary provides a unique view on the linkage information of a dataset. Using this meta-graph, it is possible to analyse the links of a dataset from the “point of view” of that dataset: links from/to other datasets, internal links between classes, etc. The Web Linkage Validator shown in Fig. 13a presents various “points of view” to help dataset owners assess their linked data.
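As a rough sketch of what such a “point of view” could look like in code, the summary-level links can be partitioned relative to one dataset. The dataset names, the tuple layout and the `point_of_view` function are hypothetical and are not taken from the Web Linkage Validator.

```python
# Hypothetical summary-level links:
# (source dataset, source collection, label, target dataset, target collection)
summary_links = [
    ("pubs.example", "Article", "author", "people.example", "Person"),
    ("pubs.example", "Article", "cites", "pubs.example", "Article"),
    ("people.example", "Person", "wrote", "pubs.example", "Article"),
    ("other.example", "Review", "about", "pubs.example", "Article"),
]


def point_of_view(links, dataset):
    """Split links into internal, outgoing and incoming relative to `dataset`."""
    view = {"internal": [], "outgoing": [], "incoming": []}
    for link in links:
        source, target = link[0], link[3]
        if source == dataset and target == dataset:
            view["internal"].append(link)   # links between classes of the dataset
        elif source == dataset:
            view["outgoing"].append(link)   # what the dataset says about others
        elif target == dataset:
            view["incoming"].append(link)   # what others say about the dataset
        # Links that do not touch the dataset are ignored.
    return view


view = point_of_view(summary_links, "pubs.example")
```

A dataset owner could then inspect `view["incoming"]` to check what other datasets assert about their entities, and `view["outgoing"]` to check their own external links.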
Besides giving a structural breakdown of the dataset, the graph summary is a utility for validating the internal and external links of a particular data graph. In terms of external links, it shows what a dataset is “saying” about other datasets and vice versa. This is important because it gives knowledge base owners the ability to validate what the links represent.