After converting the source data to a data cube, we still have only detailed data. In order to show useful statistics, and in particular to visualise the data in CubeViz, we need to provide slices and aggregations. These artefacts need to be modelled and included along with the RDF data.
To generate slices, we provided a Python script for the materialisation of datasets. It first queries for the list of dimensions enumerated in the data structure definition (DSD). Then, all elements (members) of the dimensions are retrieved. We assume that a slice contains two dimensions (for use in two-dimensional charts); therefore, pairwise combinations of all dimensions are generated. Finally, the respective links between observations, datasets, and slices are generated.
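The pairwise-combination step can be sketched as follows. This is a minimal illustration, not the actual script: the dimension URIs are invented placeholders, and we assume that each slice key fixes one pair of dimensions (as with slicekey-indicator-unit below) while the remaining dimensions stay free.

```python
from itertools import combinations

# Hypothetical dimension URIs, as if retrieved from the DSD;
# these are illustrative, not the actual INSIGOS vocabulary.
dimensions = [
    "http://example.org/dim/indicator",
    "http://example.org/dim/unit",
    "http://example.org/dim/country",
    "http://example.org/dim/year",
]

def slice_keys(dims):
    """For each pair of fixed dimensions, the remaining dimensions
    stay free, yielding one slice key per pairwise combination."""
    keys = []
    for fixed in combinations(dims, 2):
        free = [d for d in dims if d not in fixed]
        keys.append({"fixed": fixed, "free": free})
    return keys

print(len(slice_keys(dimensions)))  # C(4, 2) = 6 slice keys
```

Each generated key would then give rise to one slice per combination of member values of its fixed dimensions.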
Listing 3 presents the relevant parts of our generated data. In this example, a slice key with fixed dimensions for indicator and unit is defined – slicekey-indicator-unit. Using this structure, several slices are generated, one for each combination of the free variables. One of them is the slice with values fixed to 'export' (indicator) and 'EUR' (unit) – export-EUR. The last line, with qb:observation, contains an ellipsis because there are in fact 1330 observations attached.
Listing 3. Data cube slices in INSIGOS data
Aggregations are commonly used in reporting on multidimensional data. The most prominent example is a data warehouse with OLAP cubes, where aggregations are selected and precalculated in such a way that reporting is sped up. By analogy, aggregations may also be deemed useful for cubes defined as linked data.
In our case, aggregation is necessary for drill-down operations. For example, daily data can be aggregated on a monthly basis to better observe a phenomenon. Likewise, we can display data as yearly sums and allow drill-down to more precise levels. SPARQL is capable of calculating sums on the fly, but this takes time and sometimes a time-out is reached. Materialisation is then necessary for quicker operations.
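The roll-up itself is conceptually simple; a minimal sketch of aggregating daily observations into monthly sums might look as follows. The observation values are invented for illustration and do not come from the INSIGOS data.

```python
from collections import defaultdict

# Illustrative daily observations as (ISO date, value) pairs.
observations = [
    ("2013-01-15", 10.0),
    ("2013-01-20", 5.0),
    ("2013-02-03", 7.5),
]

def aggregate_monthly(obs):
    """Roll daily observations up to monthly sums (month -> total)."""
    totals = defaultdict(float)
    for date, value in obs:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[month] += value
    return dict(totals)

print(aggregate_monthly(observations))
# {'2013-01': 15.0, '2013-02': 7.5}
```

Materialising such sums once, rather than recomputing them in every SPARQL query, is what avoids the time-outs mentioned above.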
Our first idea was to prepare aggregations using a Python script, similarly to slicing. However, that would have required too much querying and would have been inefficient. In the end, we found a way to implement the aggregation method as a set of SPARQL queries.
One of the issues was the generation of URIs for new observations, as an aggregation is in fact a new observation – the same dimensions, but with values on a higher level, e.g. month → year. For INSIGOS/POLGOS observations we have defined a pattern for identifiers, and we used the capabilities of Virtuoso to generate the identifiers directly in SPARQL.
Before aggregation is done, a correct hierarchy should be prepared. The prerequisite for the script is that dimensions are represented as SKOS concept schemes, and that the elements of each dimension are organised in a hierarchy using the skos:narrower property.
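Putting the two prerequisites together, the following sketch rolls values up along a skos:narrower hierarchy and derives a URI for the new, higher-level observation. Both the hierarchy and the identifier pattern are hypothetical stand-ins, not the actual INSIGOS/POLGOS pattern generated in Virtuoso.

```python
# Hypothetical hierarchy, as it might look after querying skos:narrower
# links: a parent (year) concept pointing to its narrower (month) concepts.
narrower = {
    "http://example.org/time/2013": [
        "http://example.org/time/2013-01",
        "http://example.org/time/2013-02",
    ],
}

# Observed values on the leaf (monthly) level; values are illustrative.
values = {
    "http://example.org/time/2013-01": 15.0,
    "http://example.org/time/2013-02": 7.5,
}

def roll_up(parent):
    """Sum the values of the narrower concepts to produce the
    parent-level observation, deriving its URI from a fixed pattern."""
    total = sum(values[child] for child in narrower[parent])
    # Illustrative identifier pattern only.
    obs_uri = parent.replace("/time/", "/observation/agg-")
    return obs_uri, total

print(roll_up("http://example.org/time/2013"))
# ('http://example.org/observation/agg-2013', 22.5)
```

In the actual solution this logic runs as SPARQL queries, with Virtuoso generating the identifiers; the sketch only shows the shape of the computation.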
-  A mechanism should be introduced to allow querying for observations instead of relying on explicit assignments. Currently, such assignments require the materialisation of a large number of additional triples, which makes the solution questionable in enterprise data warehouse settings, considering the volume of data
-  For example, a query calculating the value of public procurement contracts by voivodeship takes 100 seconds, which is outside acceptable response times