Serbian Statistical Office Use Case

The information published by the Statistical Office of the Republic of Serbia (SORS)[1] on a monthly, quarterly and yearly basis is mostly available as open, free-of-charge, downloadable documents in PDF format, while raw data with short- and long-term indicators are organized in a central statistics publication database. The SORS publication list includes press releases, a monthly statistical bulletin, a statistical yearbook, working documents, methodologies and standards, trends, etc. Serbia's national statistical office has shown strong interest in publishing statistical data in a web-friendly format so that the data can be linked and combined with related information. A number of envisioned main actors and a sample scenario were used to elaborate the requirements for the Linked Open Data tools to be included in the Statistical Workbench.

Fig. 3. Simplified workflow for an end-user case

The SORS data publishing process with the Linked Open Data extension is shown in Fig. 3. The data prepared by a Statistician are published through the SORS Dissemination database. Using the LOD2 Statistical Workbench, reports can be transformed into a machine-processable format and published to a local governmental portal (e.g. the Serbian CKAN). The IT Administrator maintains the infrastructure needed for storing and publishing statistical data in different formats (Excel, XML, RDF). Public data are retrieved by a Data Analyst who wants to use them in research.

An in-depth analysis of the SORS dissemination database has shown that a number of standard dimensions are used to select and retrieve information. The Linked Data principles suggest modeling these dimensions as code lists, in accordance with the recommendation for publishing RDF data using the RDF Data Cube vocabulary. To formalize the conceptualisation of each of the domains in question, the Simple Knowledge Organisation System (SKOS) was used. The concepts are represented as skos:Concept and grouped in concept schemes (skos:ConceptScheme) that serve as the code lists on which the dataset dimensions draw to describe the data (Fig. 4).
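The code-list modeling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base URI, the scheme name `geo`, and the two codes are hypothetical and are not actual SORS identifiers. The sketch emits N-Triples text using the standard SKOS terms, and it treats every concept as a top concept, which suits a flat code list.

```python
# Standard W3C namespace URIs for SKOS and rdf:type.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URI for illustration; not an actual SORS namespace.
BASE = "http://example.org/sors/code/"


def code_list_triples(scheme_id, codes):
    """Return N-Triples lines for one SKOS concept scheme and its concepts.

    `codes` is a list of (notation, English label) pairs. Labels are assumed
    to contain no characters that need escaping in N-Triples.
    """
    scheme = f"<{BASE}{scheme_id}>"
    lines = [f"{scheme} <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for notation, label in codes:
        concept = f"<{BASE}{scheme_id}/{notation}>"
        lines.append(f"{concept} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"{concept} <{SKOS}inScheme> {scheme} .")
        lines.append(f'{concept} <{SKOS}notation> "{notation}" .')
        lines.append(f'{concept} <{SKOS}prefLabel> "{label}"@en .')
        # Flat list assumption: each concept is a top concept of the scheme.
        lines.append(f"{scheme} <{SKOS}hasTopConcept> {concept} .")
    return lines


# Hypothetical geography codes used only to exercise the function.
triples = code_list_triples(
    "geo", [("RS", "Republic of Serbia"), ("RS11", "Belgrade Region")]
)
print("\n".join(triples))
```

A dataset dimension would then point at concepts from such a scheme, so that two datasets using the same code can be compared or joined on it.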

Because direct access to the central database is restricted, all input data are provided as XML files. The SORS statistical data in XML form are passed as input to the Statistical Workbench's built-in XSLT (Extensible Stylesheet Language Transformations) processor and transformed into RDF using the aforementioned vocabularies and concept schemes. Listing 1 shows an example RDF/XML code snippet from a transformed dataset.
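As a rough illustration of this kind of mapping, the sketch below turns XML records into RDF Data Cube observations. It is not the Workbench's XSLT stylesheet and not the listing referenced above: plain Python with `xml.etree.ElementTree` is swapped in for XSLT, the input layout (`dataset`/`obs` elements with `area`, `period` and `value` attributes) is an assumption because the SORS XML schema is not shown here, the base URI is hypothetical, and the values are made up. Output is N-Triples rather than RDF/XML to keep the example short.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF Data Cube, SDMX-RDF and XSD.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"

# Hypothetical base URI and input layout, with made-up values.
BASE = "http://example.org/sors/"
SAMPLE_XML = """
<dataset id="demo">
  <obs area="RS" period="2012" value="10.0"/>
  <obs area="RS" period="2013" value="12.5"/>
</dataset>
"""


def observations_to_ntriples(xml_text):
    """Map each <obs> element to a qb:Observation, returned as N-Triples lines."""
    root = ET.fromstring(xml_text.strip())
    dataset_id = root.get("id")
    dataset = f"<{BASE}dataset/{dataset_id}>"
    lines = [f"{dataset} <{RDF_TYPE}> <{QB}DataSet> ."]
    for obs in root.findall("obs"):
        area = obs.get("area")
        period = obs.get("period")
        value = obs.get("value")
        node = f"<{BASE}obs/{dataset_id}/{area}/{period}>"
        lines += [
            f"{node} <{RDF_TYPE}> <{QB}Observation> .",
            f"{node} <{QB}dataSet> {dataset} .",
            # The area dimension points at a concept in a (hypothetical) code list.
            f"{node} <{SDMX_DIM}refArea> <{BASE}code/geo/{area}> .",
            f'{node} <{SDMX_DIM}refPeriod> "{period}" .',
            f'{node} <{SDMX_MEASURE}obsValue> "{value}"^^<{XSD_DECIMAL}> .',
        ]
    return lines


ntriples = observations_to_ntriples(SAMPLE_XML)
print("\n".join(ntriples))
```

Each observation carries a link to its dataset, one value per dimension, and one measured value, which is the basic shape the RDF Data Cube vocabulary prescribes.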

Fig. 4. Representing SORS code lists

Listing 1. Sample observation from a SORS dataset in RDF/XML syntax

The above example describes the setup, the overall process and the outcome of the use case. It shows how (local) raw statistical data can be turned into (globally visible) rich collections of interrelated statistical datasets. The data are represented using the RDF Data Cube vocabulary, which is compatible with international statistical standards. The resulting data rely on both international and domestic code lists, allowing for easy comparison, interlinking, discovery and merging across different datasets. Finally, the results are cataloged in a local metadata repository, and periodic harvesting at an international level is scheduled, thereby increasing transparency and improving public service delivery while enriching the Linked Data Cloud.

  • [1]