Linked Open Data for Public Procurement

Abstract. Public procurement is an area that could benefit greatly from linked open data technology. The respective use case of the LOD2 project covered several aspects of applying linked data to public contracts: ontological modeling of relevant concepts (Public Contracts Ontology), data extraction from existing semi-structured and structured sources, support for matching demand and supply on the procurement market, and aggregate analytics. The last two functionalities, which are end-user oriented, are framed by a purpose-built prototype web application.

Public Procurement Domain

Among the various types of information that governmental institutions publish as open data, as required by law, are descriptions of public contracts. These appear both at the level of requests for tenders (RFT, also 'calls for bids' or the like), i.e., open invitations to suppliers to respond to a defined need (usually involving precise parameters of the required products or services), and at the level of awarded contracts (revealing the identity of the contractor and the final price). The whole process is typically referred to as public or government procurement. The domain of public procurement forms a fundamental part of modern economies, as it typically accounts for a double-digit percentage of gross domestic product.[1] Given the volume of spending that flows through public procurement, it is a domain where innovation can have significant impact. Open disclosure of public procurement data also improves the transparency of spending in the public sector.[2]

An interesting aspect of public contracts from the point of view of the semantic web is that they unify two different spheres: that of public needs and that of commercial offers. They thus represent an ideal meeting place for data models, methodologies and information sources that have often been designed independently within the two sectors. Furthermore, the complex life cycle of public contracts gives ample space for applying diverse methods of data analytics, ranging from simple aggregate statistics to analyses over complex alignments of individual items. Conversely, linked data technology benefits the public contract area: among other things, it increases interoperability across various formats and applications, and even across human language barriers, since linked data identifiers and vocabularies are language-independent.
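The point about language-independent identifiers can be illustrated with a minimal sketch. It assumes a made-up classification URI under `http://example.org/` and hypothetical notice records; none of these names come from the LOD2 project or from any real procurement vocabulary. Two notices written in different languages refer to the same identifier, so they can be matched without comparing any text, while human-readable labels are looked up per language only for display.

```python
# Hypothetical classification URI, used for illustration only.
ROAD_WORKS = "http://example.org/classification/road-construction"

# Language-specific labels attached to one language-independent identifier.
LABELS = {
    ROAD_WORKS: {
        "en": "Road construction works",
        "cs": "Výstavba silnic",
        "de": "Straßenbauarbeiten",
    },
}

# Hypothetical notices published in different languages; both point at the same URI.
NOTICES = [
    {"id": "cz-001", "lang": "cs", "subject": ROAD_WORKS},
    {"id": "de-042", "lang": "de", "subject": ROAD_WORKS},
    {"id": "de-043", "lang": "de", "subject": "http://example.org/classification/it-services"},
]


def find_notices(subject_uri, notices):
    """Return the ids of all notices whose subject is the given URI."""
    return [n["id"] for n in notices if n["subject"] == subject_uri]


def label_for(subject_uri, lang, fallback="en"):
    """Return a label in the requested language, else the fallback, else the URI itself."""
    labels = LABELS.get(subject_uri, {})
    return labels.get(lang, labels.get(fallback, subject_uri))


if __name__ == "__main__":
    print(find_notices(ROAD_WORKS, NOTICES))
    print(label_for(ROAD_WORKS, "cs"))
```

Matching is done purely on the identifier; the labels play no part in it, which is what makes the comparison independent of the language of the original notice.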

The e-procurement domain can be viewed from three major perspectives: domain concepts, data, and user scenarios. A plausible and comprehensive conceptualization of the domain is a prerequisite for the correct design of computerized support as well as for ensuring data interoperability. Management of the large amounts of data produced in the procurement domain has to take into account their varying provenance and the possibility of duplicates and random errors. Finally, the activities of users, i.e., both contracting authorities and bidders/suppliers, have to be distinguished along the different phases of the public contract lifecycle. Linked data technology provides a rich inventory of tools and techniques supporting these views. The last, user-oriented view is the least specific of the three; typically, the user front-end does not differ much from other types of (web-based) applications, except that some functionality, such as autocompletion of user input, draws online on external linked data repositories.
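As a rough illustration of the data-management concern above (varying provenance and possible duplicates), the following sketch groups records that describe the same contract and keeps track of where each came from. The field names (`buyer`, `title`, `source`) and the normalization rule are assumptions made for this example; they do not describe the cleaning procedure actually used in the project.

```python
from collections import defaultdict


def normalize_key(record):
    """Build a comparison key from buyer and title (hypothetical fields).

    Case and surrounding or repeated whitespace are ignored.
    """
    buyer = " ".join(record["buyer"].lower().split())
    title = " ".join(record["title"].lower().split())
    return (buyer, title)


def merge_duplicates(records):
    """Collapse records with the same key into one entry listing every source."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_key(record)].append(record)

    merged = []
    for group in groups.values():
        first = group[0]
        merged.append({
            "buyer": first["buyer"].strip(),
            "title": first["title"].strip(),
            # Provenance is preserved rather than discarded on merge.
            "sources": sorted({r["source"] for r in group}),
        })
    return merged


if __name__ == "__main__":
    sample = [
        {"buyer": "City of Brno", "title": "Road repair 2013", "source": "national"},
        {"buyer": "city of brno ", "title": "Road  repair 2013", "source": "eu"},
        {"buyer": "City of Brno", "title": "School canteen supplies", "source": "national"},
    ]
    for entry in merge_duplicates(sample):
        print(entry)
```

Exact-key matching of this kind only catches trivially different spellings; records that differ through random errors would need fuzzier comparison, which is outside the scope of this sketch.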

The public procurement domain has already been addressed by projects stemming from the semantic web field. The most notable ones are probably LOTED[3] and MOLDEAS [1]. LOTED focused on extraction of data from a single procurement source, simple statistical aggregations over a SPARQL endpoint and, most recently, legal ontology modeling [5]. MOLDEAS, in turn, primarily addressed the matchmaking task, using sophisticated computational techniques such as spreading activation [2] and RDFized classifications. However, the effort undertaken in the LOD2 project is unique in systematically addressing many phases of procurement linked data processing (from domain modeling through multi-way data extraction, transformation and interlinking, to matchmaking and analytics) and in covering both EU-level and national sources with diverse structure.

The chapter structure follows the above views of public procurement. First, the Public Contracts Ontology (PCO) is presented as the backbone of the subsequent efforts. Then we review the original public contract data sources that have been addressed in our project, and describe the process of their extraction, cleaning and linking. Finally, we present the end user's view, in different business scenarios, as supported by the Public Contract Filing Application (PCFA for short). This view is further divided into the matchmaking functionality and the analytic functionality (the full integration of the latter was still in progress at the time of writing).

  • [1] For example, as of 2010 it made up 17.3% of the EU's GDP [8].
  • [2] See, e.g., stopsecretcontracts.org/
  • [3] loted.eu/
 