Linked Open Data for Public Procurement
Abstract. Public procurement is an area that could benefit greatly from linked open data technology. The respective use case of the LOD2 project covered several aspects of applying linked data to public contracts: ontological modeling of relevant concepts (Public Contracts Ontology), data extraction from existing semi-structured and structured sources, support for matchmaking between demand and supply on the procurement market, and aggregate analytics. The last two functionalities, which are end-user oriented, are provided within a specifically designed prototype web application.
Public Procurement Domain
Among the various types of information that governmental institutions are required by law to publish as open data are descriptions of public contracts. These are published both at the level of requests for tenders (RFT, also 'calls for bids' or the like), i.e., open invitations to suppliers to respond to a defined need, usually involving precise parameters of the required products or services, and at the level of awarded contracts (revealing the identity of the contractor and the final price). The whole process is typically denoted as public or government procurement. The domain of public procurement forms a fundamental part of modern economies, as it typically accounts for tens of percent of gross domestic product. Given the volume of spending flows involved, it is a domain where innovation can have significant impact. Open disclosure of public procurement data also improves the transparency of spending in the public sector.
An interesting aspect of public contracts from the point of view of the semantic web is that they unify two different spheres: that of public needs and that of commercial offers. They thus represent an ideal meeting place for data models, methodologies and information sources that have often been designed independently within the two sectors. Furthermore, the complex life cycle of public contracts gives ample space for applying diverse methods of data analytics, ranging from simple aggregate statistics to analyses over complex alignments of individual items. Conversely, linked data technology is beneficial for the public contract area since, among other things, it increases interoperability across various formats and applications, and even across human language barriers, since linked data identifiers and vocabularies are language-independent.
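The point about language-independent identifiers can be illustrated with a minimal sketch. It uses plain Python dictionaries rather than an RDF store, and all URIs, contract records and labels below are hypothetical examples (the `example.org` namespace is not a real procurement vocabulary). The idea is that contracts published in different languages are matched on a shared classification identifier, while human-readable labels are looked up per language only when displaying results.

```python
# Hypothetical classification identifiers (illustrative only, not a real vocabulary).
CONSTRUCTION = "http://example.org/classification/construction-work"
CLEANING = "http://example.org/classification/cleaning-services"

# Labels are attached to the identifier, one per language.
LABELS = {
    CONSTRUCTION: {"en": "Construction work", "cs": "Stavební práce", "de": "Bauarbeiten"},
    CLEANING: {"en": "Cleaning services"},
}

# Hypothetical contract notices whose titles are in different languages.
contracts = [
    {"id": "http://example.org/contract/cz-001", "title": "Oprava mostu", "kind": CONSTRUCTION},
    {"id": "http://example.org/contract/de-001", "title": "Sanierung der Brücke", "kind": CONSTRUCTION},
    {"id": "http://example.org/contract/uk-001", "title": "Office cleaning", "kind": CLEANING},
]


def contracts_of_kind(records, kind_uri):
    """Return the ids of all contracts classified with the given identifier."""
    return [r["id"] for r in records if r["kind"] == kind_uri]


def label(uri, lang, fallback="en"):
    """Return the label in the requested language, then the fallback, then the URI itself."""
    names = LABELS.get(uri, {})
    return names.get(lang) or names.get(fallback) or uri


if __name__ == "__main__":
    # The Czech and German notices match without any text comparison of their titles.
    for contract_id in contracts_of_kind(contracts, CONSTRUCTION):
        print(contract_id, "->", label(CONSTRUCTION, "de"))
```

Because the comparison is made on the identifier, no translation of the titles is needed to find related contracts; in an actual linked data setting the same role would be played by URIs from a shared classification scheme with language-tagged labels.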
We can distinguish three major views of the e-procurement domain: domain concepts, data, and user scenarios. A plausible and comprehensive conceptualization of the domain is a prerequisite for the correct design of computerized support as well as for ensuring data interoperability. Management of the large amounts of data produced in the procurement domain has to take into account their varying provenance and the possibility of duplicates and random errors. Finally, the activities of users, i.e., both contracting authorities and bidders/suppliers, have to be distinguished along the different phases of the public contract lifecycle. Linked data technology provides a rich inventory of tools and techniques supporting these views. The last, user-oriented view is the least specific of the three; typically, the user front-end does not differ much from other types of (web-based) applications, except that some functionality, such as autocompletion of user input, relies on online integration with external linked data repositories.
The public procurement domain has already been addressed by projects stemming from the semantic web field. The most notable ones are probably LOTED and MOLDEAS. LOTED focused on extraction of data from a single procurement source, simple statistical aggregations over a SPARQL endpoint and, most recently, legal ontology modeling. MOLDEAS, in turn, primarily addressed the matchmaking task, using sophisticated computational techniques such as spreading activation and RDFized classifications. However, the effort undertaken in the LOD2 project is unique in systematically addressing many phases of procurement linked data processing (from domain modeling through multi-way data extraction, transformation and interlinking, to matchmaking and analytics) as well as both EU-level and national sources with diverse structure.
The chapter structure follows the above views of public procurement. First, the Public Contracts Ontology (PCO) is presented as a backbone of the subsequent efforts. Then we review the original public contract data sources that have been addressed in our project, and describe the process of their extraction, cleaning and linking. Finally, we present the end user's view, in different business scenarios, as supported by the Public Contract Filing Application (PCFA for short). This view is further divided into the matchmaking functionality and the analytic functionality; the full integration of the latter was still in progress at the time of writing the chapter.