There are several ways to access the data: browsing the BZP portal, a subscription mechanism with a restricted number of criteria, and downloading XML files, which is the option we used for the RDFization. The structure of the XML is basically flat: even attributes that could be grouped are placed on the same level. This has implications for the parsing and conversion mechanisms. On the one hand, no subset of the XML data can be selected for further processing. On the other hand, the extraction expressions, as well as the XML paths, are shorter. The conversion of XML files containing notices about public contracts was carried out by means of Tripliser. The RDFization had to overcome some issues in the XML structure, such as the use of consecutive numbers for elements describing the individual suppliers (in Polish 'wykonawca') awarded the different lots of a contract: wykonawca 0, wykonawca 1, wykonawca 2 and so on. We also had to write our own extension functions for Tripliser that generate new identifiers for addresses, treated as data structures, from their parts: locality, postal code and street.
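The two issues described above, numbered supplier elements and identifiers minted from address parts, can be illustrated with a minimal Python sketch. It is not the Tripliser mapping or the extension functions actually used: the element names (`wykonawca_nazwa_0` and so on), the `example.org` base URI, and the hash-based identifier scheme are all assumptions made for illustration.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Hypothetical flat notice; element names are illustrative, not the real BZP schema.
SAMPLE = """<notice>
  <wykonawca_nazwa_0>Firma A</wykonawca_nazwa_0>
  <wykonawca_miejscowosc_0>Warszawa</wykonawca_miejscowosc_0>
  <wykonawca_kod_0>00-001</wykonawca_kod_0>
  <wykonawca_ulica_0>ul. Prosta 1</wykonawca_ulica_0>
  <wykonawca_nazwa_1>Firma B</wykonawca_nazwa_1>
  <wykonawca_miejscowosc_1>Poznan</wykonawca_miejscowosc_1>
  <wykonawca_kod_1>60-001</wykonawca_kod_1>
  <wykonawca_ulica_1>ul. Krotka 5</wykonawca_ulica_1>
</notice>"""

# Assumed naming pattern: wykonawca_<field>_<index>
NUMBERED = re.compile(r"^wykonawca_([a-z]+)_(\d+)$")


def group_suppliers(root):
    """Regroup flat, numbered sibling elements into one record per supplier."""
    suppliers = {}
    for element in root:
        match = NUMBERED.match(element.tag)
        if match:
            field, index = match.group(1), int(match.group(2))
            suppliers.setdefault(index, {})[field] = (element.text or "").strip()
    return suppliers


def address_id(locality, postal_code, street):
    """Mint a deterministic address URI from its parts (hypothetical scheme)."""
    # Normalize case and whitespace so equal addresses yield equal identifiers.
    parts = [" ".join(part.split()).lower() for part in (locality, postal_code, street)]
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()[:16]
    return "http://example.org/address/" + digest


suppliers = group_suppliers(ET.fromstring(SAMPLE))
for index in sorted(suppliers):
    record = suppliers[index]
    uri = address_id(record["miejscowosc"], record["kod"], record["ulica"])
    print(index, record["nazwa"], uri)
```

Deriving the identifier from normalized address parts means that the same address appearing in different notices resolves to the same resource, without needing a lookup table.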
Automatic linking was carried out with Silk, one of the LOD2 Stack tools, to map the contact information of a given contracting authority or supplier to TERYT, a classification of Polish territorial units.
The U.S. procurement dataset was created by combining data from two complementary sources: USASpending.gov and Federal Business Opportunities (FBO). USASpending.gov offers a database of government expenditures, including awarded public contracts, for which it records, e.g., the numbers of bidders. FBO, on the other hand, publishes public notices for ongoing calls for tenders. USASpending.gov provides data downloads in several structured data formats. We used the CSV dumps, which we converted to RDF using a SPARQL mapping executed by tarql. The data dump from FBO is available in XML as part of the Data.gov initiative. To convert it to RDF, we created an XSLT stylesheet that outputs RDF/XML. As an additional dataset used by both USASpending.gov and FBO, we converted the FAR Product and Service Codes to RDF using LODRefine, an extraction tool from the LOD2 Stack.
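The row-to-triple idea behind the CSV conversion can be sketched in plain Python. This is not the tarql mapping used for the dataset: tarql expresses such mappings as SPARQL queries, whereas the sketch below swaps in a hand-written loop that emits N-Triples. The column names (`contract_id`, `title`, `number_of_bidders`) and the `example.org` URIs are assumptions for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces; the real dataset uses its own URIs and vocabularies.
BASE = "http://example.org/contract/"
VOCAB = "http://example.org/vocab#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape_literal(value):
    """Escape characters that are not allowed raw inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text):
    """Turn each CSV row into N-Triples lines describing one contract."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + quote(row["contract_id"], safe="") + ">"
        title = escape_literal(row["title"])
        lines.append(f'{subject} <{VOCAB}title> "{title}" .')
        # Emit the bidder count only when the cell is filled in.
        if row["number_of_bidders"]:
            bidders = int(row["number_of_bidders"])
            lines.append(
                f'{subject} <{VOCAB}numberOfBidders> "{bidders}"^^<{XSD_INTEGER}> .'
            )
    return lines


sample = "contract_id,title,number_of_bidders\nC-1,Road repair,3\nC-2,Bridge,\n"
for line in rows_to_ntriples(sample):
    print(line)
```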
The data resulting from the transformation to RDF was interlinked both internally and with external datasets. Internal linking was done in order to fuse equivalent instances of public contracts and business entities. Deduplication was performed using the data processing unit for UnifiedViews that wraps the Silk link discovery framework. The output links were merged using the data fusion component of UnifiedViews. Links to external resources were created either by using code-based URI templates during the transformation to RDF or by instance matching based on the converted data. The use of codes as strong identifiers enabled automatic generation of links to FAR codes and the North American Industry Classification System 2012, two controlled vocabularies used to express the objects and kinds of public contracts. Instance matching was applied to discover links to DBpedia and OpenCorporates. Links to DBpedia were created for populated places referred to from postal addresses in the U.S. procurement dataset. OpenCorporates was used as the target for linking the bidding companies; this task was carried out using the batch reconciliation API of OpenCorporates via its interface in LODRefine.
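Linking through code-based URI templates amounts to substituting a normalized code into a fixed URI pattern, so no matching step is needed. The following Python sketch illustrates the idea under stated assumptions: the template URIs, the scheme keys (`naics`, `psc`) and the validation rule are hypothetical and do not reproduce the templates used in the actual transformation.

```python
import re

# Hypothetical URI templates; the real target URIs are not given in the text.
TEMPLATES = {
    "naics": "http://example.org/naics-2012/{code}",
    "psc": "http://example.org/far-psc/{code}",
}


def code_link(scheme, raw_code):
    """Build a link from a code, or return None when the code is unusable."""
    code = raw_code.strip().upper()
    # Only plain alphanumeric codes are treated as strong identifiers here.
    if not re.fullmatch(r"[A-Z0-9]+", code):
        return None
    return TEMPLATES[scheme].format(code=code)


print(code_link("naics", " 541511 "))
print(code_link("psc", "d302"))
print(code_link("naics", "n/a"))
```

Because the link is a pure function of the code, the same template can be applied on both sides (the contract data and the controlled vocabulary), and the resulting URIs coincide without any similarity comparison. Names and addresses lack such identifiers, which is why instance matching is needed for DBpedia and OpenCorporates.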
-  uzp.gov.pl
-  uzp.gov.pl/BZP/
-  A Java library and command-line tool for creating triple graphs from XML, https://github.com/daverog/Tripliser
-  See Chap. 1 of this book
-  stack.linkeddata.org
-  teryt.stat.gov.pl/
-  usaspending.gov/
-  https://fbo.gov/
-  https://github.com/opendatacz/USASpending2RDF
-  https://github.com/cygri/tarql
-  ftp://ftp.fbo.gov/datagov/
-  https://github.com/opendatacz/FBO2RDF
-  acquisition.gov/
-  code.zemanta.com/sparkica/
-  wifo5-03.informatik.uni-mannheim.de/bizer/silk/
-  Developed previously for ODCleanStore, the predecessor of UnifiedViews
-  census.gov/eos/www/naics/index.html
-  dbpedia.org
-  https://opencorporates.com/