GEOSS Information System and the Big Data Challenges

The Big Data topic emerges immediately when considering large and heterogeneous EO systems. The goals of the GEOSS information system (and of the GCI) pose challenges along all the Big Data dimensions (Nativi et al. 2015). Each Big Data challenge (commonly known as the 'V' axes: volume, variety, velocity, veracity, and visualization) required GEOSS and the GCI to devise and operate ad hoc solutions and strategies (Nativi et al. 2015). This may be considered the third important GCI evolution, the second being the adoption of the brokering approach.

Presently, the GEOSS information system fully implements the brokering approach, building on cloud computing technology: the GCI (i.e., its three main components) has moved to the cloud, realizing a public cloud-based software ecosystem that characterizes the present GEOSS information system. Community application developers can join this software ecosystem by using the GCI/DAB cloud-based APIs to develop new applications and community-driven portals.

FIGURE 14.4 GCI contribution to the GEOSS information system.

For example, the presently operational DAB configuration takes advantage of the following cloud-specific elements (Nativi et al. 2015):

Routing service: A Domain Name System service specifically tailored to cloud environments; that is, the cloud routing service can use cloud-specific functionality to optimize traditional routing (e.g., by routing requests to healthy machines).

Load balancer: Provides external client applications with a unique entry point and routes each request to the machine with the lowest workload.

Instance: A virtual machine provisioned by the cloud provider.

Clone instance cluster: A set of instances in which every instance is assumed to run the same application with the same configuration (i.e., they are clones).

Auto-scaling cluster: An instance cluster that can add or remove instances on the fly according to a set of scaling rules.

Cluster: A set of instances in which each instance may run a different application with a different configuration.
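The interplay between the load balancer and an auto-scaling cluster can be sketched as follows. This is a minimal illustration under stated assumptions, not the DAB implementation: the `Instance` and `AutoScalingCluster` classes, the 0.0-1.0 `workload` figure, and the two threshold values are all hypothetical and do not come from Nativi et al. (2015).

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A virtual machine; `workload` is a hypothetical 0.0-1.0 load figure."""
    name: str
    workload: float = 0.0
    healthy: bool = True


@dataclass
class AutoScalingCluster:
    """Clone instances governed by two illustrative (assumed) scaling rules."""
    instances: list = field(default_factory=list)
    scale_out_above: float = 0.75  # assumed threshold, not from the source
    scale_in_below: float = 0.25   # assumed threshold, not from the source
    min_size: int = 1
    created: int = 0               # counter used only to name new clones

    def route(self) -> Instance:
        """Load balancing: return the healthy instance with the lowest workload."""
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy instance available")
        return min(healthy, key=lambda i: i.workload)

    def apply_scaling_rules(self) -> None:
        """Add or remove one instance based on the mean healthy workload."""
        healthy = [i for i in self.instances if i.healthy]
        if not healthy:
            return
        mean = sum(i.workload for i in healthy) / len(healthy)
        if mean > self.scale_out_above:
            self.created += 1
            self.instances.append(Instance(f"auto-{self.created}"))
        elif mean < self.scale_in_below and len(healthy) > self.min_size:
            self.instances.remove(min(healthy, key=lambda i: i.workload))


cluster = AutoScalingCluster([
    Instance("clone-1", workload=0.90),
    Instance("clone-2", workload=0.10, healthy=False),
    Instance("clone-3", workload=0.85),
])
first_target = cluster.route()   # clone-3: clone-2 is idle but unhealthy
cluster.apply_scaling_rules()    # mean healthy workload 0.875 > 0.75: scale out
second_target = cluster.route()  # the newly added, idle instance
```

Keeping the routing decision and the scaling rule separate mirrors the list above, where the load balancer and the auto-scaling cluster are distinct elements; a real cloud provider would supply both as managed services.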

The main solutions and strategies adopted by the GEOSS information system (and the GCI) to address Big Data challenges were analyzed by Nativi et al. (2015); Table 14.1 provides an executive summary of them.

TABLE 14.1 GEOSS Information Systems and GCI Strategies and Solutions to Address Big Data Challenges

Big Data Challenges

Solutions Adopted to Address the Challenges

Volume

Discovery challenges:

High number of catalogs, inventories, and listing services to be brokered

Large number of metadata records

Large number of users' discovery requests

Solutions adopted (discovery):

Reduce the number of matching results by supporting advanced constraints in addition to the more traditional "what," "where," and "when."

Design and apply a ranking metric and a related paging strategy.

Support distributed queries, along with the harvesting approach, to reduce the number of large metadata records to be stored and managed by the DAB.

Use load balancing and auto-scaling clusters to support a large number of queries.

Access challenges:

High number of data services to be brokered

Large number of datasets

Big data volume

Large number of users' access requests

Solutions adopted (access):

Use server-side transformation functionalities to limit downloaded data.

Supplement missing transformation functionalities (not supported by data servers).

Support data caching and map tiling.

Use load balancing and auto-scaling clusters.

Variety

Discovery challenges:

Support of highly heterogeneous metadata models and discovery service interfaces

Publication of the set of metadata models and discovery interfaces implemented by GEOSS users' applications

Long-term data access sustainability in a multidisciplinary environment

Solutions adopted (discovery):

Introduction of a brokering tier dedicated to mediation of service interfaces and metadata models harmonization in a transparent way for both users and data providers.

Design and implementation of a brokering semantic and metadata model used.

Extensible architecture of brokering to support new service interfaces and metadata models.


Access challenges:

Support of highly heterogeneous data models, encoding formats, and access service interfaces

Publication of the set of data models, encoding formats, and access interfaces implemented by GEOSS users' applications

Long-term data access sustainability in a multidisciplinary environment

Solutions adopted (access):

Introduction of a brokering tier dedicated to mediation of access service interfaces and data formats harmonization in a transparent way for both users and data providers.

Design and implementation of a brokering data model used to (i) harmonize and integrate the heterogeneous data formats brokered by GEOSS and (ii) expose the data formats well supported by GEOSS users.

Extensible architecture of brokering to support new access service interfaces and data formats.

Transformations facilitating reuse.

Velocity

Discovery challenges:

Managing the increasing rate at which metadata flows

Fast metadata processing to satisfy users' needs

Solutions adopted (discovery):

Operational data store that periodically extracts, integrates, and reorganizes brokered metadata records for operational inquiry and ranking generation.

Caches that provide instant access to the results of distributed queries while buffering data provider systems from additional load and performance degradation.

A DAB architecture design that balances metadata latencies against GEOSS users' requirements, without assuming that all data must be near real time.

Incremental harvesting strategy.

Live query distribution combined with caching of results.

Load balancing to route incoming requests to machines with the lowest workload.

Use of auto-scaling clusters to increase computing capacity in response to rapid workload growth.


Access challenges:

Managing the increasing rate at which data flows

Fast data processing to satisfy users' needs

Solutions adopted (access):

Operational data store that periodically generates and stores preview tiled maps of brokered data for operational data preview.

Caches that provide instant access to the results of previous access requests.

Supplementing missing transformations, which limits local processing time.

For extremely large processing requests, users are allowed to opt for an asynchronous version of the access functionality.

Veracity, value, and validity

Challenges:

Reduction of the "information noise"

Retrieved data comparison

Data trustworthiness for GEOSS decision makers

Effective data reuse

Data meaningfulness for user requests

Data accuracy for intended use

Solutions adopted:

The brokering data model includes a specific multidisciplinary quality extension.

Implementation of a flexible ranking metric that includes quality of service and metadata completeness as valuable indexes.

The brokering metadata model supports a harmonized presentation of retrieved metadata, facilitating their comparison.

Use of GEOSS EVs as an additional parameter for improving the existing ranking metric.

The prototyped "fit-for-purpose" and users' feedback extensions aim to provide users with quality-aware results.

Visualization

Challenges:

Visualization speed

Contextualized visualization

Solutions adopted:

Support community portals and applications by publishing DAB APIs for client development.

Support the following visualization strategy: (1) provide an overview (keeping it simple and showing the important elements), (2) allow zooming and filtering out of unnecessary clutter, and (3) provide more details if requested by users.

Provide fast previews by generating preview tiles in batch.

Source: De Laurentis, D., Understanding transportation as a system of systems problem, in System of Systems Engineering: Innovations for the 21st Century, 2009, pp. 520-541.
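The ranking and paging strategy listed in Table 14.1 can be illustrated with a short sketch. It assumes a simple weighted sum over three indexes (text relevance, metadata completeness, and quality of service); the `Record` fields, the weights, and the `start`/`count` paging parameters are hypothetical choices made for this example and are not the metric actually used by the DAB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A brokered metadata record with hypothetical 0.0-1.0 indexes."""
    title: str
    relevance: float        # match against the "what/where/when" constraints
    completeness: float     # metadata completeness index
    service_quality: float  # quality-of-service index


# Illustrative weights only; the source does not specify them.
WEIGHTS = {"relevance": 0.6, "completeness": 0.2, "service_quality": 0.2}


def score(record: Record) -> float:
    """Weighted sum of the indexes named in WEIGHTS."""
    return sum(weight * getattr(record, name) for name, weight in WEIGHTS.items())


def ranked_page(records, start=0, count=10):
    """Rank all records by descending score and return one page of them."""
    ordered = sorted(records, key=score, reverse=True)
    return ordered[start:start + count]


records = [
    Record("A", relevance=0.9, completeness=0.2, service_quality=0.5),
    Record("B", relevance=0.7, completeness=0.9, service_quality=0.9),
    Record("C", relevance=0.4, completeness=1.0, service_quality=1.0),
]
first_page = ranked_page(records, start=0, count=2)
second_page = ranked_page(records, start=2, count=2)
```

In this example record B outranks the more relevant record A because its metadata are more complete and its service quality is higher, which is the effect that quality-aware ranking is meant to produce. Paging then limits how many results travel to the client per request.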

 