Partitioning Large Models: Performance Considerations

Apart from the continuous update mechanism as assumed above, the proposed method can also be used to partition large models and apply it in a divide-and- conquer manner [23].

Kriging goes along with a high computational complexity—caused by the inversion of the covariance matrix—of 0(n3) [116, p. 356], [96, 10], where n is the number of samples. Considering this fact in the context of a massive data load in combination with (near) real-time requirements, this can become a severe limitation of the method. Hence, when sticking to its essential advantages like the kriging variance, the merging strategy can be applied to mitigate the computational burden while delivering comparable results.

The original set of observations is separated into s subsets to which the kriging method is applied separately. The resulting sub-model grids are in the same area as the master model that contains all points. To consider all measurements in the final model, the sub-models are sequentially merged with their respective predecessors, as shown in Figure 5.16.

Alternatively, all sub-models might be calculated before they are merged in one step using Equation 5.17. This approach would, however, not provide the advantage of an immediate—albeit coarse—result. Since the linear combination of values is not equivalent to the subsequent variant, the resulting model will

Sequential calculation schema

Figure 5.16: Sequential calculation schema: model partitions calculated separately and merged sequentially; the loss of accuracy induced by this approximation is indicated by the RMSE.

also differ. With the applicability for continuous updating of real-time systems in mind, only the sequential approach was investigated further here.

As is the case for any approximate solution, there is a trade-off between performance gain and resulting accuracy. As for other cases in this work, the loss of accuracy is quantified by the Root Mean Square Error (RMSE) against the master model.

In a spatio-temporal context, the segmentation should be performed with respect to the order of timestamps, thus representing temporal intervals per submodel. This also applies to real-time environments where subsequent models are to be created continuously.

For a pure spatial model, the subsets of points can be generated randomly. Here, the order of sub-models does not represent the temporal dynamism of the phenomenon, but rather a utilisation level of information with associated estimated accuracy. This is also the case for the configuration as introduced below.

The segmentation and associated sequential calculation limits the potential complexity of 0(n}) to the size of each subset s. This can be set as a constant, but could also be dynamically adaptive to the data rate. In any case, there should be an upper bound for the size of sub-models to limit the computing complexity.

While doing so, the merging procedure itself can be costly, but grows only linearly with n and can also be parallelized easily. Thus, it is not substantially critical for massive data.

The theoretical computational complexity of this approach is compared to the one of the master model calculation in Figure 5.17. As can be seen from the formula given in the lower part of the figure, the reduction of complexity is achieved by removing n from cubed terms (except n mod s, which is not critical).

Theoretical computational complexity of master model calculation (dashed line) vs. the sequential calculation method (solid line); n = all samples, s = size of sub-model, c = merging effort

Figure 5.17: Theoretical computational complexity of master model calculation (dashed line) vs. the sequential calculation method (solid line); n = all samples, s = size of sub-model, c = merging effort.

Assuming this merging procedure, spatially isolated or temporally outdated observations can keep their influence over multiple merging steps, depending on the decay function (Equation 5.19). This is especially helpful when no better observations are available to overwrite them. Nevertheless, by using the kriging variance, the growing uncertainty of such an estimation can be expressed, which can then be considered where it appears relevant for monitoring and analysis.

Apart from some loss of accuracy, the strategy of sequencing comes along with several advantages. So it can be used to calculate large datasets with less computational effort. This can be carried out while, in principle, the advantages of kriging like the unbiased and smooth interpolation of minimum variance and the estimation of uncertainty at each position, are retained.

Given a continuous sensor data stream, this approach can integrate new measurements seamlessly into the previous model at flexible update rates. An experimental evaluation of this concept will be presented in Section 7.3.

< Prev   CONTENTS   Source   Next >