# Minimum Sampling Density Estimator

This section outlines the experiments carried out to evaluate the formula for the minimum sampling density deduced in Section 5.3.2. To this end, random sampling is simulated on synthetic fields. The number of samples is varied around the deduced optimum, and the validity of that optimum is inspected by comparing the RMSE of each interpolated model.

## Experimental Setup

To check the validity of the approximation of the necessary minimum sampling density, kriging is applied to sets of observations that vary in number and are taken from different synthetic continuous fields. Within one set, the observations are randomly and uniformly dispersed over the n-dimensional region of interest.

The differences (RMSE) between the synthetic reference field and the field derived from the interpolation are compared. The experiment is carried out on different kinds of random fields, which are specified before the respective results are presented.
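The setup described above can be sketched in a few lines. This is a minimal illustration, not the implementation used for the experiments: the reference field `reference_field` is a hypothetical stand-in, the extent and grid resolution are arbitrary assumptions, and kriging is replaced by plain inverse-distance weighting (`idw`) so that the sketch needs nothing beyond the standard library. Only the uniform random sampling and the RMSE comparison follow the text directly.

```python
import math
import random


def reference_field(x, y):
    # Hypothetical synthetic field (assumption); not the field used in the source.
    return math.sin(x) + math.sin(y)


def sample_uniform(n, extent, rng):
    """Draw n observation positions uniformly at random over [0, extent]^2."""
    return [(rng.uniform(0, extent), rng.uniform(0, extent)) for _ in range(n)]


def idw(x, y, obs, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    Stand-in for kriging (assumption): weights depend on distance only,
    not on a fitted variogram.
    """
    num = den = 0.0
    for ox, oy, oz in obs:
        d2 = (x - ox) ** 2 + (y - oy) ** 2
        if d2 == 0.0:
            return oz  # exactly on an observation
        w = 1.0 / d2 ** (power / 2.0)
        num += w * oz
        den += w
    return num / den


def rmse_for_sample_size(n, extent=2 * math.pi, grid=30, seed=0):
    """RMSE between the reference field and the field interpolated from n samples."""
    rng = random.Random(seed)
    obs = [(x, y, reference_field(x, y)) for x, y in sample_uniform(n, extent, rng)]
    step = extent / (grid - 1)
    squared_error = 0.0
    for i in range(grid):
        for j in range(grid):
            x, y = i * step, j * step
            squared_error += (idw(x, y, obs) - reference_field(x, y)) ** 2
    return math.sqrt(squared_error / grid ** 2)


if __name__ == "__main__":
    for n in (25, 64, 115):
        print(n, round(rmse_for_sample_size(n), 3))
```

Swapping `idw` for a kriging predictor would leave the sampling and RMSE parts unchanged, which is why they are kept as separate functions here.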

## Results

As a first reference, a two-dimensional field is generated by

The resulting raster grid of 150 × 150 pixels in greyscale levels is depicted in Figure 7.1.

**Figure 7.1:** Two-dimensional sine signal as raster grid.

With Equations 5.3 and 5.8 and the extent of 1Я in each spatial dimension, we get

as the approximate minimum number of samples necessary to capture the pattern for kriging. We take seven sampling sets from 25 up to 115 observations, increasing by 15 observations with each step. Each sample size is normalized to the calculated value of 64, and this quotient is plotted against the RMSE between the reference and the derived model. For convenience, the RMSE is normalized to the highest value in the series. The parameter *range* is also added to the diagrams. It is derived from the variogram fitting procedure (see Section 5.3.5) and is normalized to the theoretical value as determined by Equation 5.3.
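The normalization used for the diagrams amounts to simple arithmetic, sketched here for illustration. The helper names `sample_sizes` and `normalize_to` are hypothetical, and the RMSE numbers are placeholders chosen only to show the scaling; they are not results from the experiments.

```python
def sample_sizes(start=25, stop=115, step=15):
    """Sample sizes of the sampling sets, from start to stop inclusive."""
    return list(range(start, stop + 1, step))


def normalize_to(values, reference):
    """Divide every value by a common reference value."""
    return [v / reference for v in values]


sizes = sample_sizes()
print(sizes)  # [25, 40, 55, 70, 85, 100, 115]

# Abscissa: sample size relative to the derived minimum of 64 samples.
sampling_ratio = normalize_to(sizes, 64)

# Ordinate: RMSE relative to the highest RMSE in the series.
placeholder_rmse = [4.0, 2.0, 1.0]  # placeholder values, NOT measured results
rmse_ratio = normalize_to(placeholder_rmse, max(placeholder_rmse))
```

A sampling ratio of 1.0 corresponds to exactly 64 observations; the seven sets therefore cover ratios from roughly 0.39 to 1.80.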

**Figure 7.2:** Sampling variations applied to a two-dimensional sine signal with the ratio of sampling normed to the derived value on the abscissa, and the ratio of RMSE and *range* normalized to the initial value (RMSE) and to the value of the generated field (*range s*).

As can be seen from the RMSE graph of Figure 7.2, a noticeable degree of saturation is achieved when the quotient approaches the value of one, which represents the minimum number of samples of 64 as computed by Equation 5.8.

Extending the sine signal by a third dimension reveals a similar pattern, as can be seen in Figure 7.3. In this case, the number of samples normalized in each epoch is,

**Figure 7.3:** Sampling variations applied to a three-dimensional sine signal with the ratio of sampling normed to the derived value on the abscissa, and the ratio of RMSE and *range* normed to the initial value (RMSE) and to the values of the generated reference field (*range s*, *range t*).

Since the separable variogram model was used for interpolation, the parameter *range* is estimated separately for the temporal dimension. Other models might also be applied here (see Equations 3.8, 3.9, 3.10, p. 43), but this is beyond the scope of this evaluation. For the spatial dimensions we assume this parameter to be equal for each direction in each experiment; otherwise anisotropy would have to be introduced [129].

The sampling of sine signals was primarily carried out to transfer the concept of the Nyquist-Shannon theorem from signal processing to geostatistics (see Section 2.3.2). After the validity for periodic signals had been shown, the formula was applied to continuous random fields as depicted in Figure 7.4.

**Figure 7.4:** Two-dimensional synthetic random field generated by a Gaussian covariance function.

For a field with an extent of 150 and a range of 30, generated by a Gaussian covariance function (see Section 5.3.1), the number of necessary observations is calculated by

In the diagram (Figure 7.5), the saturation of the error quotient can again be found near the abscissa value of 1.0, which corresponds to the estimated minimum sample size.

**Figure 7.5:** Sampling variations applied to a two-dimensional random field with the ratio of sampling normed to the derived value on the abscissa, and the ratio of RMSE and *range* normed to the initial value (RMSE) and to the value of the generator (*range s*).

In this case, the similarity between the RMSE curve and the *range s* ratio curve is striking, indicating that the accuracy of the estimated parameter *range* corresponds to the accuracy of the whole derived model.

This effect is less obvious in Figure 7.6, which represents sampling epochs performed on a three-dimensional random field. The ratio between the estimated and the actual range parameter is also generally higher here, indicating an increased uncertainty of estimation due to the higher complexity of the phenomenon. Nevertheless, the saturation of the RMSE as the sample size approaches the number estimated by Equation 5.8 can still be identified quite clearly.

**Figure 7.6:** Sampling variations applied to a three-dimensional random field with the ratio of sampling normed to the derived value on the abscissa, and the ratios of RMSE and *range s*, *range t* normed to the initial value (RMSE) and to the values of the generator.

## Conclusions

The experiments have corroborated the overall validity of the formula for minimum sampling density. It can thus be used to estimate the observational effort for any setting where the central geostatistical parameter *range* is known for all involved dimensions. It assumes a uniform random distribution of sample positions and can therefore only provide an approximate estimation. For the simulated monitoring scenarios it is of great value since it relieves the sampling process of arbitrariness and makes experiments on models of differing dynamism (and therefore differing values of *range*) comparable by norming the determination of necessary samples. We will make use of this formula in the subsequent experiments to reduce effects resulting from insufficient sampling or oversampling.