# Empirical Mode Decomposition

A number of sections in the chapter discuss aspects of the analysis of nonstationary and nonlinear data, and no discussion of this topic would be complete without inclusion of the empirical mode decomposition, also known as the Hilbert-Huang transform. The technique was first published in 1998 (Huang *et al.,* 1998) and has been applied widely for the analysis of nonstationary and nonlinear data. Data from machinery is, of course, often stationary, being dominated by rotational speed effects, but there are instances in which full nonstationary analysis is an advantage.

The term "mode" in the name of the technique is somewhat deceptive in so far as it has nothing to do with the analytic mode shapes corresponding to eigenvectors of matrix models. The "modes" used in this technique are just shapes which prove convenient for the data decomposition.

The technique can be applied to any signal, but it is advantageous only in the case of data which is nonstationary and/or nonlinear. The approach is to resolve the complete data set into a linear combination of so-called intrinsic mode functions (IMFs). The real relevance of this is that these intrinsic mode functions all have well-defined Hilbert transforms (HTs), and this leads to the ability to define instantaneous energy and frequency, since the data is expressed as a sum of IMFs.

An IMF can be any function which satisfies two conditions:

- a) In the whole data set, the number of extrema and the number of zero crossings must be either equal or differ by one.
- b) At any point, the mean value of the envelope defined by local maxima and that defined by local minima is zero.

The decomposition is largely intuitive and is derived through an iterative process, the steps of which are as follows:

Let the input data be *x(t)*.

- 1) Identify all the local maxima and connect them (typically with a cubic spline) to form the upper envelope *e*_max(*t*).
- 2) Identify all the local minima and connect them in the same way to form the lower envelope *e*_min(*t*).
- 3) The mean of these two envelopes is calculated at every point as *m*(*t*) = [*e*_max(*t*) + *e*_min(*t*)]/2.
- 4) If *m*(*t*) = 0 the function is an IMF; otherwise, subtract this mean function from the input and repeat the process.
- 5) This cycle is repeated until, ideally, (4) is satisfied or some tolerance is met. The first IMF, say IMF1, is defined.
- 6) Subtract this function from the original data and repeat the whole process for *r(t)* = *x(t)* − IMF1.
- 7) The whole process is repeated until some error criterion is met.
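The sifting procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the cubic-spline envelope fitting, and the stopping tolerance are all choices made here for clarity.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift(x, t, max_iter=50, tol=0.05):
    """Steps 1-5: fit spline envelopes through the extrema, subtract
    their mean, and repeat until the mean is approximately zero."""
    h = x.copy()
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            break  # too few extrema to fit spline envelopes
        e_max = CubicSpline(t[maxima], h[maxima])(t)  # upper envelope
        e_min = CubicSpline(t[minima], h[minima])(t)  # lower envelope
        m = 0.5 * (e_max + e_min)                     # envelope mean
        if np.mean(np.abs(m)) < tol * np.mean(np.abs(h)):
            break  # mean is (approximately) zero: accept as an IMF
        h = h - m
    return h

def emd(x, t, n_imfs=2):
    """Steps 6-7: subtract each IMF and sift the residual again."""
    imfs, r = [], x.copy()
    for _ in range(n_imfs):
        imf = sift(r, t)
        imfs.append(imf)
        r = r - imf
    return imfs, r
```

For a signal made of a fast and a slow oscillation, the first IMF recovered this way closely tracks the fast component (away from the ends, where spline extrapolation degrades the envelopes).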

For each IMF, the HT will give the instantaneous amplitude and frequency.
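As an illustration of this last step, the instantaneous amplitude and frequency of a mono-component signal can be extracted with `scipy.signal.hilbert`, which returns the analytic signal; the signal and sampling rate below are arbitrary choices for the sketch.

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0                                  # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)
imf = np.cos(2 * np.pi * 50 * t)             # a mono-component "IMF"

z = hilbert(imf)                             # analytic signal: imf + i*HT(imf)
amplitude = np.abs(z)                        # instantaneous amplitude
phase = np.unwrap(np.angle(z))               # instantaneous phase
inst_freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency, Hz
```

Away from the edge effects at either end of the record, the recovered instantaneous frequency sits at 50 Hz and the amplitude at unity, as expected for this test signal.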

The real power of this approach is in its treatment of nonstationary data. It is still relatively new when compared to other techniques such as FFT but clearly has a role to play. It has been applied to some investigations on subharmonic vibration on gas blowers (Wu and Qu, 2009) and to some interesting gearbox diagnostics (Ma *et al.,* 2015).

# Kernel Density Estimation

One of the objectives of this chapter is the presentation of a range of analysis techniques which either have proved useful or, in the case of newer developments, are considered promising.

One such approach is kernel density estimation (KDE), which may be used for the estimation of probability density functions from measured data. Given a limited number of measurements of a variable, *X_j*, the problem is to estimate the probability density, *p(x)*. Of course, if there were an infinite number of measurements this would be a trivial exercise, but in reality a more subtle approach is required, particularly if there is more than a single variable. The first step is fairly obvious - one simply divides the range into a series of bins. Mathematically this may be expressed as

*p̂*(*x*) = (1/*nh*) Σ_j *w*((*x* − *X_j*)/*h*)

where *n* is the number of data points, *h* is the window width, *X_j* is the sampled data and *w*(*u*) = 1/2 for |*u*| < 1, and zero otherwise.

While this gives some idea of the distribution, it is rather crude and is discontinuous. This is a particular problem where the number of samples, *n,* is small in some sense. This distribution is sometimes called the naive estimate. Basically, it has been constructed by placing a "box" of width 2*h* centred on each measurement and summing; the resulting so-called naive estimate is shown in Figure 8.7.

An approximate probability distribution can be formed by taking *n* overlapping bins of width 2*h* and height (2*nh*)^−1 in the appropriate position for each measurement, then summing. It is implicitly assumed that each measured value is at the center of a bin.
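The naive estimate is simple enough to sketch directly; the function below is a minimal illustration (the name and interface are choices made here, not from any library).

```python
import numpy as np

def naive_estimate(x_grid, samples, h):
    """Sum of boxes of width 2h and height 1/(2nh), one centred
    on each measured value X_j, evaluated on x_grid."""
    # u has shape (len(x_grid), len(samples)): one column per sample.
    u = (x_grid[:, None] - samples[None, :]) / h
    boxes = np.where(np.abs(u) < 1.0, 0.5, 0.0)  # w(u) = 1/2 for |u| < 1
    return boxes.sum(axis=1) / (len(samples) * h)
```

Each box contributes area (2*h*) × (2*nh*)^−1 = 1/*n*, so the estimate integrates to one, but it is discontinuous at every box edge — the crudeness referred to above.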

The situation can be substantially improved simply by inserting a Gaussian curve instead of a rectangular box for each measurement point, or "atom" as they are (rather confusingly) called. The improvement stems from the fact that the Gaussian and its derivative are well behaved in the sense of being continuous and single valued, in contrast to the top hat function. Hence, we can form an estimate as

*p̂*(*x*) = (1/*nh*) Σ_j *K*((*x* − *X_j*)/*h*)

where *K* can, in principle, be any well-behaved distribution, but the Gaussian is normally used, so that *K*(*u*) = (1/√(2π)) exp(−*u*²/2).

FIGURE 8.7

Naive Estimate.

Inserting the Gaussian into Equation 8.67, the only parameter to be chosen is *h*, the so-called bandwidth. This is chosen to minimize the mean integrated square error between the derived distribution and the actual readings. Silverman (1986) describes how this criterion is used to derive the optimal choice for the bandwidth, and his result is

*h*_opt = 0.9 *A* *n*^(−1/5)

where *A* is min(standard deviation, interquartile range/1.34), this "interquartile range" being the difference between the first and third quartiles. In the literature there are several (minor) variations in the magnitude of this expression. In principle the precise optimum will depend on the shape of the (unknown) distribution. One can, of course, formulate an iterative procedure, but in reality, errors from this source will be fairly small. The way in which smooth curves (Gaussian being the most common) are used as components to form distribution curves is illustrated in Figure 8.8.
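The Gaussian kernel estimate and Silverman's rule of thumb together amount to only a few lines of Python; this is a minimal sketch (function names are illustrative, not from a library).

```python
import numpy as np

def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth h = 0.9 * A * n^(-1/5), with
    A = min(standard deviation, interquartile range / 1.34)."""
    n = len(samples)
    q1, q3 = np.percentile(samples, [25, 75])
    A = min(np.std(samples, ddof=1), (q3 - q1) / 1.34)
    return 0.9 * A * n ** (-1 / 5)

def kde_gaussian(x_grid, samples, h=None):
    """Kernel estimate p_hat(x) = (1/(n*h)) * sum_j K((x - X_j)/h),
    with the Gaussian kernel K(u) = exp(-u^2/2)/sqrt(2*pi)."""
    if h is None:
        h = silverman_bandwidth(samples)
    u = (x_grid[:, None] - samples[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return K.sum(axis=1) / (len(samples) * h)
```

Applied to samples from a standard normal distribution, the resulting curve is smooth, integrates to one, and peaks near the true density value at the origin.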

The choice of *h,* the bandwidth (sometimes called the "smoothing parameter"), is, of course, crucial. The effect of changing this parameter is shown in Figure 8.8. These figures show the importance of choosing the appropriate bandwidth to get insight into the underlying probability distribution.

The situation becomes slightly more complicated where there are two or more variables. In the case of two variables the estimated density function will be given by

where

*h*₁, *h*₂ are the smoothing parameters for the *x* and *y* values, respectively, and *h*₃ controls the orientation. It is assumed that 0 ≤ *h*₃ < 1. As one might anticipate, the situation becomes rapidly more complicated for higher dimensions.
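For the two-variable case, `scipy.stats.gaussian_kde` provides a ready-made multivariate implementation; its bandwidth matrix is scaled from the sample covariance, which plays a role analogous to the orientation control described above. This is a sketch, not the chapter's exact parameterization.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# A correlated (tilted) two-dimensional cloud of 1000 samples.
cov = [[1.0, 0.6], [0.6, 1.0]]
data = rng.multivariate_normal([0.0, 0.0], cov, size=1000).T  # shape (2, n)

kde = gaussian_kde(data)        # bandwidth scaled from the sample covariance
p0 = kde([[0.0], [0.0]])[0]     # density estimate at the origin
```

Because the kernel covariance follows the data, the estimate correctly captures the tilt of the cloud; at the origin it comes out close to the true bivariate normal peak of 1/(2π√det Σ) ≈ 0.199 for this covariance.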

So, how can these techniques be used to advantage in condition monitoring? While only limited use has been made to date, it would appear that there is significant potential. Baydar *et al.* (2001) reported the application of KDE techniques in the study of gearbox wear. It was shown that the improved representation of the probability distribution could be used to detect an incipient fault at an earlier stage than was possible with more conventional approaches.

This reasoning also suggests that the approach could benefit studies in remaining life prediction of machine components. The author, however, is not aware of any work to date in this area.