# Measure of Uncertainty: Entropy

The accuracy of classification is generally measured by an error matrix. As mentioned in the previous section on the assessment of soft classification, reference data should be in soft form at a finer resolution. Often this is not possible due to the non-availability of a higher-resolution image. Further, it is also not feasible to generate fraction reference outputs from the ground for a large number of samples. In such cases, entropy is used as an absolute measure of uncertainty (Dehghan and Ghassemian, 2006). Entropy, which is based on information theory (Shannon, 1948; Foody, 1996), can be used to estimate the uncertainty in the classification. It expresses the distribution and extent of uncertainty in a single number. The entropy of a random variable is related to the minimum attainable error probability (Feder and Merhav, 1994). Unlike the membership vector, this criterion summarizes the classification uncertainty in a single number per pixel, per class, or per image (Goodchild, 1995). It shows the strength of the class membership assigned to a particular class in the classification output.

Different forms of FERM are used to evaluate the performance of the classifier in terms of its correctness, whereas RMSE and the correlation coefficient are uncertainty measures. These methods are defined based on the difference between the expected and actual results and are therefore relative measures: they are sensitive to error variations, not to uncertainty variations. Entropy, on the other hand, calculates uncertainty from the classified data themselves, from the testing samples, without using any external data; hence it is an indirect method of measuring accuracy. Thus, entropy is an absolute measure of uncertainty, calculated only from the soft classified data without requiring any other external information. The entropy method has been used for validating the clusters formed during unsupervised clustering using FCM and IPCM (Yang and Wu, 2006).

In some classifiers where the membership values do not follow the probabilistic constraint, such as PCM and MPCM, the entropy theorem can be utilized after rescaling (Ricotta, 2004). Thus, the average entropy (based on Shannon's entropy theorem) of the complete image can also be calculated (Dehghan and Ghassemian, 2006; Ricotta and Avena, 2006).

For a better classified output, the entropy for a known class having less uncertainty will be low, and for an unknown class with high uncertainty it will be high in a fraction image. For example, in a fraction image of a crop, the entropy value at crop locations will be low, while the entropy value at non-crop locations will be high. Thus, low entropy means low uncertainty, which implies a more accurate classified output, and vice versa. A low degree of entropy (or uncertainty) means the membership is associated entirely with one class, and vice versa. The entropy of a classified fraction output can be computed using Equation (7.22) (Foody, 1995; Dehghan and Ghassemian, 2006):

$$H_k = -\sum_{i=1}^{c} \mu_{ik}\,\log_2\left(\mu_{ik}\right) \qquad (7.22)$$

where $\log_2(\mu_{ik}) = 0$ for $\mu_{ik} = 0$, *c* denotes the number of classes, and $\mu_{ik}$ is the estimated membership function of class *i* for pixel *k*.

For high uncertainty, i.e., low accuracy, the value of entropy from Equation (7.22) is high, and vice versa. Entropy is defined based on the actual output of the classifier, so it gives the pure uncertainty of the classification results (Dehghan and Ghassemian, 2006).
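The per-pixel entropy of Equation (7.22) can be sketched in Python/NumPy as follows; the function name and the `(n_pixels, c)` array layout are assumptions for illustration, and the convention $0 \cdot \log_2 0 = 0$ is applied explicitly:

```python
import numpy as np

def pixel_entropy(memberships):
    """Per-pixel Shannon entropy (in bits) of soft-classification
    membership vectors, following Equation (7.22).

    memberships: array of shape (n_pixels, c), one membership value
    per class for each pixel.
    """
    mu = np.asarray(memberships, dtype=float)
    # Terms with zero membership contribute nothing (0 * log2(0) := 0);
    # the inner where() keeps log2 from ever seeing a zero argument.
    terms = np.where(mu > 0, mu * np.log2(np.where(mu > 0, mu, 1.0)), 0.0)
    return -terms.sum(axis=1)

# A confident pixel (membership concentrated in one class) has low entropy;
# an uncertain pixel (membership spread evenly over 4 classes) has high entropy.
print(pixel_entropy([[1.0, 0.0, 0.0, 0.0]]))     # -> [0.]
print(pixel_entropy([[0.25, 0.25, 0.25, 0.25]])) # -> [2.]
```

This matches the interpretation in the text: entropy is 0 when membership is associated entirely with one class, and maximal ($\log_2 c$ bits) when membership is spread uniformly.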

# Correlation Coefficient

The correlation coefficient is used to measure the linear association between two variables, say X and Y. Among all the available correlation coefficients, the Pearson product-moment correlation coefficient is the best known (DeCoursey, 2003; Kassaye, 2006). The two variables between which the correlation is determined from the fraction images are the membership values of the classified image and the membership values of the reference image. It is given by Equation (7.23).

$$r = \frac{\mathrm{Cov}(R, C)}{\sigma_R\,\sigma_C} \qquad (7.23)$$

where *Cov(R,C)* represents the covariance between the reference *(R)* and classified *(C)* data, and $\sigma_R$ and $\sigma_C$ are the standard deviations of *R* and *C*, respectively. The range of *r* is from -1 to +1. If the variables (*R* and *C*) lie on a perfect straight line, then *r* = +1 implies an increasing linear association and *r* = -1 a decreasing linear association. *r* = 0, a special case, shows no correlation between the variables. A value from 0.5 to 1 indicates a strong correlation between the two variables (DeCoursey, 2003).
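A minimal sketch of Equation (7.23) applied to a pair of fraction images is shown below; the function name is an assumption, and the covariance and standard deviations are both computed as population statistics so the ratio is the Pearson product-moment coefficient:

```python
import numpy as np

def pearson_r(reference, classified):
    """Pearson product-moment correlation (Equation 7.23) between the
    membership values of the reference and classified fraction images."""
    r_vals = np.asarray(reference, dtype=float).ravel()
    c_vals = np.asarray(classified, dtype=float).ravel()
    # Population covariance Cov(R, C) and standard deviations sigma_R, sigma_C.
    cov = np.mean((r_vals - r_vals.mean()) * (c_vals - c_vals.mean()))
    return cov / (r_vals.std() * c_vals.std())

ref = np.array([0.1, 0.4, 0.7, 0.9])
# Perfectly increasing linear association -> r = +1
print(pearson_r(ref, 2.0 * ref + 0.05))  # -> 1.0
# Perfectly decreasing linear association -> r = -1
print(pearson_r(ref, 1.0 - ref))         # -> -1.0
```

The two limiting cases illustrate the interpretation given above: membership values lying on a perfect straight line yield r = +1 or r = -1 depending on the slope's sign.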

# Root Mean Square Error

The root mean square error is obtained by taking the square root of the mean of the squared differences between the membership values of the classified image and those of the reference image, as in Equation (7.24).

$$\mathrm{RMSE} = \sqrt{\frac{1}{c\,(M \times N)}\sum_{i=1}^{c}\sum_{k=1}^{M \times N}\left(C_{ik} - R_{ik}\right)^2} \qquad (7.24)$$

where $C_{ik}$ is the membership value of class *i* for pixel *k* from the classified image, $R_{ik}$ is the corresponding membership value from the reference image, *c* is the number of classes, and *M×N* is the size of the image.

RMSE gives a measure of both systematic and random errors (Smith, 1997). It is an average measure of the difference between the membership values of the classified image and those of the reference data set. RMSE values are always greater than or equal to zero, as is evident from Equation (7.24). The interpretation of RMSE is that for good results its value should be minimal, tending toward zero. For the given data, RMSE is calculated in two ways: globally and per class. Global RMSE is the RMSE of the complete image, i.e., over all the fraction images, and is given by Equation (7.24). RMSE per class can be computed using Equation (7.25).
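The global and per-class variants can be sketched as below; the function names and the `(c, M, N)` stack-of-fraction-images layout are assumptions, with the global RMSE averaging the squared differences over all classes and pixels and the per-class RMSE averaging over pixels only:

```python
import numpy as np

def rmse_global(reference, classified):
    """Global RMSE over all fraction images (all classes, all pixels)."""
    diff = np.asarray(classified, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def rmse_per_class(reference, classified):
    """Per-class RMSE: arrays of shape (c, M, N) -> one value per class."""
    diff = np.asarray(classified, dtype=float) - np.asarray(reference, dtype=float)
    # Flatten each class's M x N fraction image and average over its pixels.
    return np.sqrt((diff ** 2).reshape(diff.shape[0], -1).mean(axis=1))

# Two classes on a 1x2 image: class 0 matches the reference exactly,
# class 1 is off by 0.1 at every pixel.
ref = np.array([[[0.6, 0.4]], [[0.4, 0.6]]])
cls = np.array([[[0.6, 0.4]], [[0.5, 0.7]]])
print(rmse_per_class(ref, cls))  # -> [0.  0.1]
```

As the example shows, a class whose fraction image matches the reference exactly contributes zero, so the global value lies between the per-class values.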