# Clustering Algorithms

In scientific data analysis, there are many situations where it is essential, however difficult, to group similar data effectively, owing to scarcity of information about the data. Clustering is a technique for grouping data points that are more similar, in terms of spectral and textural signature, to the other points in their cluster than to points in other clusters. It is an unsupervised classification technique (Soman et al., 2006). Potential areas of application of clustering techniques include artificial intelligence, data mining, digital image processing, pattern recognition, and statistics. No single clustering algorithm is yet available that can resolve all clustering problems and perform best on all datasets. Broadly, there are two types of clustering algorithms: hierarchical and partitional. The basic difference between the two is that hierarchical clustering builds a dendrogram structure, whereas partitional clustering divides the data into a specified number of clusters (Soman et al., 2006).

Another way of classifying clustering algorithms is into “hard” and “soft” clustering (Dave, 1991). In hard clustering, each data point (or pixel in an image) is allocated to exactly one class, whereas in soft clustering each data point receives a fractional allotment to every class (Babu and Murty, 1994). Some clustering techniques can be time-consuming processes; therefore, an approximate cluster center or initial seed value is selected as a starting point toward an optimal solution. Clustering algorithms are often formulated as an optimization problem. In such cases, for hard clustering the only parameters to be optimized are the cluster centers, whereas for soft clustering both the cluster centers and the membership value for each class are optimized. The two basic and popular soft or fuzzy clustering algorithms are fuzzy c-means clustering (FCM) and possibilistic c-means (PCM). In FCM, the membership value signifies the degree to which a data point is shared among clusters, while in PCM it refers to the degree of belongingness (Krishnapuram and Keller, 1993). In comparison to FCM, PCM clustering has been found to be more robust to noise. Outliers or noisy points (or data) are always problematic for effective clustering: these noisy points affect the output of the clustering algorithm, and hence the outcome is unrealistic clusters. Further, the FCM clustering algorithm forces a definite relation between each data point and the clusters. In other words, each data point is fully assigned across the given clusters, whether the point is noisy or not (Dave, 1991). In PCM clustering, according to Krishnapuram and Keller (1993), the problem of noisy points (or outliers) is partly resolved by interpreting the membership value as the degree of belongingness of each data point. However, the proper handling of noisy points was first proposed by Ohashi (1984) and Dave and Krishnapuram (1997).
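
The hard/soft distinction above can be made concrete with a small sketch. The code below contrasts a hard assignment (winner-takes-all) with a soft, FCM-style fractional membership for one point lying between two hypothetical cluster centers; the centers, the point, and the fuzzifier value `m = 2` are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Two hypothetical cluster centers and one data point lying between them.
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
x = np.array([1.0, 0.0])

# Hard clustering: the point is allocated entirely to the nearest center.
dists = np.linalg.norm(centers - x, axis=1)  # distances to each center
hard_label = int(np.argmin(dists))           # index of the winning cluster

# Soft (fuzzy) clustering: fractional allotments that sum to one,
# here using the FCM-style inverse-distance rule with fuzzifier m = 2.
m = 2.0
inv = (1.0 / dists) ** (2.0 / (m - 1.0))
memberships = inv / inv.sum()

print(hard_label)   # 0 (the point is closer to the first center)
print(memberships)  # [0.9, 0.1] — shared, but dominated by cluster 0
```

With distances 1 and 3 to the two centers, the hard assignment discards all information about the second cluster, while the soft assignment records that the point retains a 10% affinity to it.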

## Fuzzy c-Means (FCM) Classifier

The FCM method was initially introduced by Dunn (1973) and later developed by Bezdek (1981). It is one of the prevailing clustering methods for fuzzy classification. This method can be applied to partition a pixel into different membership values corresponding to the classes present in a digital image.

Each pixel in the image is related to every class by a function known as the membership function. The computed values of the membership function are simply known as membership values and vary between zero and one. A membership value close to one implies that the pixel is a strong representative of that class, while a membership value close to zero implies that the pixel has weak or no similarity with the information class (Bezdek et al., 1984). Thus, the net effect of such a function is to make a fuzzy c-partition of the given data (or satellite image, in the case of remote sensing). The summation of all the membership values for each pixel must be equal to unity (Bezdek, 1981). This can be achieved by minimizing the objective function of FCM.

The objective function of the FCM classifier (known as the least-squares objective function) is given by Equation (3.1), the squared distance is given in Equation (3.2), and the constraints in Equation (3.3):

$$J_m(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} (\mu_{ki})^m \, d^2(x_k, v_i) \tag{3.1}$$

$$d^2(x_k, v_i) = \lVert x_k - v_i \rVert_A^2 = (x_k - v_i)^T A \,(x_k - v_i) \tag{3.2}$$

where the constraints imposed on Equation (3.1) are

$$\sum_{i=1}^{c} \mu_{ki} = 1 \ \text{for all } k, \qquad 0 \le \mu_{ki} \le 1, \qquad 0 < \sum_{k=1}^{N} \mu_{ki} < N \ \text{for all } i \tag{3.3}$$

where in Equation (3.1)

*U* is the $N \times c$ membership matrix,

$V = (v_1, \ldots, v_c)$ is the collection of vectors of information class centers $v_i$,

$\mu_{ki}$ is the class membership value of pixel *k* in class *i*,

$d_{ki}$ is the distance in feature space between $x_k$ and $v_i$,

$d^2(x_k, v_i)$ is the square of $d_{ki}$,

$x_k$ is a vector (or feature vector) denoting the spectral response of pixel *k*,

$v_i$ is a vector (or prototype vector) denoting the information class center of class *i*,

*c* and *N* are the total numbers of information classes and pixels, respectively,

*A* is the weight matrix, and

*m* is the weighting exponent (or fuzzifier) such that $1 < m < \infty$. When $m \to 1$, the membership function becomes hard, and when $m \to \infty$, the memberships are maximally fuzzy (Krishnapuram and Keller, 1993).
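
As an illustration of the least-squares objective, the following minimal NumPy sketch evaluates it for a toy dataset under the simplest choice of weight matrix, $A = I$ (plain Euclidean distance); the function name `fcm_objective` and the toy values are assumptions for demonstration.

```python
import numpy as np

def fcm_objective(X, V, U, m=2.0):
    """Least-squares FCM objective: sum over pixels k and classes i of
    u_ki^m * d2(x_k, v_i), with the Euclidean norm (weight matrix A = I)."""
    # Squared distances d2[k, i] = ||x_k - v_i||^2 via broadcasting
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum())

# Toy data: N = 4 pixels (1 band), c = 2 information classes
X = np.array([[0.0], [1.0], [9.0], [10.0]])
V = np.array([[0.5], [9.5]])
U = np.array([[0.99, 0.01],
              [0.98, 0.02],
              [0.02, 0.98],
              [0.01, 0.99]])  # each row sums to 1, as constraint (3.3) requires
print(fcm_objective(X, V, U))
```

Memberships near zero contribute almost nothing even to large distances (they are raised to the power *m*), which is why the objective rewards assigning a pixel mostly to its nearest class center.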

The weight matrix *A* controls the shape of the optimal information class (Bezdek et al., 1984). Generally, it takes one of the following norms, as mentioned in Equations (3.6)–(3.8):

$$A = I \qquad \text{(Euclidean norm)} \tag{3.6}$$

$$A = D^{-1} \qquad \text{(diagonal norm)} \tag{3.7}$$

$$A = C^{-1} \qquad \text{(Mahalanobis norm)} \tag{3.8}$$

where

*I* is the identity matrix,

*D* is the diagonal matrix whose diagonal elements are the eigenvalues of the covariance matrix, and

*C* is the covariance matrix, given by Equations (3.9) and (3.10):

$$C = \frac{1}{N} \sum_{k=1}^{N} (x_k - \bar{x})(x_k - \bar{x})^T \tag{3.9}$$

where

$$\bar{x} = \frac{1}{N} \sum_{k=1}^{N} x_k \tag{3.10}$$
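
To see how the choice of norm changes the distance, the sketch below computes the squared distance $(x - v)^T A (x - v)$ for one point of an elongated two-band point cloud under each of the three weight matrices; the random data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Elongated cloud: band 1 varies much more than band 2
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
v = X.mean(axis=0)        # class center (the sample mean)
diff = X[0] - v           # feature-space offset of one pixel

C = np.cov(X, rowvar=False)            # covariance matrix
D = np.diag(np.linalg.eigvalsh(C))     # diagonal matrix of eigenvalues of C

# Squared distance (x - v)^T A (x - v) under each choice of A
d2_euclid = diff @ np.eye(2) @ diff           # A = I: spherical clusters
d2_diag = diff @ np.linalg.inv(D) @ diff      # A = D^-1: axis-aligned ellipses
d2_mahal = diff @ np.linalg.inv(C) @ diff     # A = C^-1: rotated ellipses
```

The Euclidean norm treats a unit offset in either band equally, while the diagonal and Mahalanobis norms down-weight offsets along directions of high variance, so elongated information classes are modeled more faithfully.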

After solving the objective function (Equation 3.1), the membership value can be computed as mentioned in Equation (3.11):

$$\hat{\mu}_{ki} = \frac{1}{\displaystyle\sum_{j=1}^{c} \left( \frac{d^2(x_k, v_i)}{d^2(x_k, v_j)} \right)^{1/(m-1)}} \tag{3.11}$$

where $\hat{\mu}_{ki}$ represents the realization of the class membership value $\mu_{ki}$.
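
The membership update above can be sketched directly with NumPy broadcasting; this is a minimal implementation assuming the Euclidean norm, and the small guard against zero distances (for pixels sitting exactly on a center) is an added practical detail, not part of the formula.

```python
import numpy as np

def fcm_memberships(X, V, m=2.0):
    """Membership update: u_ki = 1 / sum_j (d2_ki / d2_kj)^(1/(m-1)),
    computed for all pixels and classes at once (Euclidean norm)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)                # guard: pixel on a center
    ratio = d2[:, :, None] / d2[:, None, :]   # ratio[k, i, j] = d2_ki / d2_kj
    return 1.0 / (ratio ** (1.0 / (m - 1.0))).sum(axis=2)

X = np.array([[0.0], [2.0], [10.0]])   # three pixels, one band
V = np.array([[0.0], [10.0]])          # two class centers
U = fcm_memberships(X, V)
print(U.sum(axis=1))                   # each row sums to 1, per the constraint
```

Note how the pixel at 2.0 receives a high but not total membership in the first class, since the denominator compares its distance to every center, not just the nearest one.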

From Equation (3.1), the center of the information class, as the fuzzy mean $v_i$, can be computed as mentioned in Equation (3.12):

$$\hat{v}_i = \frac{\displaystyle\sum_{k=1}^{N} (\mu_{ki})^m \, x_k}{\displaystyle\sum_{k=1}^{N} (\mu_{ki})^m} \tag{3.12}$$

where $\hat{v}_i$ represents the realization of the information class center value $v_i$.
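
Putting the two updates together gives the familiar alternating iteration of FCM. The sketch below is a minimal, assumed implementation (Euclidean norm, centers seeded at two extreme data points for a deterministic start); it is not a production classifier, only an illustration of the loop.

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=100, tol=1e-6):
    """Alternate the membership update and the fuzzy-mean center update
    until the centers stop moving (Euclidean norm, c = 2 sketch)."""
    V = np.array([X.min(axis=0), X.max(axis=0)])  # deterministic seeds
    for _ in range(iters):
        d2 = np.maximum(((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2),
                        1e-12)
        U = 1.0 / ((d2[:, :, None] / d2[:, None, :])
                   ** (1.0 / (m - 1.0))).sum(axis=2)
        Um = U ** m
        V_new = (Um.T @ X) / Um.sum(axis=0)[:, None]  # fuzzy means (centers)
        if np.linalg.norm(V_new - V) < tol:
            V = V_new
            break
        V = V_new
    return U, V

# Two well-separated 1-D groups of pixels
X = np.array([[0.0], [0.5], [1.0], [9.0], [9.5], [10.0]])
U, V = fcm(X)
print(np.sort(V.ravel()))   # centers settle near the two groups
```

Each pass tightens the coupling between the two equations: better centers sharpen the memberships, and sharper memberships move the fuzzy means closer to the true group centers.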