Different parameters have been used to measure the classification performance of the diverse feature extraction techniques. A brief explanation of each parameter is provided below.
1.2.1. Precision
Precision is defined as the fraction of appropriate classifications among all classified instances, as in equation 1.1.
1.2.2. True Positive (TP) Rate/Recall
Recall is defined as the fraction of appropriate classifications among the total number of relevant instances, as in equation 1.2.
1.2.3. Misclassification Rate (MR)
Misclassification Rate (MR) is defined as the fraction of incorrectly classified instances and is denoted as the error rate of the classifier, as in equation 1.3.
1.2.4. F1 Score
The F1 Score of classification is the harmonic mean of Precision and Recall (TP Rate) and is given as in equation 1.4.
1.2.5. Accuracy
Accuracy of a classifier is measured by means of the recognition rate of the classifier in identifying instances correctly and is given as in equation 1.5.
1.2.6. False Positive (FP) Rate
This is the fraction of negative instances incorrectly classified as positive by the classifier and is given as in equation 1.6.
1.2.7. True Negative (TN) Rate
This metric is the fraction of negative results correctly produced for negative instances and is given as in equation 1.7.
1.2.8. False Negative (FN) Rate
This metric provides the fraction of positive instances that are classified as negative by the classifier and is given as in equation 1.8.
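The eight metrics defined above can all be computed from the four counts of a binary confusion matrix. The following is a minimal sketch; the function name and dictionary keys are illustrative choices, and the F1 value is computed as the standard harmonic mean of precision and recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics of equations 1.1-1.8 from raw confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)                            # eq. 1.1
    recall = tp / (tp + fn)                               # eq. 1.2 (TP Rate)
    misclassification_rate = (fp + fn) / total            # eq. 1.3 (error rate)
    f1 = 2 * precision * recall / (precision + recall)    # eq. 1.4
    accuracy = (tp + tn) / total                          # eq. 1.5
    fp_rate = fp / (fp + tn)                              # eq. 1.6
    tn_rate = tn / (tn + fp)                              # eq. 1.7
    fn_rate = fn / (fn + tp)                              # eq. 1.8
    return {
        "precision": precision, "recall": recall,
        "misclassification_rate": misclassification_rate, "f1": f1,
        "accuracy": accuracy, "fp_rate": fp_rate,
        "tn_rate": tn_rate, "fn_rate": fn_rate,
    }

# Example: 40 true positives, 10 false positives, 45 true negatives, 5 false negatives.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Note that TN Rate and FP Rate sum to 1 over the negative instances, as do Recall and FN Rate over the positives, which is a quick sanity check when reporting these metrics together.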
Diverse feature extraction techniques have been implemented to assess classification performance under varied classifier environments. This is done to examine the consistency of each feature extraction technique when applied to different datasets. Four different types of classifiers have been considered to carry out the task: the K Nearest Neighbor (KNN) Classifier, Ripple-Down Rule (RIDOR) Classifier, Artificial Neural Network (ANN) Classifier and Support Vector Machine (SVM) Classifier. Brief descriptions of the classifiers explain their individual workflows.
The highest error rate of the K Nearest Neighbor (KNN) classifier is at most twice the Bayes error rate, which is the minimum possible error rate for any classifier. It is therefore considered a suitable classifier for evaluation purposes. It classifies an unknown instance by identifying its nearest neighbor in the instance space and assigns the class of that nearest neighbor to the unknown instance.
The working formulas of four different similarity measures, namely Euclidean Distance, Canberra Distance, City Block Distance and Mean Squared Error, are given in equations 1.9-1.12. These similarity measures are used to determine the nearest neighbors during the classification of instances.
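The four similarity measures and the nearest-neighbor vote can be sketched as follows. This is a minimal illustration rather than the exact implementation used in the work; the function names and the `k=1` default are illustrative assumptions.

```python
import math

def euclidean(a, b):    # eq. 1.9: straight-line distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canberra(a, b):     # eq. 1.10: weighted absolute differences (zero terms skipped)
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) != 0)

def city_block(a, b):   # eq. 1.11: Manhattan / L1 distance
    return sum(abs(x - y) for x, y in zip(a, b))

def mse(a, b):          # eq. 1.12: mean of squared component differences
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def knn_classify(query, training_set, distance, k=1):
    """training_set: list of (feature_vector, label) pairs.
    Sort by distance to the query, then take the majority label of the k nearest."""
    neighbors = sorted(training_set, key=lambda pair: distance(query, pair[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

# Example: the query (4, 4) lies closest to the "B" instance at (5, 5).
training = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((5.0, 5.0), "B")]
knn_classify((4.0, 4.0), training, euclidean)
```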
Random Forest Classifier
It is a classifier that operates on the principle of ensemble learning. It builds a number of decision trees at training time, and the output class is determined by the mode of the classes predicted by the individual trees learned during the supervised procedure.
Random Forest Classifier has a number of advantages that have made it a popular choice for the classification task. It can proficiently manage large datasets by handling thousands of input variables. It refrains from variable deletions and approximates the significance of a variable for classification.
It efficiently estimates missing data to maintain classification accuracy. The forest generated during the classification process can be saved for future use on different datasets.
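The bootstrap-and-vote principle behind the Random Forest can be sketched with a deliberately simplified toy: one-level decision stumps stand in for full decision trees, and no random feature subsets are drawn. All names here are illustrative assumptions, not the actual implementation used in the work.

```python
import random
from collections import Counter

def train_stump(data):
    """Pick the (feature, threshold) split with the fewest errors on `data`.
    data: list of (features, label) pairs with numeric features and labels 0/1."""
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            t = x[f]  # candidate threshold: predict 1 when feature >= t
            errors = sum((xi[f] >= t) != bool(yi) for xi, yi in data)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return lambda x: int(x[f] >= t)

def random_forest(data, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample; predict by the mode of the votes."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        stumps.append(train_stump(bootstrap))
    def predict(x):
        votes = [s(x) for s in stumps]
        return Counter(votes).most_common(1)[0][0]    # mode of the ensemble
    return predict

# Example: a one-feature dataset separable around the value 3.
data = [((0.0,), 0), ((1.0,), 0), ((5.0,), 1), ((6.0,), 1)]
predict = random_forest(data)
```

Because each stump sees a different bootstrap sample, individual stumps may choose different thresholds, yet the majority vote remains stable; this variance reduction is the core benefit of bagging.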
Complex pattern recognition is carried out with the Artificial Neural Network (ANN) Classifier, which is composed of simple processing units. Its noise tolerance and self-adaptation properties make the ANN classifier suitable for real-time applications. A category of feed-forward artificial neural network named multilayer perceptron (MLP) is used in this work, which implements back propagation for supervised classification.

Classification performance is optimized by training the network with the feed-forward technique. A predefined error function is computed by the back propagation procedure, comparing the predicted values with the known or desired output for each input value. The errors are then fed back through the network and the weight of each connection is adjusted to reduce the error values. Executing this process repeatedly over a large number of training cycles reduces the overall error.

An MLP framework is comprised of input nodes, hidden nodes and output nodes. The number of input nodes is the number of attributes in the feature vector plus the bias node. The number of output nodes equals the number of class labels, and the number of hidden nodes is set to a predefined value. Across the multiple layers of the classification framework, the output flows toward the output layer, as in Fig. 1.1. The MLP shown in Fig. 1.1 has two inputs and a bias input with weights 3, 2 and -6 respectively. The activation function f is applied to the value S = 3x1 + 2x2 - 6.
The unipolar step function given in equation 1.13 is used to calculate the value of f. It uses an output of 1 to assign an instance to one class and an output of 0 to assign it to the other class.
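The single perceptron described above can be written out directly. This is a minimal sketch of the example in Fig. 1.1; the convention that the boundary value S = 0 maps to class 1 is an assumption, since equation 1.13 is not reproduced here.

```python
def unipolar_step(s):
    """Unipolar step activation (eq. 1.13): 1 for one class, 0 for the other.
    Assumption: the boundary case s == 0 is assigned to class 1."""
    return 1 if s >= 0 else 0

def perceptron(x1, x2, w1=3, w2=2, bias=-6):
    """The Fig. 1.1 unit: weighted sum S = 3*x1 + 2*x2 - 6, then the step function."""
    s = w1 * x1 + w2 * x2 + bias
    return unipolar_step(s)

# Points above the line x2 = 3 - (3/2)*x1 give S > 0 (class 1);
# points below it give S < 0 (class 0).
perceptron(2, 2)   # S = 6 + 4 - 6 = 4  -> class 1
perceptron(0, 0)   # S = -6             -> class 0
```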
An alternative representation of the values of 3, 2, and -6 as respective weights of three different inputs for the perceptron is shown in Fig. 1.2.
The horizontal and vertical axes are denoted by x1 and x2 respectively. The intercepts of the straight line on the vertical and horizontal axes correspond to the weights. The two classes are separated by the areas of the plane on the left and right sides of the line x2 = 3 - (3/2)x1.
Fig. 1.1. Graphical Illustration for Multilayer Perceptron.
The Support Vector Machine (SVM) Classifier uses nonlinear mapping to convert the original training data into a higher dimension, in which it searches for the optimal separating hyperplane. With an appropriate nonlinear mapping to a sufficiently high dimension, the hyperplane separates the data of two different classes. The methodology is illustrated in Fig. 1.3, where support vectors are denoted with thicker borders.
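As a simplified sketch of the hyperplane search, the following trains a linear SVM in the primal form by subgradient descent on the hinge loss with an L2 regularizer. This deliberately omits the nonlinear (kernel) mapping described above; the function names and hyperparameter values are illustrative assumptions.

```python
def train_linear_svm(data, lam=0.01, lr=0.1, epochs=200):
    """data: list of (features, label) pairs with label in {-1, +1}.
    Minimizes hinge loss + lam * ||w||^2 by per-sample subgradient steps."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin (or misclassified): hinge subgradient step.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correctly classified with margin: only the regularizer shrinks w.
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b

def svm_predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 determines the class."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Example: one-dimensional, linearly separable data around the origin.
data = [((-2.0,), -1), ((-1.0,), -1), ((1.0,), 1), ((2.0,), 1)]
w, b = train_linear_svm(data)
```

In this formulation the training points with margin below 1 are the ones that push on w and b, which mirrors the role of the support vectors highlighted in Fig. 1.3.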
Fig. 1.3. Structure of Hyperplane in SVM.
The idea was to implement the feature extraction techniques on widely used public datasets, namely the Wang Dataset, Caltech Dataset, Corel Dataset and Oliva-Torralba (OT-Scene) Dataset.