Introduction to Content-Based Image Classification
Prelude
A picture collage can capture an entire life span in a single frame. We have witnessed a worldwide surge of enthusiasm for exchanging pictures rather than text. Social networking platforms receive innumerable image uploads every moment for information exchange, status updates, business purposes and much more [1]. The cell phone industry was revolutionized by the advent of camera phones, which capture high-quality photographs instantly and share them for commercial and noncommercial use [2]. Computer-Aided Diagnosis (CAD) of medical images has enabled significant medical advances [3]. Image data has therefore become indispensable in every sphere of modern civilization, including media, entertainment, tourism, sports, military services, geographical information systems, medical imaging and so on.
Computer vision has come a long way since its inception in 1960 [4,5]. Early attempts targeted office automation tasks, applying pattern recognition approaches based on character matching. Research by Roberts envisaged the need to match two-dimensional features extracted from images to three-dimensional object representations [6]. Escalating complexities, such as unevenly illuminated pictures, sensor noise, time and cost, have since raised real concerns about sustaining research in this domain steadily and uniformly.
Radical advances in imaging technology have flooded the masses with pictures and videos of every possible detail of their daily lives. Creating enormous image datasets has thus become unavoidable in order to store and archive these rich sources of information. Researchers face mounting real-time challenges in storing, archiving, maintaining and accessing this data and in extracting information from it [7].
Content-based image classification has emerged as a noteworthy technique for addressing these challenges. It is considered effective because it identifies image data by its content rather than by superficial annotation.
Image annotation labels image content with text keywords, and performing it manually requires considerable human intervention. Moreover, manual text entry is highly error-prone when labeling enormous image datasets. A text tag describing image content is only as good as the vocabulary of the person who assigns it. The same image can therefore receive different descriptions depending on the vocabulary of the annotator, which undermines the consistency of the entire process [8].
Conversely, extracting a feature vector from the intrinsic pixel values of the image overcomes the challenges of manual annotation and automates identification with minimal human intervention [9]. People today routinely capture images of events and objects of unknown category. Content-based image identification can readily classify such images into known categories using knowledge learned from preexisting training data. This, in turn, supports decisions about further processing of the image data for various commercial uses.
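To make the idea of a pixel-derived feature vector concrete, here is a minimal sketch. It assumes grayscale images stored as nested lists of intensities in the range 0 to 255, uses a normalized intensity histogram as the feature vector and a 1-nearest-neighbor rule as the classifier. These choices, and the helper names `intensity_histogram` and `classify_1nn`, are illustrative assumptions, not the specific techniques evaluated in this work.

```python
import math

def intensity_histogram(image, bins=4):
    """Normalized grayscale histogram over all pixels (a simple global feature vector).

    Assumption: `image` is a nested list of integer intensities in 0..255.
    """
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map intensity 0..255 to a bin index 0..bins-1
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_1nn(query_features, training):
    """Return the label of the training feature vector closest to the query.

    `training` is a list of (feature_vector, label) pairs: the "preexisting
    training knowledge" in this illustrative setup.
    """
    best_label, best_dist = None, float("inf")
    for features, label in training:
        d = euclidean(query_features, features)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Usage with tiny hypothetical 2x2 images
dark = [[10, 20], [30, 40]]
bright = [[200, 220], [230, 250]]
training = [(intensity_histogram(dark), "night"), (intensity_histogram(bright), "day")]
query = [[15, 25], [35, 60]]
print(classify_1nn(intensity_histogram(query), training))  # night
```

No manual keyword is involved: the label comes entirely from comparing feature vectors computed from pixel values.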
Speed is not the only decisive factor for efficient content-based image classification. The accuracy of the classification results contributes immensely to the success of a classification infrastructure. To make content-based image classification competent, one therefore has to identify an effective feature extraction technique. The extracted features largely govern how successfully the image data is assigned to its corresponding labels.
This work therefore discusses different feature extraction techniques that represent the image globally and locally by means of extracted features. The global approach extracts features from the image as a whole, whereas the local approach extracts them from segmented portions of the image. However, image data comprises a rich feature set that a single feature extraction technique seldom captures in full. Fusion of features has therefore been explored as a means of improving classification accuracy.
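The contrast between global and local features, and the idea of fusing them, can be sketched as follows. This is a hedged illustration rather than the method of this work: it assumes grayscale images as nested lists, approximates the local approach with a fixed grid of blocks instead of a true segmentation, and fuses features by plain concatenation (early fusion). The names `global_features`, `local_features` and `fuse` are hypothetical.

```python
def normalized_histogram(pixels, bins=4):
    """Normalized histogram over a flat list of grayscale intensities (0..255)."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def global_features(image, bins=4):
    # Global approach: one histogram over every pixel of the whole image
    return normalized_histogram([p for row in image for p in row], bins)

def local_features(image, grid=2, bins=4):
    # Local approach, approximated here by a fixed grid rather than segmentation:
    # one histogram per block, concatenated in row-major block order.
    # Assumes the image is at least `grid` pixels high and wide.
    h, w = len(image), len(image[0])
    features = []
    for bi in range(grid):
        for bj in range(grid):
            block = [image[r][c]
                     for r in range(bi * h // grid, (bi + 1) * h // grid)
                     for c in range(bj * w // grid, (bj + 1) * w // grid)]
            features.extend(normalized_histogram(block, bins))
    return features

def fuse(*feature_vectors):
    # Early fusion: concatenate the individual feature vectors into one
    fused = []
    for v in feature_vectors:
        fused.extend(v)
    return fused

# Same pixel values, different spatial arrangement
img_a = [[0, 255], [0, 255]]
img_b = [[255, 0], [255, 0]]
print(global_features(img_a, bins=2) == global_features(img_b, bins=2))  # True
print(local_features(img_a, grid=2, bins=2) == local_features(img_b, grid=2, bins=2))  # False
print(len(fuse(global_features(img_a, bins=2), local_features(img_a, grid=2, bins=2))))  # 10
```

The two example images have identical global histograms because they contain the same pixel values, yet their local features differ because those values are arranged differently. A fused vector keeps both kinds of information, which is one motivation for combining descriptors.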
The experiments are carried out on four widely used public datasets with four different classifiers to assess the robustness of the extracted features. The classification results are compared using diverse metrics: Precision, Recall, Misclassification Rate (MR) and F1 Score. The following section briefly explains each of these metrics, followed by descriptions of the classifiers and the datasets.
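As a rough preview of these metrics, the sketch below computes one-vs-rest Precision, Recall, Misclassification Rate and F1 Score for a single class from lists of true and predicted labels. It assumes the common textbook definitions: Precision = TP/(TP+FP), Recall = TP/(TP+FN), MR = (FP+FN)/(TP+TN+FP+FN) and F1 = 2PR/(P+R). The exact formulations given in the following section may differ, and the function name `class_metrics` is hypothetical.

```python
def class_metrics(y_true, y_pred, positive):
    """One-vs-rest Precision, Recall, MR and F1 for the class `positive`.

    Assumes the standard confusion-matrix definitions; returns 0.0 where a
    ratio would otherwise divide by zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    mr = (fp + fn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "mr": mr, "f1": f1}

# Usage: one "cat" image is misclassified as "dog"
m = class_metrics(["cat", "cat", "dog", "dog"], ["cat", "dog", "dog", "dog"], "cat")
print(m["precision"], m["recall"], m["mr"])  # 1.0 0.5 0.25
```

For multi-class results, these per-class values are commonly averaged across classes, though the averaging scheme is a separate design choice.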