Convolutional Neural Network (CNN)
In applications such as self-driving cars, smart web search, and speech and image pattern recognition, the use of machine learning has grown many fold in recent decades (Bhandare et al., 2016). Machine learning applications are becoming part of daily life (LeCun et al., 2015). In machine learning, a branch of artificial intelligence, a computer teaches itself from the reference or training data it is given. As a rich and complex field, machine learning could drive future technology (Krizhevsky et al., 2012; Long et al., 2015; Dong et al., 2016).
Many machine learning algorithms follow the principle of biological neural networks and are organized in the same way as biological neurons. Machine learning gives an opportunity to mimic the processes arising in the brain (Zeng et al., 2015). One example of a machine learning model is the neural network, which consists of individual units called neurons. Neurons are located in every layer of the network. Neurons in each layer are connected to neurons of the next layer, and data flows from the input layer through the hidden layer to the output layer. Each neuron generates its output by passing a weighted sum of its inputs through an activation function. That output is transmitted to all the neurons of the next layer that are connected to it.
The benefit of neural networks emerged once high-end processing machines and large training data sets became available. With these two factors in place, modified versions of neural networks have come into use in the last decade, including the convolutional neural network (CNN), the recurrent neural network (RNN), etc., combined with deep learning concepts. In these networks, the pre-processing stages and structures differ from those of a conventional artificial neural network (ANN). These modified networks are able to solve a wide range of tasks that were not solved effectively in the past. Among the various applications of these networks, image classification is an important example.
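The weighted-sum-and-activation step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sigmoid activation, the bias term, the layer sizes, and all numeric weights are assumptions chosen for the example and do not come from the text.

```python
import math

def sigmoid(z):
    """Logistic activation; one common choice, assumed here for illustration."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer_output(inputs, weight_rows, biases):
    """Each neuron in the layer receives every output of the previous layer."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical network: 3 inputs -> hidden layer of 2 neurons -> 1 output neuron.
x = [0.5, -1.0, 2.0]
hidden = layer_output(x, [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6]], [0.0, 0.1])
output = layer_output(hidden, [[1.0, -1.0]], [0.0])
print(hidden, output)
```

Data moves strictly forward here: `hidden` is computed from `x`, and `output` is computed from `hidden`, matching the input-to-hidden-to-output flow.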
Convolutional Neural Networks Image Classification
A convolutional neural network (CNN) has different pre-processing layers from a multi-layer perceptron (MLP) neural network, as shown in Figure 4.4. Basically, a CNN uses features modeled on the visual cortex (Hubel and Wiesel, 1968; Fukushima, 1980). The visual cortex is the main cortical region of the brain for vision. Its main work is to receive, integrate, and process visual information arriving from the retinas. Because the architecture mimics the brain in this way, the most popular use of CNNs is image pattern recognition. Some applications of CNNs in social media and web services are automatic tagging of photographs on Facebook, product recommendations on Amazon, and photo search on Google.
In remote sensing, the main use of CNNs on image data is earth object recognition. The main objective of object recognition is to identify similar patterns, place them in one group, and assign labels. Pattern recognition is a skill that people learn from birth, and they can easily distinguish various objects. The way a computer identifies objects in an image is quite different. The computer treats an image as an array of pixels. For example, an image of size 300 x 300 with RGB bands is an array of size 300 x 300 x 3, where the first 300 is the number of rows, the second 300 is the number of columns, and 3 is the number of RGB channels. The range of pixel values depends on the data type of the image, which determines its bit depth. For 8-bit data, each of these numbers takes a value from 0 to 255. Each value, as an element of the array, describes the intensity of the pixel at a given row and column.
FIGURE 4.4 CNN - Multi-layer CNN architecture.
To classify these pixels, a classification algorithm derives distinguishing properties from them. In human learning, these can be, for example, specific characteristics of an object. For the computer, these characteristics are the different shapes of an object. In a convolutional neural network, the convolutional layers construct progressively more abstract concepts. The steps a CNN applies to a given input are a series of convolution operations, non-linear operations, and pooling operations, followed at the end by a fully connected layer that produces the output.
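The array view of an image can be made concrete with a small sketch. Plain nested Python lists are used so that the example has no dependencies; the constant names, the `shape` helper, and the pixel values written into the array are hypothetical and serve only as illustration.

```python
ROWS, COLS, BANDS = 300, 300, 3   # rows, columns, RGB channels
BITS = 8                          # bit depth of the image data type
MAX_VALUE = 2 ** BITS - 1         # largest pixel value an 8-bit image can hold

# A black image: every pixel holds one intensity per band.
image = [[[0] * BANDS for _ in range(COLS)] for _ in range(ROWS)]

# Hypothetical pixel at row 0, column 0: full red, half green, no blue.
image[0][0] = [255, 128, 0]

def shape(img):
    """Return (rows, columns, bands) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))

red, green, blue = image[0][0]
print(shape(image), MAX_VALUE, red, green, blue)
```

In practice an array library would normally hold this data, but the indexing idea is the same: `image[row][column][band]` is one intensity value.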
The convolution layer is always the first operation, and the image is its input. Reading starts from the top left corner of the image, as this takes the least time. The next step is to select a small matrix, called a filter. The filter generates the convolution output as it moves along the input image. Its job is to multiply its values by the original pixel values of the image in the region it covers. The products of the filter coefficients and the image values are added into a weighted sum, so each position yields a single number. Initially the filter covers pixel values in the upper left corner only; it then moves further and further right by 1 or more units, performing the same operation at each position. The size of this movement is called the stride, and the stride can be any number of units. After the filter has been applied across the whole image, the output is a matrix that is smaller than the input matrix. In terms of human perception, this operation identifies boundaries and simple colors in the input image. To recognize more specific properties of features, a deeper CNN is required. A CNN consists of several convolution operations, pooling operations, and non-linear layers. The output of the first layer becomes the input to the next layer, even though the next layer may not be of the same type as the first, and this continues through consecutive layers until the final layer is reached.
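The sliding-filter operation can be sketched as follows for a single-band image. This is a simplified illustration under stated assumptions: no padding is applied, the filter is not flipped (the convention most CNN software follows), and the function name `convolve2d`, the 4 x 4 image, and the 2 x 2 filter are hypothetical. With no padding, the output has (input size - filter size) // stride + 1 rows and columns, which is why it is smaller than the input.

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and return the output matrix."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            top, left = i * stride, j * stride   # upper left corner of the region
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    # Multiply filter coefficient by the pixel under it and accumulate.
                    total += kernel[a][b] * image[top + a][left + b]
            row.append(total)                    # one number per filter position
        output.append(row)
    return output

# Hypothetical 4 x 4 image with a vertical boundary between dark and bright columns.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical 2 x 2 filter that responds to a dark-to-bright step from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel, stride=1)
print(feature_map)
```

The 4 x 4 input gives a 3 x 3 output at stride 1, and the only non-zero column is the one where the filter straddles the boundary. Calling `convolve2d(image, kernel, stride=2)` gives a 2 x 2 output, showing how a larger stride shrinks the result further.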
In a CNN, an activation function is applied as a non-linear layer after each convolution operation (Glorot et al., 2011; Krizhevsky et al., 2012; Nair and Hinton, 2010; LeCun et al., 2015; Ramachandran et al., 2017). This function produces a non-linear decision boundary via non-linear combinations of the weights and inputs. Without it, the network could model only linear relationships between the inputs and the response variable. The pooling layer follows the non-linear layer (Lin et al., 2013). The pooling operation reduces the image size through down sampling. In down sampling, the image is compressed so that it carries less detail. Down sampling is acceptable at this stage because the various features have already been identified in the preceding convolution operation, so an image with detailed information is no longer needed.
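A minimal sketch of these two steps follows. The text does not name a specific activation or pooling method, so the rectified linear unit (ReLU) and 2 x 2 max pooling used here are assumptions chosen because they are common; the feature map values are hypothetical.

```python
def relu(feature_map):
    """Non-linear layer: negative responses become 0, positive ones pass through."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Down sample by keeping the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4 x 4 output of a convolution operation.
feature_map = [
    [1, -2, 3, 0],
    [-4, 5, -6, 7],
    [8, -1, 2, -3],
    [0, 4, -5, 6],
]
activated = relu(feature_map)
pooled = max_pool(activated, size=2)
print(activated)
print(pooled)
```

The 4 x 4 map shrinks to 2 x 2: each pooled value keeps the strongest response in its window and discards the exact position, which is the loss of detail described above.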
In a CNN, after the various convolution operations, non-linear operations, and pooling layers have been applied, the last compulsory layer, called the fully connected layer, is applied. The fully connected layer gives the final output of the convolutional network: an N-dimensional vector, where N is the number of classes to identify.
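The final step can be sketched as below. The flattening of the pooled maps, the weight and bias values, the choice of N = 3 classes, and the softmax used to turn scores into probabilities are all assumptions made for illustration; the text states only that the layer yields an N-dimensional vector.

```python
import math

def flatten(feature_maps):
    """Unroll a list of 2-D feature maps into one flat feature vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def fully_connected(features, weights, biases):
    """One score per class: weighted sum of all features plus a bias."""
    return [sum(w * x for w, x in zip(w_row, features)) + b
            for w_row, b in zip(weights, biases)]

def softmax(scores):
    """Convert scores to probabilities (an assumed, commonly used final step)."""
    peak = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

pooled_maps = [[[5, 7], [8, 6]]]              # hypothetical: one pooled 2 x 2 map
features = flatten(pooled_maps)               # 4 features

N = 3                                         # hypothetical number of classes
weights = [                                   # N rows, one weight per feature
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]
biases = [0.0] * N

scores = fully_connected(features, weights, biases)   # N-dimensional vector
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(scores, probabilities, predicted_class)
```

The length of `scores` equals N regardless of the image size, because every class score is connected to every feature from the preceding layers.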