SAMPLE ACQUISITION AND ROI EXTRACTION

Sample Acquisition

A contact-free and peg-free image acquisition setup, as shown in Figure 9.4, has been used in the current work. The imaging setup uses (i) a low-cost camera, (ii) a wooden box with an aperture at the top to make room for the camera lens, and (iii) a fluorescent light fixed below the roof of the box to provide illumination.


FIGURE 9.4 Image Acquisition Setup used for Constructing Hand Dorsal Image Database.

It is to be noted that the setup does not use any peg. For image acquisition, a volunteer needs to place his/her hand in the box without any constraints on the position or orientation of the palm. Thus, the sample acquisition setup can be asserted to be a contactless, unconstrained one.

For this work, a database of 890 images was formed by collecting five hand dorsal images each from 178 volunteers. The five images of each volunteer were acquired at intervals of seven days. The ages of the volunteers span between 17 and 65 years, and both males and females participated. No restrictions were imposed on them regarding the use of nail paint or finger-rings.

ROI Extraction

In order to obtain reliable ROIs of the nail plates, the images were subjected to a sequence of pre-processing steps, a finger normalisation procedure, and ROI extraction techniques. These steps were carried out as described in Ref. [5] and are portrayed in the block diagram in Figure 9.3. A sample of the initially captured hand image, along with the decomposed and extracted fingers, is shown in Figure 9.5 [5]. Figure 9.6 [5] shows the extracted ROIs of the fingernail plates of the index, middle, and ring fingers of the hand sample shown in Figure 9.5.


FIGURE 9.5 Decomposed and Normalised Fingers from Sample Image.


FIGURE 9.6 ROIs of Fingernail Plates of Index, Middle, and Ring Fingers of the Sample Image in Figure 9.5.

FEATURE EXTRACTION

Deep learning, a distinctive and promising branch of machine learning, has proved to be immensely advantageous for computer vision [18,19]. This is largely because this subset of machine learning has empowered computers to execute intricate and elaborate perception tasks like image classification, object detection, etc. [20] very effectively.

Deep learning networks explore a colossal volume of labelled data to learn which specific features differentiate the various groups of data, and consequently form a framework for feature extraction and classification [21]. One of the most favourable characteristics of pre-trained deep learning models is that they can be fine-tuned to serve purposes for which they were not trained in the first place. Such a method of fine-tuning a pre-trained model so that the knowledge gained while solving one problem is applied to another problem is called Transfer Learning [22]. In such cases, the fine-tuned model itself serves as the feature extractor. Many such models have carried out computer vision tasks [23,24] for which the original model was not trained. To address such problems, the last layer of the pre-trained model is substituted with a classifier that matches the output dimensions of the newly assigned task. If the fine-tuning is carried out well, these models perform efficiently when applied to the new task.

The current work makes use of three different pre-trained deep learning models, viz. AlexNet, ResNet-18, and DenseNet-201.

Transfer Learning using AlexNet

AlexNet [25] was originally trained on a subset of the ImageNet database [26], which contains more than 15 million annotated images segregated into more than 22,000 categories. AlexNet consists of eight weighted layers, specifically five convolutional layers followed by three fully connected layers (fc6, fc7, fc8). The weighted layers are followed by one or more layers such as the Rectified Linear Unit (ReLU) activation function, max-pooling, Local Response Normalisation (LRN), etc. The output of the fc8 layer is provided to a softmax layer, which enables the network to predict the probability of the test subject belonging to each of the trained classes. Due to the reasonably small size of the current database, building a new deep learning network from scratch would prove unproductive; thus, Transfer Learning has been opted for.

AlexNet has been suitably fine-tuned, and the newly modified network has been named Transfer Learning using AlexNet (TLA). All layers of AlexNet, except the last one (namely, fc8), have been retained for TLA. A new fully connected (FC) layer and a softmax layer are added to the retained set of layers. The new FC layer has a size equal to the number of classes (users) in the concerned database; in the case of the fingernail plate database used in this work, the number of classes is 178. Transfer learning requires slow learning over the retained layers and fast learning over the new layers. In order to warrant fast training over the newly added layer, the bias and weight learning rate factors of the new FC layer are multiplied by a high value of 20. The authors settled on this value of 20 through empirical computation and found it to deliver optimum results. To ensure that the learning process is slow over the retained layers, the initial learning rate has been kept low (0.0001). The newly formed TLA network has been trained over the images of the fingernail plate database. Three TLA feature-sets pertaining to the index, middle, and ring fingernail plates have been extracted from the fully connected fc7 layer of the trained TLA network.
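The chapter does not name an implementation framework (the "learning rate factor" phrasing is reminiscent of MATLAB's Deep Learning Toolbox), so the following is only a rough, hypothetical PyTorch sketch of the TLA idea: fc8 is replaced with a 178-way FC layer, the new layer learns roughly 20 times faster than the retained layers, and fc7 activations are read out as features. The variable names and data sizes are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 178          # users in the fingernail-plate database
BASE_LR = 1e-4             # low initial learning rate for retained layers
NEW_LAYER_LR_FACTOR = 20   # faster learning on the newly added FC layer

# Load AlexNet pre-trained on ImageNet and replace fc8 (classifier[6]).
tla = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
tla.classifier[6] = nn.Linear(tla.classifier[6].in_features, NUM_CLASSES)

# Retained layers learn slowly; the new FC layer learns 20x faster.
new_params = list(tla.classifier[6].parameters())
new_ids = {id(p) for p in new_params}
old_params = [p for p in tla.parameters() if id(p) not in new_ids]
optimiser = torch.optim.SGD(
    [{"params": old_params, "lr": BASE_LR},
     {"params": new_params, "lr": BASE_LR * NEW_LAYER_LR_FACTOR}],
    momentum=0.9,  # momentum value taken from Table 9.1
)

# After training, fc7 activations (4096-D) can serve as the TLA feature set;
# truncating the classifier after its fc7/ReLU stage exposes them.
fc7_extractor = nn.Sequential(tla.features, tla.avgpool, nn.Flatten(1),
                              *list(tla.classifier.children())[:6])
with torch.no_grad():
    feats = fc7_extractor(torch.randn(1, 3, 224, 224))  # -> shape (1, 4096)

In this sketch the softmax stage is implicit in the cross-entropy loss used during training, which is the idiomatic PyTorch arrangement rather than a claim about the authors' code.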

Transfer Learning using ResNet-18

Since the advent of AlexNet [25], a number of deeper Convolutional Neural Networks have been introduced. However, it was observed that increasing the network depth merely by stacking layers often saturates or degrades accuracy, because of the vanishing gradient problem. Residual Neural Networks, or ResNets [27], demonstrated that the vanishing gradient problem can be tackled by splitting a very deep network into smaller blocks that are inter-connected through skip connections.
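As a purely illustrative sketch (not part of the cited work), a minimal residual block shows how the skip connection adds the input back to the learned residual, so that gradients can bypass the stacked convolutions; the channel counts here are arbitrary.

import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + x)   # skip connection: output = F(x) + x

block = BasicResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])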

ResNet-18 is a network trained on a section of the ImageNet database. It is composed of five stages of convolutional layers, which are followed by an average pooling layer and an FC layer. The weighted layers are interspersed with layers such as the ReLU activation function, max-pooling, etc.

Transfer Learning using ResNet-18 (TLR) has been used as one of the feature extraction techniques in this work. Except for the last FC layer, all layers of ResNet-18 are retained for TLR. A fresh FC layer is added, which has a size equal to the number of classes (178) of the current database. Transfer learning demands slow learning over the retained layers and fast learning over the fresh layers. To ensure faster training over the new layer, the learning rate factors of the newly added layer are set to 20. Feature-sets of the index, middle, and ring fingernail plates have been extracted from the average pooling layer of the modified deep learning model.
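Again as a hedged illustration rather than the authors' implementation, a PyTorch-style sketch of the TLR setup could replace ResNet-18's final FC layer and read the 512-dimensional average-pooling output through a forward hook; all names below are assumptions.

import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 178
tlr = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
# New 178-way FC layer; its learning rate would be boosted (e.g. by the
# factor of 20) exactly as in the TLA sketch above.
tlr.fc = nn.Linear(tlr.fc.in_features, NUM_CLASSES)

# Capture the 512-D global average-pooling output with a forward hook.
pooled = {}
def save_pooled(module, inputs, output):
    pooled["feat"] = torch.flatten(output, 1)

tlr.avgpool.register_forward_hook(save_pooled)
with torch.no_grad():
    tlr(torch.randn(1, 3, 224, 224))
print(pooled["feat"].shape)  # torch.Size([1, 512])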

Transfer Learning using DenseNet-201

Densely Connected Convolutional Networks, or DenseNets, are popular as a logical extension of ResNets. DenseNets concatenate outputs from previous layers, whereas ResNets sum them up. Major advantages of DenseNets are that they alleviate the vanishing gradient problem stated in Section 9.4.2, strengthen feature propagation, encourage feature reuse, and noticeably reduce the number of parameters.

DenseNet-201 [28] is formed of four dense blocks, where each dense block consists of repeated pairs of a 1 x 1 convolutional layer, which reduces the number of feature maps, followed by a 3 x 3 convolutional layer. The dense blocks are followed by an average pooling layer and an FC layer. There is one transition layer between every two dense blocks, made up of a 1 x 1 convolutional layer followed by a 2 x 2 average pooling layer. Every convolutional layer within a dense block follows the sequence Batch Normalisation-ReLU-Convolution.
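To illustrate the concatenation-based connectivity described above, the following minimal, hypothetical sketch of a single dense layer (with illustrative channel sizes, not DenseNet-201's exact configuration) applies the BN-ReLU-Conv sequence with a 1 x 1 bottleneck and a 3 x 3 convolution, then concatenates its output with its input instead of summing.

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int = 32):
        super().__init__()
        # BN-ReLU-Conv ordering: 1x1 bottleneck followed by 3x3 convolution
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4 * growth_rate, 1, bias=False),
            nn.BatchNorm2d(4 * growth_rate), nn.ReLU(inplace=True),
            nn.Conv2d(4 * growth_rate, growth_rate, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)  # concatenate, not add

layer = DenseLayer(64)
print(layer(torch.randn(1, 64, 28, 28)).shape)  # (1, 96, 28, 28): 64 + 32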

Transfer Learning using DenseNet-201 (TLD) is the third feature extraction technique used in this work.

TABLE 9.1
Hyperparameters used in the Implemented Deep Learning Models

Model    Momentum    Initial Learning Rate    Mini-Batch Size
TLA      0.9         0.0001                   5
TLR      0.9         0.0003                   10
TLD      0.9         0.0003                   10

All layers of DenseNet-201, except the final FC layer, have been retained for TLD. This FC layer has been replaced with a new FC layer which has 178 outputs, in tune with the current database. Similar to TLR, in TLD too, the learning rate factors of the new FC layer have been set to 20. The required feature-sets of the TLD model have been extracted from its average pooling layer.
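For completeness, a hedged PyTorch sketch of the TLD head swap (the framework is again an assumption, not the authors' stated choice): torchvision's DenseNet-201 exposes the final FC layer as classifier, and its global average pooling is applied functionally, so the 1920-dimensional pooled features are computed from the convolutional trunk directly.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

NUM_CLASSES = 178
tld = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
tld.classifier = nn.Linear(tld.classifier.in_features, NUM_CLASSES)

with torch.no_grad():
    fmap = F.relu(tld.features(torch.randn(1, 3, 224, 224)))
    tld_feats = torch.flatten(F.adaptive_avg_pool2d(fmap, 1), 1)
print(tld_feats.shape)  # torch.Size([1, 1920])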

Table 9.1 lists the important hyperparameter settings of the above-mentioned models.
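As an illustrative mapping only (the training loop, optimiser choice, and framework are assumptions rather than the authors' code), the settings of Table 9.1 translate into per-model configurations such as the following.

import torch

hyperparams = {
    "TLA": {"momentum": 0.9, "initial_lr": 1e-4, "mini_batch_size": 5},
    "TLR": {"momentum": 0.9, "initial_lr": 3e-4, "mini_batch_size": 10},
    "TLD": {"momentum": 0.9, "initial_lr": 3e-4, "mini_batch_size": 10},
}

def make_optimiser(model, name):
    cfg = hyperparams[name]
    # The newly added FC layer would receive its own parameter group with a
    # 20x learning rate, as in the TLA sketch; one group keeps this short.
    return torch.optim.SGD(model.parameters(), lr=cfg["initial_lr"],
                           momentum=cfg["momentum"])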

 