Deep Learning for Medical Image Processing

The availability of high-end computing facilities and the necessary software now makes it possible to implement Deep-Learning Architectures (DLAs) for automated disease detection with high accuracy, as the literature demonstrates with examples of both traditional and custom DLAs [1-3].

The main advantage of a DLA is that, although designed to examine RGB images, it also works well on greyscale images. Its main limitation, on the other hand, is that before the detection process is executed, the images to be examined require resizing to recommended dimensions, such as 227 x 227 x 3 or 224 x 224 x 3 [4,5]. The available literature confirms the availability of traditional DLAs trained and validated on benchmark images [6-10]. Traditional DLAs, such as AlexNet, Visual Geometry Group-16 (VGG-16), VGG-19, and Residual Network (ResNet) with various depths (18, 34, 50, 101, and 152 layers), together with other traditional or custom architectures, are chiefly used to examine medical-grade images in support of automated disease detection. The choice of a particular DLA depends on the required diagnosis speed, the required detection accuracy, and the available computing facility. In this section, well-known DLAs such as AlexNet, VGG-16, and VGG-19 are implemented for experimental demonstration. The disease-detection procedures are executed using MATLAB and Python.
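
A minimal Python sketch of this resizing step (Pillow is one common choice; the file name is illustrative, not from the source):

```python
from PIL import Image
import numpy as np

# Resize a medical image to the input dimensions a pre-trained DLA expects.
img = Image.open('scan.png').convert('RGB')       # greyscale scans are replicated to 3 channels
img_alexnet = np.asarray(img.resize((227, 227)))  # 227 x 227 x 3 for AlexNet
img_vgg = np.asarray(img.resize((224, 224)))      # 224 x 224 x 3 for VGG-16/VGG-19
```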

Introduction

The various stages of disease detection using medical images are presented in Figure 6.1. The figure also illustrates the various techniques used in the literature to detect diseases in medical images with a dedicated scheme.

The low-grade learning system is an early procedure adopted to detect disease from a group of chosen images, and its accuracy was poor. This method detected diseases in a two-step process: feature extraction followed by classification. Because of this two-step process, attaining accurate disease detection is less probable than with the more recent ML techniques.

FIGURE 6.1 Existing Automated Disease Classification Techniques Available for Medical Images.

ML systems, also known as high-grade ML systems, help attain better results due to their improved schemes. Such a system includes a feature-selection process, which helps attain higher detection accuracy while reducing the problem of overfitting. Because of this, ML systems are widely implemented by researchers to classify medical images. An added advantage is that they can be implemented on smaller computation devices without sacrificing accuracy [11-13].
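
A minimal scikit-learn sketch of such a feature-selection pipeline (the handcrafted feature matrix X_train and labels y_train are assumed to be precomputed; the choices of selector and classifier are illustrative):

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Feature selection keeps only the most discriminative features,
# which raises accuracy and curbs overfitting on small image datasets.
model = make_pipeline(
    SelectKBest(f_classif, k=20),  # retain the 20 best-ranked features
    SVC(kernel='rbf'),             # traditional ML classifier
)
# model.fit(X_train, y_train)
# model.score(X_test, y_test)
```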

Although a DLA requires a higher-capacity computing facility, it produces higher-accuracy disease detection and works well on a range of greyscale and RGB images irrespective of the imaging modality used. A main advantage of the DLA approach is that, instead of building a custom network, the transfer-learning concept can be utilized to reuse an existing DLA, as constructing a custom DLA from scratch is time-consuming and its training and testing procedures are demanding. Existing DLAs based on transfer learning work well on disease-detection problems and generally help attain better results. A DLA extracts low-, mid-, and high-level features from each image and learns from the extracted information. Further, the extracted features are ranked, and the lowest-ranked features are discarded before the remainder reach the classifier (SoftMax). The classification accuracy attained with this deep classifier can be improved by substituting traditional classifiers available in ML systems [14,15].
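
A sketch of this hybrid idea, assuming a frozen ImageNet-trained VGG-16 as the deep-feature extractor and an SVM in place of the SoftMax unit (the image batch and labels below are placeholders, not real data):

```python
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Use a pre-trained CNN as a fixed deep-feature extractor,
# then hand its features to a traditional ML classifier.
extractor = VGG16(weights='imagenet', include_top=False, pooling='avg')

images = np.random.rand(10, 224, 224, 3) * 255  # placeholder RGB batch
labels = np.random.randint(0, 2, 10)            # placeholder normal/abnormal labels

features = extractor.predict(preprocess_input(images))  # 10 x 512 deep features
clf = SVC(kernel='rbf').fit(features, labels)           # SVM replaces SoftMax
```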

Implementation of CNN for Image Assessment

A breakthrough in the construction of networks for image categorization came with the finding that a Convolutional Neural Network (CNN) can be used to extract higher-level features from the test image. Instead of pre-processing the data to obtain features such as textures and shapes, as in an ML scheme, a CNN takes the image's raw pixel information directly as input and learns the information available in the test image. Normally, the CNN accepts an input feature map: a three-dimensional matrix in which the first two dimensions correspond to the length and width of the image in pixels, and the third dimension corresponds to the three colour channels (red, green, and blue in the case of RGB images) or to the grey pixel information (in the case of greyscale images). The CNN encompasses a stack of modules, each of which executes three functions. The working theory behind the CNN is explained below:

• Convolution

This procedure extracts features and generates an output feature map with different dimensions compared to the input feature map. This can be defined with two parameters:

i. The dimension of the tiles to be extracted (typically a 3 x 3 or 5 x 5 pixel map).

ii. The depth of the output feature map, which matches the number of filters applied.

Throughout the convolution, the filters slide over the input feature map's grid horizontally and vertically, one pixel at a time, extracting the corresponding information from the test image.

The CNN executes element-wise multiplication of the filter and tile matrices for every filter-tile pair. It then sums all the elements of the resulting matrix to obtain a single value. These resulting values, one for each filter-tile pair, together form the convolved feature matrix.

During training, the CNN learns the best values for the filter matrices, enabling it to extract significant descriptions such as textures, edges, and shapes from the values on the input feature map. The number of features extracted by the CNN rises as the number of filters applied to the input increases; however, this also enlarges the training effort, making the initial tuning of the CNN architecture more complex.
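
The following minimal NumPy sketch (the function name and toy input are illustrative) shows the element-wise multiply-and-sum of a single filter sliding over an input feature map one pixel at a time:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    # Slide `kernel` over `image`, computing the element-wise product
    # and sum at each position (no padding).
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            tile = image[y*stride:y*stride+kh, x*stride:x*stride+kw]
            out[y, x] = np.sum(tile * kernel)  # filter-tile product, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5 x 5 greyscale input
kernel = np.array([[1, 0, -1]] * 3, dtype=float)  # simple vertical-edge filter
print(convolve2d(image, kernel))                  # 3 x 3 output feature map
```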

• ReLU

Rectified Linear Unit (ReLU)-based transformation of the convolved feature is employed in a CNN to introduce non-linearity into the model. The ReLU function F(r) = max(0, r) returns r for all values of r > 0 and returns 0 for all values of r ≤ 0.
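
In NumPy this is a one-liner (a sketch; any element-wise maximum behaves the same way):

```python
import numpy as np

relu = lambda r: np.maximum(0, r)              # F(r) = max(0, r), element-wise
print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> [0.  0.  0.  1.5]
```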

• Pooling

Pooling is executed in the CNN following ReLU. In this process, the CNN downsamples the convolved feature, which further reduces the number of dimensions in the feature map. This process preserves the most significant feature data and is technically called the Max Pooling (MP) process. MP works in a style similar to convolution: a tile of predefined shape glides over the feature map, extracting tiles of the same size, and for every tile the maximum value is output to a new feature map while all other values are discarded.

The operation of MP is defined by two parameters (a sketch follows the list):

i. The size of the max-pooling filter (normally 2 x 2 pixels).

ii. The stride, i.e. the distance in pixels between successive extracted tiles.
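
A minimal NumPy sketch of MP with a 2 x 2 filter and a stride of 2 (the function name and the toy feature map are illustrative):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Slide a `size` x `size` window over the map, keeping only
    # the maximum value in each tile and discarding the rest.
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            tile = feature_map[y*stride:y*stride+size, x*stride:x*stride+size]
            out[y, x] = tile.max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]], dtype=float)
print(max_pool(fmap))  # -> [[6. 8.], [3. 4.]]
```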

• Fully Connected Layers

The last part of a CNN includes one or more Fully Connected Layers (FCLs), as needed, in which every node is connected to every node of the preceding layer. The job of the FCLs is to provide the essential one-dimensional feature vector used to train, test, and validate the classifier unit found at the final part of the network. This classifier detects/classifies the medical images based on the selected features. To avoid the problem of overfitting, a dropout operation is implemented to randomly discard a fraction of the feature values. In most of the CNNs considered here, the adopted classifier is a SoftMax unit that performs a two-class (binary) classification.
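
A minimal Keras sketch of such a classifier head (the 4096-unit FCLs and 50% dropout mirror Table 6.1; the input shape is an assumption for illustration):

```python
from tensorflow.keras import layers, models

# Classifier head: flatten -> FCL -> dropout -> FCL -> dropout -> 2-way SoftMax.
head = models.Sequential([
    layers.Input(shape=(6, 6, 256)),        # assumed shape of the last pooled map
    layers.Flatten(),                       # one-dimensional feature vector
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),                    # discard 50% of values to curb overfitting
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(2, activation='softmax'),  # normal/abnormal
])
head.summary()
```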

Figure 6.2 depicts a typical CNN with a single layer; to achieve superior results, a considerable number of layers needs to be placed between the input image and the classifier system (SoftMax).

Transfer Learning Concepts

Developing a CNN structure for a chosen task requires high computational power (a computer with ample RAM and a capable graphics card) and additional time. The initial architecture construction needs considerable effort, and one cannot be sure that a newly developed model is unique or will work on all image cases. To reduce the computational burden, existing CNNs (DLAs) are adopted for disease detection/classification in medical images.

Transfer learning is a machine-learning practice in which an existing, pre-trained DLA model is utilized to perform disease-detection operations. This section presents the results attained with simple pre-trained models (using the transfer-learning approach), such as AlexNet, VGG-16, and VGG-19.
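
The general pattern, in a minimal Keras sketch (assuming an ImageNet-trained VGG-19 base whose convolutional filters are frozen while only a small new head is trained; the head layout is illustrative):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

# Reuse ImageNet-trained convolutional layers; train only a new head.
base = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pre-trained filters

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation='softmax'),  # task-specific classifier
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=10)
```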

AlexNet

AlexNet, proposed by Alex Krizhevsky [16], has been one of the most successful and widely used CNN structures since 2012. It made a huge impact on the field of machine learning, particularly in the application of DL to machine vision. Essential information regarding AlexNet can be found in [5-7]. The structure has eight learned layers: the first five are convolutional layers, some followed by MP layers, and the last three are FCLs, as depicted in Figure 6.3. The network uses the non-saturating ReLU activation function, which demonstrates better training performance than the tanh and sigmoid functions. Other essential information, such as layer number, name, description, and dimension, is presented in Table 6.1. Due to its simple structure and high classification accuracy, AlexNet has been widely adopted for image-classification tasks. This CNN can be executed using MATLAB or Python.
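
In Python, one way to reuse it is through torchvision (a sketch, assuming torchvision >= 0.13 and its bundled ImageNet weights):

```python
import torch.nn as nn
from torchvision import models

# Load AlexNet with ImageNet weights and adapt it for two classes.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
for p in model.features.parameters():
    p.requires_grad = False               # keep the convolutional filters fixed
model.classifier[6] = nn.Linear(4096, 2)  # replace the 1000-way output with normal/abnormal
```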

Along with the traditional AlexNet structure, modified structures have been proposed by researchers. One enhancement of AlexNet with deep and handcrafted features is depicted in Figure 6.4. This structure combines the learned (deep) features with the ML (handcrafted) features to improve classification accuracy; other information regarding it can be found in [5]. By combining all possible image features, this structure helps to improve disease detection/classification.

FIGURE 6.3 Pre-Trained AlexNet Architecture.

TABLE 6.1
Details of the CNN Architecture Used in AlexNet

Layer No. | Layer Name | Description | Dimension
1 | Image | Accepts image input | 227 x 227 x 3
2 | Convolution | 1-Convolution layer | 11 x 11 x 3 with stride [4 4] and padding [0 0 0 0]
3 | ReLU | 1-Rectified linear unit for removing negative values | -
4 | Normalization | 1-Normalization unit | -
5 | Max-Pooling | 1-Determines the max value in a particular image array | 3 x 3 with stride [2 2] and padding [0 0 0 0]
6 | Convolution | 2-Convolution layer | 5 x 5 x 48 with stride [1 1] and padding [2 2 2 2]
7 | ReLU | 2-Rectified linear unit for removing negative values | -
8 | Normalization | 2-Normalization unit | -
9 | Max-Pooling | 2-Determines the max value in a particular image array | 3 x 3 with stride [2 2] and padding [0 0 0 0]
10 | Convolution | 3-Convolution layer | 384 3 x 3 x 256 convolutions with stride [1 1] and padding [1 1 1 1]
11 | ReLU | 3-Rectified linear unit for removing negative values | -
12 | Convolution | 4-Convolution layer | 384 3 x 3 x 192 convolutions with stride [1 1] and padding [1 1 1 1]
13 | ReLU | 4-Rectified linear unit for removing negative values | -
14 | Convolution | 5-Convolution layer | 256 3 x 3 x 192 convolutions with stride [1 1] and padding [1 1 1 1]
15 | ReLU | 5-Rectified linear unit for removing negative values | -
16 | Max-Pooling | 3-Determines the max value in a particular image array | 3 x 3 with stride [2 2] and padding [0 0 0 0]
17 | Fully Connected | 1-Fully connected artificial neural network | 4096 fully connected layer
18 | ReLU | 6-Rectified linear unit for removing negative values | -
19 | Dropout | 1-Reduces the number of weights (50% dropout) | -
20 | Fully Connected | 2-Fully connected artificial neural network | 4096 x 4096
21 | ReLU | 7-Rectified linear unit for removing negative values | -
22 | Dropout | 2-Reduces the number of weights (50% dropout) | -
23 | Fully Connected | 3-Fully connected artificial neural network | 4096 x 2
24 | SoftMax | Activation layer | -
25 | Classification Output | Normal/abnormal | -

FIGURE 6.4 Enhanced AlexNet Architecture with Deep and Handcrafted Features.

FIGURE 6.5 Pre-Trained VGG-16 Architecture.

VGG-16

VGG is a CNN architecture proposed by Simonyan and Zisserman [17] of the Visual Geometry Group (VGG) at the University of Oxford. The initial version, VGG-16, features a simple and effective predefined architecture and offers good results in image-classification tasks. The structure of VGG-16 is depicted in Figure 6.5. The layer details of VGG-16 and its execution for brain-image classification are presented in Table 6.2 and Figure 6.6, respectively.

TABLE 6.2
Details of the CNN Architecture Used in VGG-16

Layer (Type) | Output Shape | Parameters
input_1 (InputLayer) | (None, None, None, 3) | 0
block1_conv1 (Conv2D) | (None, None, None, 64) | 1792
block1_conv2 (Conv2D) | (None, None, None, 64) | 36928
block1_pool (MaxPooling2D) | (None, None, None, 64) | 0
block2_conv1 (Conv2D) | (None, None, None, 128) | 73856
block2_conv2 (Conv2D) | (None, None, None, 128) | 147584
block2_pool (MaxPooling2D) | (None, None, None, 128) | 0
block3_conv1 (Conv2D) | (None, None, None, 256) | 295168
block3_conv2 (Conv2D) | (None, None, None, 256) | 590080
block3_conv3 (Conv2D) | (None, None, None, 256) | 590080
block3_pool (MaxPooling2D) | (None, None, None, 256) | 0
block4_conv1 (Conv2D) | (None, None, None, 512) | 1180160
block4_conv2 (Conv2D) | (None, None, None, 512) | 2359808
block4_conv3 (Conv2D) | (None, None, None, 512) | 2359808
block4_pool (MaxPooling2D) | (None, None, None, 512) | 0
block5_conv1 (Conv2D) | (None, None, None, 512) | 2359808
block5_conv2 (Conv2D) | (None, None, None, 512) | 2359808
block5_conv3 (Conv2D) | (None, None, None, 512) | 2359808
block5_pool (MaxPooling2D) | (None, None, None, 512) | 0
global_average_pooling2d_1 (GlobalAveragePooling2D) | (None, 512) | 0
dense_1 (Dense) | (None, 1024) | 525312
dense_2 (Dense) | (None, 2) | 2050

Total params: 15,242,050
Trainable params: 527,362
Non-trainable params: 14,714,688

FIGURE 6.6 Training Procedure Implemented for an Image Classification Task.

The input to the conv1 layer is assigned to process an image with dimensions of 224 x 224 x 3 (RGB), and it also accepts images with dimensions of 224 x 224 x 1 (greyscale). Image information is extracted with filters measuring 3 x 3; a filter of dimension 1 x 1 is also utilized at the end, once the test image has passed through the stack of convolutional (Conv.) layers. The convolution and MP procedures continue layer by layer until, finally, a one-dimensional image feature vector reaches the FCLs. This network has three FCLs, associated with the sorting and dropout functions. Through this process, a one-dimensional feature vector with dimensions of 1 x 1 x 1024 reaches the classifier section to train, test, and validate the SoftMax function on the considered image database. Earlier research works on VGG-16 can be found in [4-7].
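
The layer summary in Table 6.2 matches what a Keras transfer-learning setup would print. A minimal sketch that reproduces those parameter counts (assuming a frozen ImageNet-trained base, which accounts for the 14,714,688 non-trainable parameters):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen VGG-16 convolutional base plus a small trainable head,
# matching the parameter counts reported in Table 6.2.
base = VGG16(weights='imagenet', include_top=False)  # input (None, None, None, 3)
base.trainable = False                               # 14,714,688 non-trainable params

x = layers.GlobalAveragePooling2D()(base.output)     # -> (None, 512)
x = layers.Dense(1024, activation='relu')(x)         # 512*1024 + 1024 = 525,312 params
outputs = layers.Dense(2, activation='softmax')(x)   # 1024*2 + 2 = 2,050 params

model = models.Model(inputs=base.input, outputs=outputs)
model.summary()  # Total params: 15,242,050; trainable: 527,362
```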

VGG-19

The VGG-19 is a modified form of VGG-16; the construction and working principle are similar for both. The difference between them is illustrated in Figure 6.7, and the function values used in VGG-19 are presented in Table 6.3. The image-based training implemented with the VGG-19 architecture for a brain-image classification task is presented in Figure 6.8.

As with the AlexNet enhancement depicted in Figure 6.4, the performance of the VGG (VGG-16/VGG-19) architecture can be improved by integrating deep and machine-learning features, as illustrated in Figure 6.9. To integrate these features, serial or parallel feature fusion is employed, as discussed in previous works [4,5].
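
A minimal sketch of serial (concatenation-based) fusion, assuming the deep features come from a pre-trained backbone and the handcrafted features (e.g., texture or shape descriptors) are computed separately; the arrays below are placeholders, not real data:

```python
import numpy as np
from sklearn.svm import SVC

# Serial fusion: concatenate deep and handcrafted feature vectors,
# then train a traditional classifier on the fused representation.
deep_feats = np.random.rand(100, 512)   # e.g., global-average-pooled VGG features
handcrafted = np.random.rand(100, 64)   # e.g., texture/shape descriptors
labels = np.random.randint(0, 2, 100)   # placeholder normal/abnormal labels

fused = np.concatenate([deep_feats, handcrafted], axis=1)  # 100 x 576 vectors
clf = SVC(kernel='rbf').fit(fused, labels)
```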

 