EXPERIMENTS AND RESULTS

A set of experiments was performed in order to assess the best algorithm for detecting fake ID cards. Section 10.4.1 describes the experiments and results obtained when using hand-crafted feature extraction techniques (BSIF, uLBP, and HED) combined with a Random Forest (RF) classifier. Section 10.4.2, on the other hand, explores several CNN-based algorithms.

Feature Extraction and Classification

The BSIF descriptor has two parameters: the filter size and the number of features extracted. In this work, all the available filters were used in order to determine the best window size and the best number of bits. Thus, the window sizes 5 x 5, 7 x 7, 9 x 9, 11 x 11, 13 x 13, 15 x 15, and 17 x 17, with 5 up to 12 bits, were evaluated for each filter. The number of bits represents the number of filters used in the convolution.
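For illustration, the following is a minimal sketch of how a BSIF code image and its histogram can be computed, assuming the pre-learned ICA filters from the public BSIF release are available as a NumPy array of shape (bits, size, size); the function name and the filter-loading step are not part of the original work.

```python
import numpy as np
from scipy.signal import convolve2d

def bsif_histogram(image, filters):
    """BSIF code image and normalised histogram for a grey-scale image.

    image   : 2-D grey-scale array.
    filters : array of shape (n_bits, k, k) with the pre-learned ICA
              filters (e.g. 7 x 7 filters for an 8-bit descriptor).
    """
    n_bits = filters.shape[0]
    code = np.zeros(image.shape, dtype=np.int64)
    for i in range(n_bits):
        # Each filter response is binarised at zero and contributes one bit.
        response = convolve2d(image, filters[i], mode='same', boundary='wrap')
        code += (response > 0).astype(np.int64) << i
    # The histogram of the code values (2**n_bits bins) is the feature vector.
    hist, _ = np.histogram(code, bins=2 ** n_bits, range=(0, 2 ** n_bits))
    return hist.astype(np.float32) / hist.sum()
```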

The uLBP transformation also has two parameters: the radius and the number of neighbours. The original operator used a 3 x 3 window containing 9 values. In this work, several radii, from 1 up to 8, were explored on grey-scale images.
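A minimal sketch of the uLBP feature extraction, assuming scikit-image's local_binary_pattern (the text does not name the implementation used); with method='uniform', the radius and the number of neighbours map directly to the parameters discussed above.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def ulbp_histogram(image, radius, n_points=8):
    """Uniform LBP histogram for a grey-scale image.

    With method='uniform', scikit-image maps the patterns to P + 2 labels,
    so the histogram has n_points + 2 bins.
    """
    codes = local_binary_pattern(image, n_points, radius, method='uniform')
    n_bins = n_points + 2
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist.astype(np.float32) / hist.sum()

# Histograms for radii 1 to 8 can simply be concatenated, e.g.:
# feature = np.concatenate([ulbp_histogram(img, r) for r in range(1, 9)])
```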

The HED automatically learns rich hierarchical representations (guided by deep supervision on side responses) that are important for resolving the challenging ambiguity in edge and object boundary detection. The Gaussian filter size (used to highlight the borders) and the scale factor are two of the main parameters to be set (see Tables 10.5 and 10.6).
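The sketch below shows one way to run a pre-trained HED model with OpenCV's DNN module, assuming the publicly released Caffe files (deploy.prototxt and hed_pretrained_bsds.caffemodel). Mapping the Gaussian filter size to a pre-smoothing step and the scale factor to blobFromImage's scalefactor is an assumption for illustration, not a confirmed detail of this work.

```python
import cv2

class CropLayer:
    """Custom crop layer required by OpenCV to run the HED Caffe model."""
    def __init__(self, params, blobs):
        self.ystart = self.xstart = self.yend = self.xend = 0

    def getMemoryShapes(self, inputs):
        in_shape, target_shape = inputs[0], inputs[1]
        batch, channels = in_shape[0], in_shape[1]
        height, width = target_shape[2], target_shape[3]
        self.ystart = (in_shape[2] - height) // 2
        self.xstart = (in_shape[3] - width) // 2
        self.yend, self.xend = self.ystart + height, self.xstart + width
        return [[batch, channels, height, width]]

    def forward(self, inputs):
        return [inputs[0][:, :, self.ystart:self.yend, self.xstart:self.xend]]

cv2.dnn_registerLayer('Crop', CropLayer)
net = cv2.dnn.readNetFromCaffe('deploy.prototxt', 'hed_pretrained_bsds.caffemodel')

def hed_edges(bgr_image, gaussian_size=3, scale_factor=0.7):
    """Return an HED edge map after Gaussian pre-smoothing."""
    smoothed = cv2.GaussianBlur(bgr_image, (gaussian_size, gaussian_size), 0)
    h, w = smoothed.shape[:2]
    blob = cv2.dnn.blobFromImage(smoothed, scalefactor=scale_factor, size=(w, h),
                                 mean=(104.0, 117.0, 123.0), swapRB=False, crop=False)
    net.setInput(blob)
    edges = net.forward()[0, 0]      # fused side-output, values in [0, 1]
    return (255 * edges).astype('uint8')
```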

All the extracted features were classified using an RF approach. The RF was set up with the following parameters, found after a grid search: ‘N Trees’: 600, ‘Min samples split’: 5, ‘Min samples leaf’: 1, ‘Max features’: ‘Sqrt’, ‘Max depth’: 20, ‘Criterion’: ‘Entropy’, and ‘Bootstrap’: True. Figure 10.5 shows an example of applying these algorithms to a real and a fake ID card image.
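A minimal scikit-learn sketch of this classifier with the grid-searched parameters listed above (the library itself is not named in the text; n_jobs and random_state are added only for convenience and reproducibility):

```python
from sklearn.ensemble import RandomForestClassifier

# Random Forest with the grid-searched parameters reported above.
rf = RandomForestClassifier(
    n_estimators=600,
    min_samples_split=5,
    min_samples_leaf=1,
    max_features='sqrt',
    max_depth=20,
    criterion='entropy',
    bootstrap=True,
    n_jobs=-1,        # not stated in the text; parallel training assumed
    random_state=0,   # not stated in the text; fixed seed for reproducibility
)

# X_train holds the BSIF/uLBP/HED feature vectors, y_train the real/fake labels.
# rf.fit(X_train, y_train)
# y_pred = rf.predict(X_test)
```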

Two experiments were performed to extract features from the images. The first one uses the whole image (150 x 250 pixels), and the second one uses only the left part of the image (150 x 125 pixels), which corresponds to the region where the face photo is located. Tables 10.1 and 10.2 show the parameters and results achieved when the BSIF algorithm was used. Tables 10.3 and 10.4 show results for uLBP, whereas Tables 10.5 and 10.6 report parameters and results for the HED algorithm. TN, FP, FN, and TP represent True Negative, False Positive, False Negative, and True Positive, respectively.
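As a small sketch of the second setting, the face region can be obtained by keeping only the left half of the card image (assuming a NumPy array with the 250-pixel width along the second axis):

```python
def crop_face_region(card_image):
    """Keep the left half (150 x 125) of a 150 x 250 card image,
    i.e. the region where the face photo is located."""
    return card_image[:, : card_image.shape[1] // 2]
```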

Classification Using CNN Algorithms

Intensity images of whole national ID cards were used to classify fake or real cards using a small-VGG and a VGG16 network. This section presents the results for three different experiments: (i) using the small-VGG trained from scratch, (ii) using


FIGURE 10.5 Graphical representation of two ID cards with HED feature extraction method applied. The border is detected and highlighted by the algorithm.

TABLE 10.1

Parameters of the BSIF Algorithm (Filter Size and Bits) and the Classification Results (Acc.: Accuracy) Reached When Using 252 Full Images

Filter Sizes   Bits   TN   FP   FN   TP    Sensitivity (TPR)   Specificity (TNR)   Acc.
3 x 3           6     81   45   14   112   0.89                0.64                0.77
5 x 5          12     81   45   20   106   0.84                0.64                0.74
7 x 7          10     85   41   27    99   0.79                0.67                0.73
9 x 9           8     83   43   34    92   0.73                0.66                0.69
11 x 11         8     87   39   27    99   0.79                0.69                0.74
13 x 13         5     89   37   37    89   0.71                0.71                0.71
15 x 15         7     89   37   25   101   0.80                0.71                0.75
17 x 17         7     88   38   23   103   0.82                0.70                0.76

TPR and TNR represent True Positive Rate and True Negative Rate, respectively.

TABLE 10.2

Parameters of the BSIF Algorithm (Filter Size and Bits) and the Classification Results (Acc.: Accuracy) Reached When Using 255 Half Images

Filter Sizes   Bits   TN   FP   FN   TP    Sensitivity (TPR)   Specificity (TNR)   Acc.
3 x 3           6     93   33   34    92   0.73                0.74                0.73
5 x 5          12     91   35   36    90   0.71                0.72                0.72
7 x 7          10     80   46   28    98   0.78                0.63                0.71
9 x 9           8     84   42   34    92   0.73                0.67                0.70
11 x 11         8     89   37   26   100   0.79                0.71                0.75
13 x 13         5     89   37   29    97   0.77                0.71                0.74
15 x 15         7     87   39   21   105   0.83                0.69                0.76
17 x 17         7     93   33   21   105   0.83                0.74                0.79

TPR and TNR represent True Positive Rate and True Negative Rate, respectively.

a pre-trained model (VGG16) with the bottleneck approach, and (iii) the pre-trained VGG16 model with the fine-tuning approach.

The experiments were performed using an NVIDIA 1080 Ti GPU with 11 GB of RAM, with a batch size of 64 images in each round. For all the experiments, the Keras and TensorFlow frameworks were used (Figure 10.6).

Small-VGG Trained from Scratch

A grid search was used to find the best hyper-parameters for the small-VGG CNN. A suite of different mini-batch sizes, from n = 16 to n = 1,024 in powers of 2, was evaluated. To set the learning rate, standard values ranging from 0.1 to 0.9

TABLE 10.3

Parameters of the uLBP Algorithm (Neighbours and Radii) and the Classification Results (Acc.: Accuracy) Reached When Using 252 Full Images

Neighbours, Radius   TN   FP   FN   TP   Sensitivity (TPR)   Specificity (TNR)   Acc.
8, 2                 92   34   37   89   0.71                0.73                0.72
8, 3                 85   41   47   79   0.63                0.67                0.65
8, 4                 80   46   51   75   0.60                0.63                0.62
8, 5                 78   48   55   71   0.56                0.62                0.59
8, 6                 71   55   49   77   0.61                0.56                0.59
8, 7                 65   61   51   75   0.60                0.52                0.56
8, 8                 60   66   51   75   0.60                0.48                0.54
8, 2 to 8, 8         99   27   44   82   0.65                0.79                0.72

TPR and TNR represent True Positive Rate and True Negative Rate, respectively.

TABLE 10.4

Parameters of the uLBP Algorithm (Neighbours and Radii) and the Classification Results (Acc.: Accuracy) Reached When Using 252 Half Images

Neighbours, Radius   TN   FP   FN   TP    Sensitivity (TPR)   Specificity (TNR)   Acc.
8, 2                 89   37   31    95   0.75                0.71                0.73
8, 3                 85   41   34    92   0.73                0.67                0.70
8, 4                 76   50   38    88   0.70                0.60                0.65
8, 5                 78   48   52    74   0.59                0.62                0.60
8, 6                 65   61   48    78   0.62                0.52                0.57
8, 7                 70   56   45    81   0.64                0.56                0.60
8, 8                 67   59   43    83   0.66                0.53                0.60
8, 2 to 8, 8         93   33   26   100   0.79                0.74                0.77

TPR and TNR represent True Positive Rate and True Negative Rate, respectively.

in steps of 0.1 were tested. For the momentum, values in the range of 1e-1 to 1e-5 were considered.
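For reference, the explored grid can be written down directly from the values above (a sketch only; the actual search loop is not reproduced here):

```python
# Hyper-parameter grid for the small-VGG, taken from the values in the text.
batch_sizes = [2 ** n for n in range(4, 11)]                # 16, 32, ..., 1,024
learning_rates = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
momenta = [10 ** (-n) for n in range(1, 6)]                 # 1e-1, ..., 1e-5
```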

A database of 3,050 images, plus data augmentation, was used for training the algorithm. The data were divided 70/30 for training and testing the classifier, respectively. This number of images (people) is larger than that of other databases used in the literature. All images were re-sized to 150 x 250 pixels.
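A minimal Keras sketch of this training setup follows. The exact small-VGG layer layout, optimiser, augmentation operations, and data directory are assumptions; the input size (150 x 250), batch size of 64, and 70/30 split follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def build_small_vgg(input_shape=(150, 250, 3)):
    """A small VGG-style network; the exact layer layout is an assumption."""
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                      input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1, activation='sigmoid'),   # real vs. fake
    ])

model = build_small_vgg()
# One point of the grid: learning rate 0.1 and momentum 1e-3.
model.compile(optimizer=optimizers.SGD(learning_rate=0.1, momentum=1e-3),
              loss='binary_crossentropy', metrics=['accuracy'])

# Data augmentation plus a 70/30 train/validation split of the 3,050 images.
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255, rotation_range=10, width_shift_range=0.1,
    height_shift_range=0.1, validation_split=0.3)
train_gen = datagen.flow_from_directory('id_cards/', target_size=(150, 250),
                                        batch_size=64, class_mode='binary',
                                        subset='training')
val_gen = datagen.flow_from_directory('id_cards/', target_size=(150, 250),
                                      batch_size=64, class_mode='binary',
                                      subset='validation')
# model.fit(train_gen, validation_data=val_gen, epochs=100)
```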

Figure 10.7 shows a graph of the training process when using 100 and 300 epochs. The loss and accuracy curves were noisy, achieving a low classification rate. This instability persisted when increasing the number of epochs and reducing the learning rate. In Figure 10.7a and b, the blue line shows the low error rate reached by the

TABLE 10.5

Parameters of the HED Algorithm (Gaussian Filter Size and Scale factor) and the Classification Results (Acc.: Accuracy) Reached When Using 252 Full Images

Gaussian Filter   Scale Factor   TN   FP   FN   TP   Sensitivity (TPR)   Specificity (TNR)   Acc.
3 x 3             0.5            92   53   33   74   0.69                0.63                0.66
3 x 3             0.7            99   40   30   83   0.73                0.71                0.72
3 x 3             1.0            82   50   44   76   0.63                0.62                0.63
5 x 5             0.5            84   47   44   77   0.64                0.64                0.64
5 x 5             0.7            85   49   41   77   0.65                0.63                0.64
5 x 5             1.0            89   45   39   79   0.67                0.66                0.67
7 x 7             0.5            71   61   40   80   0.67                0.54                0.60
7 x 7             0.7            88   43   40   81   0.67                0.67                0.67
7 x 7             1.0            86   50   43   73   0.63                0.63                0.63

TABLE 10.6

Parameters of the HED Algorithm (Gaussian Filter Size and Scale factor) and the Classification Results (Acc.: Accuracy) Reached When Using 252 Half Images

Gaussian Filter   Scale Factor   TN   FP   FN   TP   Sensitivity (TPR)   Specificity (TNR)   Acc.
3 x 3             0.5            92   60   33   67   0.67                0.61                0.63
3 x 3             0.7            99   46   30   77   0.72                0.68                0.70
3 x 3             1.0            82   56   44   70   0.61                0.59                0.60
5 x 5             0.5            84   63   44   61   0.58                0.57                0.58
5 x 5             0.7            85   55   41   71   0.63                0.61                0.62
5 x 5             1.0            89   61   39   63   0.62                0.59                0.60
7 x 7             0.5            71   66   40   75   0.65                0.52                0.58
7 x 7             0.7            88   49   40   75   0.65                0.64                0.65
7 x 7             1.0            86   56   43   67   0.61                0.61                0.61

model in the validation set. The grey line, on the other hand, shows the classification accuracy reached by the CNN in the validation set.

Pre-trained VGG16 Model and Bottleneck

In order to improve the classification results from previous experiments, a pre-trained VGG16 model was used to extract features using the bottleneck technique.
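A minimal Keras sketch of the bottleneck technique: the VGG16 convolutional base is kept frozen, its outputs are cached, and only a small fully connected classifier is trained on top. The top-model layout and optimiser are assumptions; the learning rate of 1e-5, 300 epochs, and batch size of 64 follow the settings reported below.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models, optimizers

# Frozen VGG16 convolutional base (ImageNet weights, no dense top).
conv_base = VGG16(weights='imagenet', include_top=False,
                  input_shape=(150, 250, 3))

def extract_bottleneck_features(images):
    """Run the images once through the frozen base and cache the outputs."""
    return conv_base.predict(images)

# Small fully connected classifier trained only on the cached features.
top_model = models.Sequential([
    layers.Flatten(input_shape=conv_base.output_shape[1:]),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])
top_model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-5),
                  loss='binary_crossentropy', metrics=['accuracy'])

# train_features = extract_bottleneck_features(x_train)
# top_model.fit(train_features, y_train, epochs=300, batch_size=64)
```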

Figure 10.8a shows the training process results for the bottleneck approach. Table 10.7 shows a summary of the results using different parameter settings. The best results were reached with a learning rate of 1e-5, 300 epochs, and a batch size of 64.


FIGURE 10.6 Image feature texture analysis. Top images were computed using uLBP. Left corresponds to an original ID card and right to a fake ID card. The middle row shows original (left) and fake (right) ID card when using BSIF 7x7. The bottom rows are the same BSIF images but using a colour representation.


FIGURE 10.7 Analysis of the training loss and accuracy between fake and real images when training a small-VGG from scratch. In (a), 100 epochs were used, while (b) shows the results for 300 epochs.


FIGURE 10.8 Analysis of the bottleneck and fine-tuning techniques applied to a pre-trained VGG16 model. (a) The training loss and accuracy between fake and real images when using only bottleneck. (b) The training loss and accuracy between fake and real images when using only fine-tuning. The x-axis represents the number of epochs and the y-axis the accuracy.

TABLE 10.7

Classification Results Achieved Using Small-VGG Trained from Scratch (Row 1), VGG-16 Using Bottleneck (Row 2), and VGG-16 Using Fine-Tuning (from Row 3 to the End)

Trained Interval   TPR AUC   TNR AUC   ACC    Model Size (MB)   Conv. Block
Scratch            0.60      0.61      0.60     500             All
Bottleneck         0.79      0.72      0.76   2,857             -
L0                 0.94      0.94      0.94   3,658             All Trainable
L1                 0.78      0.68      0.73   2,857             Block 5
L2                 0.86      0.79      0.83   2,952
L3                 0.85      0.81      0.83   3,046
L4                 0.93      0.88      0.91   3,141
L6                 0.86      0.77      0.81   3,235             Block 4
L7                 0.91      0.90      0.91   3,329
L8                 0.77      0.81      0.79   3,424
L9                 0.95      0.91      0.93   3,518
L11                0.96      0.93      0.95   3,565             Block 3
L12                0.93      0.86      0.90   3,589
L13                0.94      0.92      0.93   3,613
L15                0.94      0.87      0.91   3,648
L16                0.93      0.93      0.93   3,648             Block 2
L17                0.93      0.95      0.94   3,654
L19                0.96      0.90      0.93   3,657             Block 1
L20                0.88      0.83      0.86   3,658

The classification accuracy achieved was 84.00% in the B5 block. This result outperforms the traditional small-VGG trained from scratch (Section 10.4.2.1).

Table 10.7 shows the different results obtained when adjusting each convolutional block of the model and when only using the features extracted by the bottleneck technique.

An example of the activation map of the national ID card, computed using the Grad-CAM algorithm [21], is shown in Figure 10.9. This heat map represents the most relevant areas considered during classification.

The colours of the heat map represent the relevance of the features: the warm colours (red, orange, yellow, and purple) mark the most relevant features, whereas the cold colours (blue) mark the least relevant ones. For a colour description of the images, check the online version.
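A compact Grad-CAM sketch in the spirit of [21], assuming the trained network is available as a single Keras model whose last VGG16 convolutional layer is named 'block5_conv3'; this is an illustration, not the exact implementation used here.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name='block5_conv3'):
    """Grad-CAM heat map for a single pre-processed image of shape (H, W, C)."""
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, prediction = grad_model(image[np.newaxis, ...])
        score = prediction[:, 0]                      # fake/real score
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))      # global-average-pooled grads
    cam = tf.reduce_sum(weights[:, tf.newaxis, tf.newaxis, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()   # normalised to [0, 1]
```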

Pre-trained VGG16 Model and Fine-Tuning

In order to further improve tampering detection on national ID cards, a pre-trained model was used to extract features with the fine-tuning technique.

The fine-tuning technique refers to initialising a CNN with pre-trained parameters instead of random parameters and then re-training it on a new dataset using very small weight updates. This accelerates network learning and improves generalisation, thanks to the initial information that is delivered to the network [5]. The process was completed in three steps (a minimal Keras sketch is given after the list):

  • 1. instantiate the convolutional base of VGG16 and load its weights
  • 2. add the previously defined fully connected model on top and load its weights
  • 3. freeze the layers of the VGG-16 model up to the last convolutional block
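The sketch below illustrates these three steps (the weight file name, the top-model layout, and the optimiser settings are assumptions):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models, optimizers

# 1. Instantiate the VGG16 convolutional base and load its pre-trained weights.
conv_base = VGG16(weights='imagenet', include_top=False,
                  input_shape=(150, 250, 3))

# 2. Add the previously defined fully connected model on top and load its weights.
top_model = models.Sequential([
    layers.Flatten(input_shape=conv_base.output_shape[1:]),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])
# top_model.load_weights('bottleneck_top_model.h5')   # hypothetical file from 10.4.2.2

model = models.Sequential([conv_base, top_model])

# 3. Freeze the layers of the base up to the last convolutional block, so that
#    only the 'block5_*' layers (and the top model) are re-trained.
for layer in conv_base.layers:
    layer.trainable = layer.name.startswith('block5')

# Very small weight updates, as fine-tuning requires.
model.compile(optimizer=optimizers.SGD(learning_rate=1e-5, momentum=0.9),
              loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(train_gen, validation_data=val_gen, epochs=300)
```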

Figure 10.8b shows a graph of the training process for the best fine-tuning result. The network re-trained from blocks B1 and B2 reached the best classification accuracy of 95.65% (see the results in Table 10.7). Indeed, if the network is re-trained from a deeper layer, the classification results decrease. However, when using the fine-tuning approach, all the results outperform those obtained when using a small-VGG trained from scratch or when using VGG-16 with bottleneck.


FIGURE 10.9 Heat map of the most relevant features from the different blocks (stages) belonging to VGG-16.

 