As reviewed previously, remote authentication systems aim to match input data (e.g. a face image) with data from the same individual stored in a database. In those cases, the key is to detect whether the input data are fake or real in order to ensure a secure authentication system. This work, by contrast, focuses on authentication systems where no data from the user are available beforehand. Such remote authentication systems have two inputs: a selfie and an official ID card. This work concentrates on the first step of the authentication system and aims to detect whether an ID card is real or fake; in other words, the goal is to detect whether an ID card has been deliberately manipulated by replacing the face photo. This implies localising, within the image, any ridges, edges, spots, or other artefacts that do not belong to an unaltered (original) ID card. There are several ways of manipulating ID cards; however, only two scenarios are considered in this work: (i) the face image is manipulated manually, and (ii) the face photo is altered digitally. A graphical example is shown in Figure 10.2.

To detect and classify if the ID card is fake or real, traditional methods based on texture features and CNNs were studied [8]. The database and the algorithms used are described as follows.


FIGURE 10.2 Graphical representation of two scenarios to create fake images, (a) The face image coming from a digital device and from manual manipulation (i.e. a printed photo), (b) Real ID images.


Two databases were used in this work. The first one corresponds to a database of Chilean national IDs. This is a private database which contains 1,525 images of real Chilean ID cards from 316 different people. The second database was made by manipulating these Chilean ID cards using two techniques: manual and digital.

Fake ID Card Database (Manually Manipulated): This database contains 762 Chilean ID cards in which the face image has been replaced with a printed face image of another person stuck onto the card. This is an easy and cheap way to fake an ID card, as it requires no knowledge of digital photo processing, and this kind of attack is very common in remote ID verification systems.

Fake ID Card Database (Digitally Manipulated): A total of 762 ID card images were manipulated by automatically detecting the face and replacing it with a random face image. This technique allows a large quantity of fake ID images to be created in a short period of time. Alternatively, the face can be replaced manually using Photoshop or similar image-retouching software, which blends the fake face image better with the rest of the ID card, making it more difficult to detect. However, this manual technique is time-consuming, which limits the feasibility of creating larger databases for training and testing algorithms.

Hand-Crafted Feature Extraction (BSIF, uLBP, and HED)

The first approach proposed in this work is based on machine learning techniques. Texture features are extracted from the 2D image of the ID card using three different algorithms: Uniform Local Binary Patterns (uLBP), Binary Statistical Image Feature filter (BSIF), and Holistically Nested-Edge Detection (HED).

BSIF [14] is a local descriptor constructed by binarising the responses to linear filters. The code value of pixels is considered as a local descriptor of the image intensity pattern in the pixels’ surroundings. The value of each element (i.e. bit) in the binary code string is computed by binarising the response of a linear filter with a zero threshold. Each bit is associated with a different filter, and the length of the bit string determines the number of filters used.
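As a rough illustration of this zero-threshold binarisation and bit-packing, the BSIF-style code computation can be sketched in NumPy. Note that real BSIF filters are learned via independent component analysis from natural image patches; the random filters used below are only a stand-in:

```python
import numpy as np

def bsif_codes(image, filters):
    """BSIF-style codes: correlate the image with each linear filter,
    binarise the response at a zero threshold, and pack one bit per
    filter into an integer code per pixel (valid region only)."""
    n_bits = len(filters)
    fh, fw = filters[0].shape
    H, W = image.shape
    oh, ow = H - fh + 1, W - fw + 1
    codes = np.zeros((oh, ow), dtype=np.int64)
    for bit, f in enumerate(filters):
        # direct (valid-region) correlation of one filter with the image
        resp = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                resp[i, j] = np.sum(image[i:i+fh, j:j+fw] * f)
        codes |= (resp > 0).astype(np.int64) << bit   # zero-threshold binarisation
    return codes
```

With n filters, each pixel receives an n-bit code in [0, 2^n); the histogram of these codes over the image is the texture descriptor.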

uLBP [1] is a grey-scale texture operator which characterises the spatial structure of the local image texture. Given a central pixel in the image, a binary pattern number is computed by comparing its value with those of its neighbours.
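A minimal NumPy sketch of the uniform LBP operator (8 neighbours, radius 1) might look as follows; the 59-bin uniform labelling is standard, but the implementation details are illustrative rather than the exact code of [1]:

```python
import numpy as np

def ulbp(image):
    """Uniform LBP (8 neighbours, radius 1). Each pixel's neighbours are
    thresholded against the centre to form an 8-bit pattern; patterns with
    at most two circular 0/1 transitions are 'uniform' and keep their own
    label (0..57), all others share one 'non-uniform' label (58)."""
    # offsets of the 8 neighbours, in circular order
    offs = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    # lookup table mapping each 8-bit pattern to its uniform label
    lut = np.full(256, 58, dtype=np.int64)
    next_label = 0
    for p in range(256):
        bits = [(p >> k) & 1 for k in range(8)]
        transitions = sum(bits[k] != bits[(k+1) % 8] for k in range(8))
        if transitions <= 2:
            lut[p] = next_label
            next_label += 1
    H, W = image.shape
    pattern = np.zeros((H-2, W-2), dtype=np.int64)
    centre = image[1:H-1, 1:W-1]
    for bit, (dy, dx) in enumerate(offs):
        neigh = image[1+dy:H-1+dy, 1+dx:W-1+dx]
        pattern |= (neigh >= centre).astype(np.int64) << bit
    return lut[pattern]   # 59 labels; their histogram is the descriptor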

The edge-detection algorithm HED [28] was developed to address two important issues in vision: (i) holistic image training and prediction, and (ii) multi-scale and multi-level feature learning. HED performs image-to-image prediction by means of a deep learning model that leverages fully convolutional networks and deeply supervised nets. It automatically learns rich hierarchical representations (guided by deep supervision on side responses) that are important for resolving the challenging ambiguity in edge and object boundary detection.

These algorithms have been shown to outperform state-of-the-art texture methods. The extracted image features are then classified into two classes (fake and real) using a Random Forest classifier [7].
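The classification stage can be sketched with scikit-learn's RandomForestClassifier; the Gaussian "descriptors" below are synthetic stand-ins for the texture histograms, invented purely for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for texture descriptors: each "ID card" is a
# 59-dimensional feature vector; fake cards get a shifted distribution.
rng = np.random.default_rng(42)
real = rng.normal(0.0, 1.0, size=(200, 59))
fake = rng.normal(0.8, 1.0, size=(200, 59))
X = np.vstack([real, fake])
y = np.array([0] * 200 + [1] * 200)   # 0 = real, 1 = fake

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[::2], y[::2])               # even rows for training
acc = clf.score(X[1::2], y[1::2])     # odd rows for testing
```

In the actual pipeline, `X` would hold the BSIF, uLBP, or HED descriptors extracted from the real and manipulated ID card images.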

Automatic Feature Extraction (CNN)

As a second approach, two deep learning algorithms were tested. Deep learning techniques have been shown to be very effective in localising ridges, edges, and spots in the images [22], making them suitable for this problem.

First, a small-VGG [15] network (CNN-1) was used to classify fake and real ID cards. The small-VGG network comprises only three convolutional blocks and a fully connected layer with a small number of neurons. The choice of a smaller network was motivated both by the desire to reduce the risk of overfitting and by the nature of the problem, a two-class classification task (fake vs. real). Figure 10.3 shows a scheme of the algorithm's architecture. The three colour channels are processed directly by the network.
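A hypothetical PyTorch sketch of such a small-VGG follows; the filter counts, neuron counts, and input size are illustrative assumptions, not the values used in [15]:

```python
import torch
import torch.nn as nn

# Sketch of a "small-VGG": three conv blocks (conv-ReLU-maxpool) and one
# small fully connected layer feeding a two-way output (fake vs. real).
small_vgg = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # block 1
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # block 2
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # block 3
    nn.Flatten(),
    nn.Linear(128 * 8 * 8, 64), nn.ReLU(),   # small fully connected layer
    nn.Linear(64, 2),                        # fake / real logits
)

# a 64x64 RGB input: after three 2x2 poolings the feature map is 8x8
logits = small_vgg(torch.zeros(1, 3, 64, 64))
```

The three input channels map directly onto the `Conv2d(3, ...)` of the first block, matching the statement that the channels are processed directly by the network.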

In order to find the best implementation, parameters such as sparse connectivity, weight sharing, pooling techniques, and hyper-parameters were considered. In this work, sparse connectivity was used by default, while weight sharing, pooling techniques, and the hyper-parameters (batch size, number of epochs, learning rate, and momentum) were explored in depth in the experimental section. They were all tuned while fitting the network.

The batch size in iterative gradient descent is the number of patterns shown to the network before the weights are updated. It also acts as an optimisation during training, defining how many patterns are read at a time and kept in memory. The number of epochs is the number of times the entire training data set is shown to the network during training. The learning rate controls how much the weights are updated at the end of each batch, and the momentum controls how much the previous update influences the current weight update.
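These four hyper-parameters can be seen together in a minimal gradient-descent loop; the toy objective f(w) = w², the data, and all parameter values below are illustrative only:

```python
import numpy as np

# Minimal SGD-with-momentum loop on f(w) = w^2: each batch triggers one
# weight update, one epoch is one full pass over the data, and the
# velocity term lets the previous update influence the current one.
w, velocity = 5.0, 0.0
learning_rate, momentum = 0.1, 0.9
data = np.arange(32)
batch_size, epochs = 8, 30

for epoch in range(epochs):                          # passes over the data
    for start in range(0, len(data), batch_size):    # one update per batch
        grad = 2.0 * w                               # gradient of w^2
        velocity = momentum * velocity - learning_rate * grad
        w += velocity                                # momentum-damped step
```

With 32 patterns and a batch size of 8, each epoch performs four weight updates, and `w` spirals towards the minimum at 0.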

Second, a pre-trained VGG16 [23] model with bottleneck and fine-tuning techniques was also tested. This model has been pre-trained on a large data set called ImageNet, which contains a total of 1,000 classes, none of them including ID card images. The model had already learned features that are useful for most computer vision problems, such as ridges, lines, spots, and others. Leveraging such features allows better accuracy to be reached than with any method relying only on the available data. The architecture of the VGG16 model is shown in Figure 10.4.

FIGURE 10.3 Architecture of the small-VGG network.

FIGURE 10.4 VGG-16 Architecture. B1 up to B5 represent the convolutional blocks.

The following section describes the experiments and results obtained for classifying real and fake national ID cards using machine learning and deep learning techniques.
