PROPOSED PAD METHOD
The PAD methodology presented in this Chapter is summarised in Figure 5.1. First, a dedicated capture device (Section 5.4.1) acquires images of the finger at four different wavelengths within the SWIR spectrum. Those images are then processed by deep learning algorithms (Section 5.4.2). In particular, the four-channel images are fed to five different CNN models, each of which begins with an additional pre-processing layer (see the pre-processing part of Section 5.4.2). The models and the differences between them are described in Section 5.4.2 and Figure 5.4. Finally, the outputs of the different models can be fused at score level, as presented at the end of Section 5.4.2, in order to achieve a more robust PAD module.
FIGURE 5.1 General diagram of the proposed PAD method. First, images at four different SWIR wavelengths are acquired from the finger. These are then used to train five different CNN models (a five-layer residual network, reduced versions of MobileNet and MobileNetV2, VGG19, and VGGFace; see Figure 5.4 for details). An initial pre-processing module is included to convert the four wavelengths into three-channel images (see Figure 5.3). Finally, a score-level fusion is carried out.
Hardware: Multi-Spectral SWIR Sensor
The SWIR finger capture device used in the present work was developed by our partners at USC within the BATL (2017) project. In essence, the SWIR sensor is embedded in a closed box with a slot on the top for the finger (see Figure 5.1, left), with the camera and lens placed inside the box. When the finger is placed over the slot, all ambient light is blocked, and therefore only the desired wavelengths are considered during the acquisition. Furthermore, in order to avoid any interference, two images are captured at each wavelength: one with the LEDs on, and another one, the “dark image”, with the illumination off. By subtracting the dark image from the illuminated one, undesired light noise can be suppressed.
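The dark-image subtraction described above can be illustrated with a minimal NumPy sketch. The function name, the clipping of negative differences to zero, and the 8-bit sample values are assumptions made for illustration only; the actual acquisition software of the capture device is not described here.

```python
import numpy as np

def subtract_dark_frame(lit: np.ndarray, dark: np.ndarray) -> np.ndarray:
    """Suppress residual light noise by subtracting the LEDs-off frame.

    `lit` is the frame captured with the LEDs on, `dark` the frame captured
    with the illumination off. Both names are hypothetical.
    """
    if lit.shape != dark.shape:
        raise ValueError("lit and dark frames must have the same shape")
    # Use a signed type so the difference cannot wrap around for unsigned data.
    diff = lit.astype(np.int32) - dark.astype(np.int32)
    # Assumption: negative differences are treated as noise and clipped to zero.
    return np.clip(diff, 0, None).astype(lit.dtype)

# Made-up 8-bit pixel values, for illustration only.
lit = np.array([[120, 30], [200, 10]], dtype=np.uint8)
dark = np.array([[20, 40], [50, 10]], dtype=np.uint8)
clean = subtract_dark_frame(lit, dark)
```

The cast to a signed type matters in practice: subtracting unsigned arrays directly would wrap around wherever the dark image is brighter than the illuminated one.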
In contrast to the capture device described in Tolosana et al. (2019), where 64 x 64 px. images were captured at 1000 fps, in this work the sensor used (a Xenics Bobcat 320) is able to capture 320 x 256 px. images at 100 fps, with a 35 mm focal length lens. This way, higher resolution images, which include more textural details and are thereby more suited for deep learning studies, are acquired. As in Tolosana et al. (2019) and Steiner et al. (2016), images are captured at four different SWIR wavelengths, namely 1,200 nm, 1,300 nm, 1,450 nm, and 1,550 nm. The differences between the images acquired by the new and the previous sensor can be observed in Figure 5.2: not only do the new images (top) have a higher resolution, but the focus has also been improved to eliminate some of the blur present in the images acquired with the previous sensor (bottom).
FIGURE 5.2 Sample comparison. Top: samples captured by the new capture device. Bottom: samples captured with the previous device. In both cases, the complete sample at 1,200 nm is shown on the left, and the ROIs at all wavelengths on the right.
It should also be noted that the camera captures the finger slot and the surrounding area of the box. Since the finger is always placed over the fixed open slot, and the camera does not move, the ROI can be extracted using a simple fixed-size cropping. The final ROI has a size of 310 x 100 px. The four ROIs for a bona fide presentation, from now on referred to simply as images or samples, are depicted in Figure 5.2 (top right).
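As a minimal sketch of the fixed-size cropping, the following NumPy snippet cuts the same window out of each of the four wavelength frames. It assumes that the sizes quoted above are given as width x height, and the crop offsets are hypothetical (a centred window is used purely for illustration, since the position of the slot in the frame is not stated).

```python
import numpy as np

# Assumption: "320 x 256" and "310 x 100" are width x height.
FRAME_H, FRAME_W = 256, 320
ROI_H, ROI_W = 100, 310
# Hypothetical offsets of the finger slot (centred window, for illustration).
ROI_TOP, ROI_LEFT = 78, 5

def crop_roi(frame: np.ndarray, top: int = ROI_TOP, left: int = ROI_LEFT) -> np.ndarray:
    """Extract the fixed ROI from a single full-size frame."""
    if frame.shape[:2] != (FRAME_H, FRAME_W):
        raise ValueError(f"expected a {FRAME_H}x{FRAME_W} frame, got {frame.shape[:2]}")
    return frame[top:top + ROI_H, left:left + ROI_W]

# One placeholder frame per wavelength (1,200, 1,300, 1,450 and 1,550 nm).
frames = np.zeros((4, FRAME_H, FRAME_W), dtype=np.uint16)
rois = np.stack([crop_roi(f) for f in frames])  # shape: (4, ROI_H, ROI_W)
```

Because the slot and the camera are both fixed, the offsets only need to be determined once per device and can then be reused for every capture.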
Finally, even though the main aim of this work is the development of PAD techniques, it is important not to forget about the fingerprint recognition task. A single capture device needs to acquire samples which can be processed both for fingerprint recognition and for PAD purposes in a single acquisition attempt. Otherwise, a potential attacker could present their own bona fide finger for PAD testing and subsequently a PAI for recognition. Therefore, the multi-modal capture device utilised also contains a second 1.3 MP camera with a 35 mm VIS-NIR lens in order to capture finger photographs from which contactless fingerprint recognition can be carried out. Kolberg et al. (2019) showed that COTS software can extract minutiae correctly from these samples, which allows compatibility with conventional fingerprint sensors.
Software: Multi-Spectral Convolutional Neural Networks
The software PAD approach proposed in this Chapter is summarised in Figure 5.3 and compared to the workflow described in Tolosana et al. (2019). As mentioned in Section 5.1, most pre-trained CNN models have been trained on the ImageNet (Krizhevsky, Sutskever, and Hinton 2012) or VGGFace (Parkhi, Vedaldi, and Zisserman 2015) databases, and thus expect RGB images. However, the SWIR sensor described in Section 5.4.1 outputs four different grey-scale images acquired at four different wavelengths. Therefore, in order to be able to use pre-trained models and benefit from transfer learning techniques, some kind of pre-processing needs to be added to convert the four-channel samples to three-channel images. After that, regular pre-trained CNN models can be applied. Both steps are detailed in the remainder of this section.
FIGURE 5.3 PAD software diagram. Top: As proposed in this chapter, the four SWIR images are automatically processed by the corresponding CNN model using a single convolutional layer with three filters of size P x P and a stride of 1. In addition, batch normalisation and a ReLU activation are used to facilitate convergence. The result is a three-channel image. Bottom: the handcrafted RGB conversion proposed by Tolosana et al. (2019). After the pre-processing step, the corresponding three-channel image is processed by the CNN model at hand, which outputs the PAD score s.
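To make the learned pre-processing concrete, the following NumPy sketch runs a forward pass through a single convolutional layer with three P x P filters and stride 1, followed by batch normalisation and a ReLU, mapping a four-channel sample to a three-channel image. It is only an illustration under stated assumptions: P = 3, zero "same" padding, random untrained weights, and batch statistics computed on the fly without learned scale and shift are all choices made here, not details taken from the chapter. In practice this layer would be built with a deep learning framework and trained jointly with the CNN model.

```python
import numpy as np

def preprocess_layer(x: np.ndarray, weights: np.ndarray, bias: np.ndarray,
                     eps: float = 1e-5) -> np.ndarray:
    """Four-to-three channel conversion: conv (stride 1) -> batch norm -> ReLU.

    x:       (N, H, W, 4) batch of four-wavelength samples
    weights: (P, P, 4, 3) three filters of size P x P (P odd, assumed)
    bias:    (3,)
    Returns an array of shape (N, H, W, 3).
    """
    p = weights.shape[0]
    if p % 2 == 0 or weights.shape[1] != p:
        raise ValueError("this sketch assumes square filters with odd P")
    pad = p // 2
    # Assumption: zero padding so that the spatial size is preserved.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    n, h, w, _ = x.shape
    out = np.zeros((n, h, w, weights.shape[-1]))
    # Stride-1 convolution (cross-correlation), accumulated per filter tap.
    for i in range(p):
        for j in range(p):
            patch = xp[:, i:i + h, j:j + w, :]   # (N, H, W, 4)
            out += patch @ weights[i, j]         # (4, 3) maps to (N, H, W, 3)
    out += bias
    # Batch normalisation per output channel (no learned scale/shift here).
    mean = out.mean(axis=(0, 1, 2), keepdims=True)
    var = out.var(axis=(0, 1, 2), keepdims=True)
    out = (out - mean) / np.sqrt(var + eps)
    # ReLU activation.
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x = rng.random((2, 100, 310, 4))            # two hypothetical ROI stacks
weights = rng.normal(size=(3, 3, 4, 3))     # P = 3 is an assumption
y = preprocess_layer(x, weights, np.zeros(3))
```

The output has three channels and can therefore be passed to a network that expects RGB-shaped input, which is what allows pre-trained models to be reused.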