DEFINITIONS

In the following, we include the main definitions stated within the ISO/IEC 30107-3 standard on biometric PAD - part 3: testing and reporting (ISO/IEC JTC1 SC37 Biometrics 2017), which will be used throughout this chapter:

  • Bona Fide Presentation: “interaction of the biometric capture subject and the biometric data capture subsystem in the fashion intended by the policy of the biometric system”. That is, a normal or genuine presentation.
  • Presentation Attack (PA): “presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system”. That is, an attack carried out on the capture device to either conceal one’s identity or impersonate someone else.
  • Presentation Attack Instrument (PAI): “biometric characteristic or object used in a presentation attack”. For instance, a silicone 3D mask or an ecoflex fingerprint overlay.
  • PAI Species: “class of presentation attack instruments created using a common production method and based on different biometric characteristics”.

In order to evaluate the vulnerabilities of biometric systems to PAs, the following metrics should be used:

  • Attack Presentation Classification Error Rate (APCER): “proportion of attack presentations using the same PAI species incorrectly classified as bona fide presentations in a specific scenario”.
  • Bona Fide Presentation Classification Error Rate (BPCER): “proportion of bona fide presentations incorrectly classified as attack presentations in a specific scenario”.

Derived from the aforementioned metrics, the detection equal error rate (D-EER) is defined as the error rate at the operating point where APCER = BPCER. In addition, to evaluate the convenient operating point recommended by the IARPA Odin program, the APCER at a BPCER = 0.2% is denoted as APCER_0.2%.
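To make these definitions concrete, the following minimal Python sketch shows how APCER, BPCER, and the D-EER can be computed from two NumPy arrays of detection scores; the function names and the convention that higher scores indicate attack presentations are illustrative assumptions, not part of the standard.

```python
import numpy as np

def apcer_bpcer(attack_scores, bona_fide_scores, threshold):
    """Scores above the threshold are classified as attack presentations."""
    apcer = np.mean(attack_scores <= threshold)    # attacks accepted as bona fide
    bpcer = np.mean(bona_fide_scores > threshold)  # bona fides rejected as attacks
    return apcer, bpcer

def detection_eer(attack_scores, bona_fide_scores):
    """Sweep all observed scores as thresholds and return the error rate at
    the operating point where APCER and BPCER are (approximately) equal."""
    thresholds = np.unique(np.concatenate([attack_scores, bona_fide_scores]))
    best = min(thresholds, key=lambda t: abs(
        np.subtract(*apcer_bpcer(attack_scores, bona_fide_scores, t))))
    return np.mean(apcer_bpcer(attack_scores, bona_fide_scores, best))
```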

RELATED WORKS

Following the discussion in Section 5.1, we describe in this section the latest fingerprint hardware-based PAD methods, with a special focus on deep learning-based techniques, due to their superior detection performance in comparison with approaches based on handcrafted features. The most relevant works are summarised in Table 5.1. For more details on other PAD approaches, the reader is referred to the corresponding surveys on the topic (Marasco and Ross 2015; Sousedik and Busch 2014; Marcel et al. 2019).

TABLE 5.1

Summary of the Most Relevant Methodologies for Fingerprint PAD Based on Non-conventional Sensors

Technology | Reference | Approach | Performance | # PAIs
OCT | Meissner, Breithaupt, and Koch (2013) | Sweat glands detection* | APCER = 16%, BPCER = 7% | -
OCT | Chugh and Jain (2019) | Patch-wise CNNs | APCER = 0.17%, BPCER = 0.2% | 8
VIS multi-spectral | Rowe, Nixon, and Butler (2008) | Wavelet transform* | APCER = 0.9%, BPCER = 0.5% | 49
LSCI | Keilbach et al. (2018) | Texture descriptors and SVMs* | APCER = 10.97%, BPCER = 0.84% | 32
LSCI | Kolberg, Gomez-Barrero, and Busch (2019) | Texture descriptors and fusion of classifiers* | APCER = 9.05%, BPCER = 0.05% | 35
LSCI | Mirzaalian, Hussein, and Abd-Almageed (2019) | LSTM | APCER = 12.9%, BPCER = 0.2% | 6
SWIR | Tolosana et al. (2019) | Full image CNNs | APCER ≈ 7%, BPCER = 0.2% | 35
SWIR | Gomez-Barrero and Busch (2019) | Multi-spectral CNNs | APCER = 1.35%, BPCER = 0.2% | 35
SWIR | Proposed Approach | Multi-spectral CNNs | APCER = 1.16%, BPCER = 0.2% | 41
SWIR + LSCI | Hussein et al. (2018) | Patch-based CNNs | APCER = 0%, BPCER = 0.2% | 17
SWIR + LSCI | Gomez-Barrero, Kolberg, and Busch (2019) | Texture descriptors + CNNs | APCER ≈ 2%, BPCER = 0.2% | 35

The approaches marked with * are based on handcrafted features.

As a first alternative to conventional fingerprint sensors, multi-spectral capture devices have been designed for fingerprint recognition and PAD purposes. In particular, Rowe, Nixon, and Butler (2008) developed a pioneering multi-spectral fingerprint sensor a decade ago, which has since evolved into a COTS device. The Lumidigm sensor captures multi-spectral images under four different illumination conditions: 430, 530, and 630 nm, as well as white light. In their article, the authors study not only the fingerprint recognition accuracy achieved with the multi-spectral images but also the feasibility of implementing PAD methods. To that end, the absolute magnitudes of the responses of each image to dual-tree complex wavelets are computed. On a self-acquired database including 49 PAI species, an APCER of 0.9% is reported for a BPCER of 0.5%. Although these results are remarkable, the PAD methods are not described in detail, and not much information about the acquired database or the experimental protocol is available. Therefore, it is difficult to establish a fair benchmark with similar works.
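Since the original feature extraction is not documented in detail, the following sketch only illustrates the general idea of computing absolute magnitudes of dual-tree complex wavelet responses, here with the open-source dtcwt package; the per-sub-band mean pooling is an assumption.

```python
import numpy as np
import dtcwt  # open-source dual-tree complex wavelet transform package

def dtcwt_magnitude_features(image, levels=3):
    """Summarise an image by the absolute magnitudes of its complex
    high-pass wavelet responses (six orientations per decomposition level)."""
    pyramid = dtcwt.Transform2d().forward(image.astype(float), nlevels=levels)
    features = []
    for highpass in pyramid.highpasses:  # one (H, W, 6) complex array per level
        # mean absolute magnitude per orientation sub-band
        features.extend(np.abs(highpass).mean(axis=(0, 1)))
    return np.asarray(features)
```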

More recently, another set of approaches based on multi-spectral images captured within the SWIR spectrum has been developed (Gomez-Barrero, Kolberg, and Busch 2018; Tolosana et al. 2019; Gomez-Barrero and Busch 2019) within the BATL (2017) project, motivated by the initial works of Steiner et al. (2016) for facial images. In this case, samples are captured at four different wavelengths: 1200, 1300, 1450, and 1550 nm. As mentioned in Section 5.1, this area of the spectrum is especially relevant for performing a skin vs. non-skin classification, since all skin types present similar remission curves for the aforementioned wavelengths. In other words, the intra-class variability of the bona fide samples is minimised. In a preliminary evaluation on a small dataset of 60 SWIR samples, comprising 12 different PAI species, Gomez-Barrero, Kolberg, and Busch (2018) showed the feasibility of using pixel-level spectral signatures extracted from SWIR data. However, the detection performance of those handcrafted features was clearly outperformed by deep learning architectures, in particular a pre-trained VGG19 network: Tolosana et al. (2018) achieved perfect results over the same small database.

In a follow-up study, Tolosana et al. (2019) thoroughly analysed the use of deep learning architectures in combination with SWIR data for PAD purposes. In the first step, the four images, acquired at different wavelengths, were combined into a single RGB image with a linear operation in order to provide a suitable input for the CNNs. Then, the authors tested both pre-trained models (MobileNet and VGG19) and a self-designed residual network trained from scratch, denoted as ResNet. Over a database comprising over 4,700 samples and 35 different PAI species, and using only 260 samples for training and 180 for validation (i.e., almost 4,300 samples for testing), the score level fusion of MobileNet and ResNet achieved the best performance: APCER_0.2% ≈ 7%. More recently, Gomez-Barrero and Busch (2019) were able to improve those results by including an additional convolutional layer in the models, which replaces the handcrafted RGB conversion of the samples. In particular, the score level fusion of three networks yielded an APCER_0.2% = 1.35%.
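A minimal PyTorch sketch of the latter idea, i.e., replacing the fixed linear four-to-three channel conversion by a learnable 1 × 1 convolution in front of a pre-trained backbone, could look as follows; the MobileNetV2 backbone and the single-score output head are illustrative choices, not the authors’ exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class SwirPADNet(nn.Module):
    """Four SWIR wavelengths -> learnable 3-channel mapping -> pre-trained CNN."""
    def __init__(self):
        super().__init__()
        # learnable replacement for the handcrafted 4-to-3 channel RGB conversion
        self.to_rgb = nn.Conv2d(in_channels=4, out_channels=3, kernel_size=1)
        backbone = models.mobilenet_v2(weights="IMAGENET1K_V1")
        backbone.classifier[1] = nn.Linear(backbone.last_channel, 1)  # one PAD score
        self.backbone = backbone

    def forward(self, x):  # x: (batch, 4, H, W) stack of SWIR images
        return torch.sigmoid(self.backbone(self.to_rgb(x)))
```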

In addition to those multi-spectral devices, fingerprint PAD methods have been proposed for two different technologies widely used in biomedical applications: optical coherence tomography (OCT) and laser speckle contrast imaging (LSCI). In both cases, the analysis of the inner parts of the finger, below the surface, allows the extraction of a number of features which can help discriminate bona fide from attack presentations. On the one hand, OCT scanners acquire high-resolution, cross-sectional images of internal tissue microstructures by measuring their optical reflections (Huang et al. 1991). To that end, a beam of near infrared (NIR) light is split between a sample or object of interest and a reference mirror. When the difference between the distances travelled by the light along the sample and the reference paths is within the coherence length of the light source, an interference pattern representing the depth profile at a single point is produced. This is known as an A-scan. A lateral combination of several A-scans yields a cross-sectional scan, referred to as a B-scan. Furthermore, 3D volumetric representations can be created by stacking multiple B-scans. Such representations of the inner layers of the finger skin allow the analysis of eccrine glands and capillary blood flow. Following this line of thought, since 2006 different laboratories worldwide have carried out visual analyses of the aforementioned B-scans to discriminate between bona fide and attack presentations (Cheng and Larin 2006, 2007; Bossen, Lehmann, and Meier 2010; Liu and Buma 2010; Moolla et al. 2019).

In spite of those promising studies, due to the large amounts of time necessary to capture the OCT data and the cost of the scanners, no systematic analysis had been carried out on large- or medium-size datasets - only up to 153 samples had been acquired by Bossen, Lehmann, and Meier (2010). To tackle this issue, Sousedik, Breithaupt, and Busch (2013) and Sousedik and Breithaupt (2017) proposed an enhanced pipeline to pre-process the massive raw OCT data into more manageable representations in a short time. In addition, an automatic gland detection approach was proposed, which the authors argued could be used for PAD. In fact, Meissner, Breithaupt, and Koch (2013) used helical eccrine gland ducts to distinguish bona fide from attack presentations over the largest database acquired so far, comprising almost 7,500 bona fide images and 3,000 PA samples. Even if not many details are provided on their algorithms, the authors report an APCER = 16% for a BPCER = 7%. In 2019, Liu, Liu, and Wang achieved a remarkable 0% APCER and BPCER by analysing only the peaks of 1D depth signals to detect four different PAI species of different thicknesses over a rather small dataset comprising 90 samples.
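The details of the depth-signal analysis by Liu, Liu, and Wang (2019) are not given above, so the following is only a hedged sketch of the underlying idea: counting prominent reflection peaks in a 1D OCT depth profile, where an additional interface peak may reveal a thin overlay. All thresholds are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def count_depth_peaks(a_scan, min_height=0.2, min_distance=10):
    """Count prominent reflection peaks in a normalised 1D OCT depth profile."""
    signal = (a_scan - a_scan.min()) / (np.ptp(a_scan) + 1e-9)
    peaks, _ = find_peaks(signal, height=min_height, distance=min_distance)
    return len(peaks)

def is_attack(a_scan, expected_peaks=2):
    """Illustrative decision rule (threshold assumed): a bona fide finger shows
    peaks for the air-skin interface and inner skin layers; a thin overlay
    adds an extra interface peak on top of those."""
    return count_depth_peaks(a_scan) > expected_peaks
```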

In contrast to the previous OCT-based works, based on handcrafted features and mostly evaluated on rather limited datasets, Chugh and Jain (2019) analysed a database comprising 3,413 bona fide samples and 357 PAs, stemming from eight different PAI species. In more detail, the proposed method trained the Inception-v3 network (Szegedy et al. 2016) from scratch on local patches extracted from fingerprint depth profiles from cross-sectional B-scans. The local patches were selected in areas where at least 25% of the pixels have non-zero values in order to have enough depth information. On a five-fold cross-validation protocol over the aforementioned dataset, using approximately 3,000 samples for training and 760 for testing, almost perfect detection rates were reported: an APCER_0.2% of 0.17%. Even though the acquisition time remains below one second (i.e., it can be considered for real-time applications), the capture device costs over 80,000 USD, which is still the main drawback of this otherwise promising technology.
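The 25% non-zero patch selection criterion can be sketched as follows; the patch size and stride are assumptions, and in the actual method the retained patches are fed to an Inception-v3 network.

```python
import numpy as np

def extract_valid_patches(depth_image, patch=64, stride=32, min_nonzero=0.25):
    """Keep only patches in which at least 25% of the pixels carry depth information."""
    patches = []
    height, width = depth_image.shape
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            candidate = depth_image[y:y + patch, x:x + patch]
            if np.count_nonzero(candidate) / candidate.size >= min_nonzero:
                patches.append(candidate)
    return patches
```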

On the other hand, LSCI techniques are based on a different interference phenomenon of coherent light (i.e., a laser). When such coherent light is reflected by a rough surface, a granular pattern of dark and bright spots appears as the light scatters on the surface and the waves either add up or cancel out. This is called a speckle pattern (Goodman 1975). Furthermore, since the laser light has a certain penetration depth, if moving scatterers are present (i.e., blood), the speckle pattern will change over time (Vaz et al. 2016). Therefore, speckle patterns can be used to detect blood flow and, eventually, PAs. To that end, as in biomedical applications, the raw LSCI sequences are pre-processed either in the temporal or in the spatial domain to compute the speckle contrast.
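The speckle contrast is commonly defined as K = σ/μ, the ratio of the standard deviation to the mean intensity, computed either over a small spatial window of a single frame or over time at each pixel. A minimal temporal-contrast sketch, assuming the raw sequence is available as a NumPy array:

```python
import numpy as np

def temporal_speckle_contrast(frames):
    """frames: (T, H, W) stack of raw LSCI frames.
    Returns the per-pixel temporal contrast K = sigma / mu: blood flow blurs
    the speckle pattern over time (low K), whereas static material such as a
    lifeless PAI keeps it sharp (high K)."""
    mu = frames.mean(axis=0)
    sigma = frames.std(axis=0)
    return sigma / (mu + 1e-9)
```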

Based on that principle, Keilbach et al. (2018) analysed the PAD capabilities of LSCI sequences over a large database also captured within the BATL (2017) project, comprising 32 PAIs and more than 750 samples. In the first step, LSCI sequences were captured from three contiguous regions of the finger, and temporally pre-processed in order to obtain a single averaged LSCI image per region. Afterwards, several descriptors were extracted from the averaged LSCI images, including the well-known local binary patterns (LBPs), binarised statistical image features (BSIFs), and histograms of oriented gradients (HOGs). The extracted features were subsequently classified using support vector machines (SVMs). A final cascaded score level fusion yielded an APCER = 10.97% for a BPCER = 0.84%. It should be noted that in this case, only 136 samples were used for training the SVMs, in contrast to the larger training sets required by most deep learning approaches.
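A hedged sketch of this handcrafted pipeline with scikit-image and scikit-learn is given below; all parameter values are assumptions, and BSIF is omitted since it has no standard scikit-image implementation.

```python
import numpy as np
from skimage.feature import local_binary_pattern, hog
from sklearn.svm import SVC

def lbp_histogram(image, points=8, radius=1):
    """Histogram of uniform LBP codes as a texture descriptor."""
    codes = local_binary_pattern(image, P=points, R=radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2), density=True)
    return hist

def hog_features(image):
    """HOG descriptor of the averaged LSCI image."""
    return hog(image, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

def train_svm(feature_vectors, labels):
    """One SVM per descriptor; its scores can later be fused at score level."""
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(np.vstack(feature_vectors), labels)
    return clf
```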

In a subsequent work, Kolberg, Gomez-Barrero, and Busch (2019) reduced the number of regions captured with the LSCI sensor from three to two, since a deeper analysis of the database showed that the region under the fingernail presented undesired noise. Then, using the same descriptors as Keilbach et al. (2018), the authors established a benchmark including nine different classifiers. They found that the best results were achieved with random forests for grey-scale histograms and LBP, with SVMs for BSIF, and with stochastic gradient descent for HOG. A multi-algorithm fusion of the aforementioned features and classifiers then led to an APCER = 9.01% for a BPCER = 0.05% over the extended database and protocol established by Tolosana et al. (2019).
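The resulting multi-algorithm fusion could be sketched as follows, pairing each descriptor with the classifier reported to work best for it; the equal-weight score average and the use of probability outputs are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier

# Descriptor-to-classifier pairing reported by Kolberg, Gomez-Barrero, and Busch (2019);
# each classifier must first be fitted on its own descriptor's training features.
pipelines = {
    "lbp":  RandomForestClassifier(n_estimators=100),
    "bsif": SVC(probability=True),
    "hog":  SGDClassifier(loss="log_loss"),  # logistic loss exposes predict_proba
}

def fused_score(features_by_descriptor):
    """Average the per-descriptor attack probabilities (equal weights assumed)."""
    scores = [pipelines[name].predict_proba(feat.reshape(1, -1))[0, 1]
              for name, feat in features_by_descriptor.items()]
    return float(np.mean(scores))
```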

Since the capture device used for the database acquisition in Keilbach et al. (2018) and Tolosana et al. (2019) can acquire both LSCI and SWIR data simultaneously, Gomez-Barrero, Kolberg, and Busch (2019) tested a score level fusion of the aforementioned handcrafted LSCI features (Keilbach et al. 2018) and the SWIR deep learning approach first presented by Tolosana et al. (2019). Evaluated over the same dataset, and following the same protocol as Tolosana et al. (2019), the APCER_0.2% ≈ 7% was reduced to APCER_0.2% ≈ 2%. The reason for this improvement lies in the fact that, whereas the SWIR images allow for an analysis of the surface of the finger or the PA, the LSCI technology enables an analysis of the inside of the finger, as mentioned above. Therefore, both approaches focus on complementary information, which, when combined, leads to a more robust PAD method.

In contrast to that combination of handcrafted and learned feature-based approaches, Hussein et al. (2018) proposed a fully deep learning-based method to fuse SWIR and LSCI data. Whereas all previous works were based on a fixed ROI, in this case, a variable-size ROI was used for training, depending on the PAI species. Then, 8 × 8 px patches were extracted from the images, resulting in either 4-dimensional tensors for the SWIR data, or 100-dimensional vectors for the LSCI data (i.e., the first 100 frames out of the total 1,000 LSCI frames acquired). Those patches were fed to a simplified version of AlexNet (Krizhevsky, Sutskever, and Hinton 2012), which produced a score per patch. The average score was used for the final decision. This approach was tested over a dataset comprising 551 bona fide and 227 PA samples, stemming from 17 PAI species. Evaluated on a five-fold protocol, with 552 samples for training, 86 for validation, and 140 samples for testing, the SWIR-based network achieved an APCER = 2.5% at BPCER = 0%, and the LSCI-based approach achieved an APCER = 8.9% at BPCER = 1.3%. Similar to Gomez-Barrero, Kolberg, and Busch (2019), the fusion of both technologies further reduced the error rates to APCER = BPCER = 0%.
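A minimal sketch of the patch-wise scoring scheme is shown below; the tiny CNN is only illustrative and much simpler than Hussein et al.’s simplified AlexNet, but it operates on 8 × 8 px patches and averages the per-patch scores as described.

```python
import torch
import torch.nn as nn

class PatchNet(nn.Module):
    """Tiny CNN scoring one 8x8 patch; in_channels = 4 for the SWIR tensors."""
    def __init__(self, in_channels=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 8x8 -> 4x4
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(64 * 4 * 4, 1)

    def forward(self, patches):  # (n_patches, in_channels, 8, 8)
        x = self.features(patches).flatten(1)
        return torch.sigmoid(self.classifier(x))

def sample_score(model, patches):
    """Final decision score: average of the per-patch scores."""
    with torch.no_grad():
        return model(patches).mean().item()
```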

In a subsequent work, Mirzaalian, Hussein, and Abd-Almageed (2019) conducted a study on different patch-wise DNN architectures for raw LSCI sequences, over a larger database than that considered by Hussein et al. (2018). In particular, they analysed the baseline architecture first tested by Hussein et al. (2018), a modification of the former including residual connections, a shallower version of the GoogLeNet architecture (Szegedy et al. 2015), and a double-layer long short-term memory (LSTM) network. The latter has the advantage of being able to process temporal sequences, such as the acquired LSCI data, which, in this work, is not pre-processed but used in its raw form. The evaluation dataset consisted of 3,743 bona fide samples and 218 PA samples, including six different PAI species. Over a six-fold leave-one-attack-out partition of the database, the LSTM network achieved the best detection performance: an APCER_0.2% of 8.81%. It should be noted that in this case over 3,800 samples were used for training and validation and only around 160-180 for testing (depending on the fold).
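A hedged PyTorch sketch of such a double-layer LSTM over raw LSCI time series follows; the per-time-step input dimensionality and the hidden size are assumptions.

```python
import torch
import torch.nn as nn

class LsciLSTM(nn.Module):
    """Two-layer LSTM classifying a raw LSCI temporal sequence."""
    def __init__(self, input_dim=64, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, sequences):  # (batch, time_steps, input_dim)
        _, (h_n, _) = self.lstm(sequences)
        return torch.sigmoid(self.head(h_n[-1]))  # final hidden state of top layer
```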

In summary, we can extract the following take-away messages from the current literature on both deep learning and handcrafted-based PAD techniques:

• Once the models are trained, deciding whether a sample is a bona fide or an attack presentation is fast. Therefore, the impact of using deep learning approaches in practical scenarios is minimised. The only remaining issue is memory: CNNs tend to comprise a high number of parameters [from 319,937 to 20,155,969 in (Tolosana et al. 2019)], which need to be stored in the device memory. This is usually not the case for other traditional classifiers such as SVMs.

• Finally, two major challenges in the field are the detection of unknown PAs and cross-sensor scenarios, for which most studies so far have shown a degradation of the detection performance. To alleviate this, different approaches, such as data augmentation through a synthetic PA sample generator (Chugh and Jain 2019) or handcrafted feature embeddings (Gonzalez-Soler et al. 2019), have been proposed for traditional fingerprint sensors. Their applicability to other technologies, such as SWIR, LSCI, or OCT, still needs to be explored.
