DeepFake Face Video Detection Using Hybrid Deep Residual Networks and LSTM Architecture


The fields of deep learning (DL) and artificial intelligence (AI) have expanded considerably in recent years and are now widely used in many everyday applications and systems. At the same time, social networks play a vital role in the dissemination of information and news. The advent of machine learning (ML)/AI together with social media has changed the perception of reality, especially in the digital world. For instance, current technologies are being employed to create fake multimedia samples that have become almost indistinguishable from real ones. Readily available image and video manipulation software and apps (e.g., Face2Face, AgingBooth, FaceApp, PortraitPro Studio, and Adobe Photoshop) do not require much technical knowledge, so the potential number of manipulated images and videos is very large. In fact, many of the videos that go viral on the internet or on social media are fallacious and manipulated. Images and videos may be manipulated for benign reasons (e.g., images retouched for beautification) or for antagonistic goals (e.g., fake news campaigns). In particular, fake face images/videos generated using DL techniques have recently attracted great public concern and attention. Such digitally manipulated facial samples are known as “DeepFakes”, which are fake facial images/videos obtained by swapping the face of one individual with the face of another individual using AI/DL-based methods [1,2,3]. Representative examples of the most used publicly available apps are DeepFake, Face2Face, FaceApp, and Face Swap Live. Such apps and software can be used to manipulate the age, facial hair, gender, hair colour, or facial expression of a face, to swap two faces with each other, or to generate synthetic facial samples of a person who does not exist in the real world, as shown in Figure 4.1.

FIGURE 4.1 Examples of various face manipulations. First row: original face samples. Second row: manipulated face samples. Last column: a synthetically generated face.

Although DeepFakes are mostly harmless and can be used for research or amusement purposes, the simple and easy-to-use software/apps can be utilised to produce audio and video imitations for theft, fraud, or revenge porn. DeepFakes can influence election results, as fake videos can make people believe that a certain politician said or did things that he or she never said or did. Likewise, fake evidence created with DeepFake techniques can be used against people in court, so that innocent people can be charged with crimes they did not commit. Conversely, guilty people can be released on the basis of false evidence. Also, people could alter their faces to appear younger or older in order to deceive age-based access controls. It has been shown that face ageing and face spoofing negatively affect automated face recognition and identification systems [4-8]. In fact, DeepFakes can not only trick people but also degrade the accuracy of facial recognition systems at the same time. For instance, Korshunov et al. [9] demonstrated that DeepFakes could escalate the error rates of VGG and FaceNet neural network-based face recognition approaches by 85.62% and 95.00%, respectively.

A typical countermeasure to DeepFakes is a DeepFake detection method, which aims to distinguish real face samples from manipulated faces [10-12]. For instance, the authors in Ref. [13] proposed a generalised metric-learning-based system that can detect DeepFakes from different datasets. Neubert et al. [14] designed and evaluated frequency and spatial domain feature spaces for face manipulation detection. Inspired by the recent success of DL frameworks in a diverse set of applications such as object detection and autonomous cars, researchers have explored the efficacy of DL techniques for face manipulation detection, and in the last few years DL schemes have been used successfully to detect DeepFakes. In particular, convolutional neural networks (CNNs) are employed to extract features from every frame for detection. Using part of a pre-trained CNN as the feature extractor is an effective way to improve the accuracy of face manipulation detection [15,16]. Approaches based on a constrained convolutional layer [17], a statistical pooling layer [18], a two-stream network [12], and two cascaded convolutional layers built on a CNN [19] have also been used for detection. A survey of prior DeepFake detection methods showed that detection frameworks have progressed significantly and attained promising results, but still face difficulties in detecting sophisticated face manipulations [3]. Moreover, new and complicated face manipulations are hard for existing forensics tools and human experts to notice [13]. There is therefore a strong demand for methods that attain improved accuracy.

In this chapter, we develop a hybrid framework for detecting DeepFake videos. The proposed method is composed of face detection, extraction of deep features, and long short-term memory (LSTM) classification. For a given video, the face regions are first detected in each frame. The detected face regions are fed to a pre-trained CNN model (i.e., the FC1000 layer of the pre-trained deep residual network model) in order to extract features. The extracted features are then passed to a seven-layer LSTM model [i.e., input, two bidirectional LSTM (biLSTM), dropout, fully connected (FC), softmax, and output layers] for classification. Experimental analyses on two public datasets (i.e., DeepFakeTIMIT and Celeb-DF) were performed using the false acceptance rate (FAR), the false rejection rate (FRR), and the equal error rate (EER) metrics. On DeepFakeTIMIT, the proposed framework obtained 2.4217% EER and 0.0795% FRR@FAR10 (the FRR, in percent, at the threshold where the FAR is 10%). Similarly, on the Celeb-DF dataset, it achieved 0.5014% EER and 0% FRR@FAR10. Moreover, the proposed framework outperformed previously proposed DeepFake detection methods.
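The three figures of merit named above can be illustrated with a minimal sketch. It is not the evaluation code used in this chapter. It assumes that each video receives a single score, that a higher score means "more likely real", and that a video is accepted as real when its score is at or above the threshold. The function names (far_frr, equal_error_rate, frr_at_far) are hypothetical, and the EER is approximated by sweeping the observed scores as thresholds rather than by interpolating the error curves.

```python
from typing import List, Tuple


def far_frr(real_scores: List[float], fake_scores: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) at one threshold.

    Assumption: a sample is accepted as real when score >= threshold.
    FAR = fraction of fake samples wrongly accepted as real.
    FRR = fraction of real samples wrongly rejected as fake.
    """
    far = sum(s >= threshold for s in fake_scores) / len(fake_scores)
    frr = sum(s < threshold for s in real_scores) / len(real_scores)
    return far, frr


def _thresholds(real_scores: List[float], fake_scores: List[float]) -> List[float]:
    """Candidate thresholds: every observed score, plus one that rejects everything."""
    candidates = sorted(set(real_scores) | set(fake_scores))
    candidates.append(candidates[-1] + 1.0)
    return candidates


def equal_error_rate(real_scores: List[float], fake_scores: List[float]) -> float:
    """Approximate EER: the mean of FAR and FRR where they are closest to each other."""
    pairs = [far_frr(real_scores, fake_scores, t)
             for t in _thresholds(real_scores, fake_scores)]
    far, frr = min(pairs, key=lambda p: abs(p[0] - p[1]))
    return (far + frr) / 2


def frr_at_far(real_scores: List[float], fake_scores: List[float],
               target_far: float = 0.10) -> float:
    """Lowest FRR over all candidate thresholds whose FAR is at most target_far."""
    pairs = [far_frr(real_scores, fake_scores, t)
             for t in _thresholds(real_scores, fake_scores)]
    return min(frr for far, frr in pairs if far <= target_far)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    real = [0.9, 0.8, 0.7, 0.35]
    fake = [0.1, 0.2, 0.3, 0.6]
    print("EER:", equal_error_rate(real, fake))
    print("FRR@FAR10:", frr_at_far(real, fake, 0.10))
```

Under this convention, FRR@FAR10 is obtained by calling frr_at_far with target_far=0.10. With a small test set the sweep only visits a handful of operating points, so a reported EER would normally rest on many more scores than this toy example uses.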

The remainder of the chapter is structured as follows. Section 4.2 outlines existing works on face manipulation. The developed method is detailed in Section 4.3. The experimental database, figures of merit, experimental protocol, and empirical analyses are described in Section 4.4. A few future research directions and open issues are presented in Section 4.5. Section 4.6 outlines the conclusions.
