A Study on the Influence of Angular Signature of Landmark Induced Triangulation in Recognizing Changes in Human Emotion

Analysis of human emotion from a sequence of face images is a vital issue in recognizing changes in emotional state. In this work, we propose an effective method for recognizing temporal dynamic variations in human emotion from facial video frames that uses a triangulation mechanism to generate triangles on the face. In our approach, angular information extracted from every triangle generated by landmark points is taken into account as a geometric feature that helps us classify human emotion into the basic facial expressions: anger, disgust, fear, happiness, sadness, and surprise. Besides, we considered important regions of the face to obtain relevant geometric features that discriminate one image sequence from others. To verify the performance of the proposed method, we experimented with our emotion recognition system on different benchmark image sequence databases: Extended Cohn-Kanade (CK+), MMI, and MUG. A comparison of the experimental results obtained on the different databases demonstrates the efficiency of the proposed method, with a promising recognition rate.

INTRODUCTION

Recognition of temporal changes in human facial expressions has had a substantial impact on research in recent years. Of late, Affective Computing [3], which deals with intelligent recognition of human emotion, has attracted the attention of the concerned research community to a large extent. It is very difficult for a machine to understand human emotions, but Affective Computing enables a machine to perceive, realize, and integrate emotions. According to [4], the various expressions of a human face at different levels are regulated by varying facial activities: the actions of facial muscles, individually or in groups, may cause changes in facial behavior. Paul Ekman and Friesen [4] placed human emotions into six basic expression labels: anger, disgust, fear, happiness, sadness, and surprise. They introduced the Facial Action Coding System (FACS), which describes the movement of landmarks on the face in terms of the temporal profile of action units (AUs). The authors in [5] realized that detecting perfect landmark points on the face is a more difficult task than facial expression classification. The Active Appearance Model (AAM) [2] is a combination of texture and shape models that provides landmark points on the face. In [6], the authors mentioned that a Facial Expression Recognition System (FERS) can be developed using two main approaches: a static approach, which uses still images to obtain geometric features, and a dynamic approach, which utilizes video frames. Tracking landmark points on the face across video frames is a more challenging task than on static images due to the high data dimensionality. The authors in [7], [8], [9] used static images for emotion recognition. Finding the best geometric representation of images or sequences of images alone is not sufficient to classify human emotion; to achieve an accurate classification of expressions, the classifier plays an important role. Most approaches, such as [10], [11], [12], used a Support Vector Machine (SVM), an Artificial Neural Network (ANN), and a Naive Bayesian (NB) classifier, respectively, to discriminate facial expressions into the basic labels.

Motivation: In recent decades, the authors in [13], [14], and [15] applied image sequences in their recognition systems to identify the temporal behavior of an expression. It is observed that recognition systems based on static images are unable to capture the time-varying activities of an individual expression due to the lack of intermediate frames of the emotion. These thoughts motivate us to work with image sequences to extract information about the dynamic characteristics of human emotion. Our proposed method uses all frames in the transition from neutral to peak expression to bridge the information gap between the static and dynamic approaches.

Our contributions: 1) Triangle generation from landmark points: The geometric positions of landmark points on the face are identified by applying the Active Appearance Model (AAM) to each image sequence. We then generate triangles from the relevant geometric positions associated with the important regions of the face: eyes, eyebrows, nose, and lips. 2) Angular signature: The triangulation mechanism is used to construct an angular signature by considering the three angles of each triangle. We prepare the angular signature by taking the ratio between angles. 3) Classification on the angular signature: The angular signature forms a prominent geometric feature with high discriminative power and is fed into an Artificial Neural Network (ANN) to classify sequences into the basic expression labels. Classification is conducted on various benchmark image sequence databases: CK+, MMI, and MUG. To the best of our knowledge, the angles formed by triangles joining selected landmark points on facial images have not yet been explored as an effective feature descriptor for expression identification; in our opinion, the application of such a feature descriptor is a novel initiative.

The remainder of this paper is organized as follows. We present the proposed method in Section 4.2. Results and discussion are described in Section 4.3. In Section 4.4, we provide a comparison of our results with those of other methods. Finally, the conclusion is drawn in Section 4.5.

PROPOSED METHOD

Our proposed Facial Expression Recognition System (FERS) is divided into three subsystems: a) landmark identification, b) geometric feature extraction, and c) emotion recognition. The workflow of our proposed method is shown in Figure 4.1. We have developed the emotion recognition system based on the dynamic approach, using facial image sequences. We need to address the dynamic


Figure 4.1: Diagram of our proposed recognition system

behavior of human emotion through our proposed system. Geometric features are used to accumulate the information needed to discriminate one emotion from the others. That is why the development of our recognition system begins with landmark identification, which provides the geometric locations. Depending on the landmark projection on the facial images, we reduce the number of frames in each sequence: we select the ten best frames from every sequence according to the perfect projection of landmark points on the frame, where the first frame contains the neutral image and the last frame shows one of the basic emotions. A sketch of this frame-selection step is given below. Figures 4.2, 4.3, 4.4, 4.5, 4.6, and 4.7 show the transition from neutral to the anger, disgust, fear, happiness, sadness, and surprise emotions, respectively.
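The following is a minimal Python sketch of one plausible selection strategy. The paper selects frames "according to the perfect projection of landmark points" but does not spell out a scoring rule, so the per-frame `landmark_scores` and the ranking below are illustrative assumptions; the first (neutral) and last (peak) frames are always retained.

```python
def select_frames(landmark_scores, num_frames=10):
    """Pick num_frames frames from a neutral-to-peak sequence,
    always keeping the first (neutral) and last (peak) frames and
    filling the rest with the interior frames whose landmark
    projection scored best (hypothetical scoring)."""
    last = len(landmark_scores) - 1
    # Rank interior frames by landmark-fit quality, best first.
    interior = sorted(range(1, last), key=lambda i: -landmark_scores[i])
    # Keep the selected frames in chronological order.
    return sorted([0, last] + interior[:num_frames - 2])
```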

Landmark Identification

Facial muscle points on a facial image have a huge impact on recognizing changes in the behavior of an emotion. These points are obtained by our proposed landmark identification subsystem. We apply the Active Appearance Model [2] to the image sequence to identify the geometric locations of the landmark points. A total of sixty-eight geometric locations are obtained from every frame


Figure 4.2: Video of 10 frames starting with neutral expression and ending with anger expression image


Figure 4.3: Video of 10 frames starting with neutral expression and ending with disgust expression image

in a sequence. Figure 4.8 shows the positions of these sixty-eight landmark points on a single image frame. In the study of Barman and Dutta [1], it was noticed that among them only twenty-one locations play a crucial role in detecting changes in behavior, and we have added two extra points taken from the lips for further detail. As per [1], these selected points are the ones that show maximum sensitivity to the various expression types. These crucial geometric locations are very sensitive due to the dislocation of



Figure 4.4: Video of 10 frames starting with neutral expression and ending with fear expression image

Figure 4.5: Video of 10 frames starting with neutral expression and ending with happiness expression image

major portions of the face, and this is measured by analyzing the movement of the landmark points over the frames of the sequence. Twenty-three points are considered from four major components of the face: eight points are taken from the eyes, six from the eyebrows, and three and six points are selected from the nose and lips, respectively. Figure 4.9 shows the crucial landmark points extracted from a single frame.
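As an illustration of this landmark-reduction step, the sketch below selects the twenty-three crucial points from the sixty-eight AAM locations of a frame. The paper gives only the per-region counts, not the exact indices, so the index lists below follow the widely used 68-point annotation scheme and are hypothetical choices, not the authors' selection.

```python
import numpy as np

# Hypothetical indices within the common 68-point annotation
# (eyebrows 17-26, nose 27-35, eyes 36-47, lips 48-67); the paper
# specifies only how many points come from each region.
EYE_IDX     = [36, 37, 39, 41, 42, 43, 45, 47]   # 8 points from the eyes
EYEBROW_IDX = [17, 19, 21, 22, 24, 26]           # 6 points from the eyebrows
NOSE_IDX    = [31, 33, 35]                       # 3 points from the nose
LIP_IDX     = [48, 51, 54, 57, 62, 66]           # 6 points from the lips
CRUCIAL_IDX = EYE_IDX + EYEBROW_IDX + NOSE_IDX + LIP_IDX  # 23 in total

def crucial_points(landmarks_68):
    """Reduce the (68, 2) array of AAM landmark locations of one
    frame to the (23, 2) array of crucial points."""
    return np.asarray(landmarks_68)[CRUCIAL_IDX]
```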


Figure 4.6: Video of 10 frames starting with neutral expression and ending with sadness expression image


Figure 4.7: Video of 10 frames starting with neutral expression and ending with surprise expression image

Geometric Feature Extraction

The most powerful component of a Facial Expression Recognition System (FERS) is feature extraction, which helps us represent a sequence so that it can be classified with minimum error. Our proposed approach uses a triangulation mechanism to extract geometric features from every frame of the image sequence. In total, $n = \binom{23}{3} = 1771$ possible triangles are


Figure 4.8: Geometric locations of sixty-eight landmark points


Figure 4.9: Crucial landmark detection from image frame

generated for each frame by joining every combination of three landmark points out of the twenty-three points associated with the major components of the face. For every triangle, we compute three angles to measure the angular signature. In this way, we generate an angular vector of size $m \times n = 10 \times 1771 = 17710$ that represents


Figure 4.10: Angular information generation from landmarks for anger emotion


Figure 4.11: Angular information generation from landmarks for disgust emotion

a single sequence through the ratios of its angles. Here $m = 10$ is the number of frames used in every sequence. Figures 4.10, 4.11, 4.12, 4.13, 4.14, and 4.15 show the generation of angles from an image frame for each of the different types of emotion.
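Enumerating the triangles is straightforward: the index triples are fixed once over the twenty-three crucial points and then reused for every frame. A minimal sketch, continuing the Python illustrations above:

```python
from itertools import combinations

# Every combination of 3 of the 23 crucial points: C(23, 3) = 1771
# triangles, identical for all frames of all sequences.
TRIANGLES = list(combinations(range(23), 3))
assert len(TRIANGLES) == 1771
```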

Formation of Angular Signature Matrix (ASM) by Triangulation mechanism

Given the three geometric locations $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$ of the landmark points of a triangle, shown in Figure 4.16 (depicted from Figure 4.15) to illustrate the computation of Equations 4.1, 4.2, 4.3, 4.4, 4.5, and 4.6, we calculate the lengths of all sides of the triangle, $a$, $b$ and $c$, by the Euclidean distance formulations given below, where each side is named after the opposite vertex:

$$a = \sqrt{(x_2 - x_3)^2 + (y_2 - y_3)^2} \qquad (4.1)$$
$$b = \sqrt{(x_1 - x_3)^2 + (y_1 - y_3)^2} \qquad (4.2)$$
$$c = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \qquad (4.3)$$



Figure 4.12: Angular information generation from landmarks for fear emotion


Figure 4.13: Angular information generation from landmarks for happiness emotion


Figure 4.14: Angular information generation from landmarks for sadness emotion


Figure 4.15: Angular information generation from landmarks for surprise emotion


Figure 4.16: Triangle depicted from Figure 4.15

After computing the three sides of the triangle, the angles $A$, $B$ and $C$ are calculated by the law of cosines, as follows:

$$A = \cos^{-1}\!\left(\frac{b^2 + c^2 - a^2}{2bc}\right) \qquad (4.4)$$
$$B = \cos^{-1}\!\left(\frac{a^2 + c^2 - b^2}{2ac}\right) \qquad (4.5)$$
$$C = \cos^{-1}\!\left(\frac{a^2 + b^2 - c^2}{2ab}\right) \qquad (4.6)$$
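The computation of Equations 4.1 through 4.6 can be sketched as below: side lengths by the Euclidean distance, angles by the law of cosines. A guard for degenerate (collinear) landmark triples is omitted for brevity.

```python
import numpy as np

def triangle_angles(p1, p2, p3):
    """Return the angles A, B, C (in radians) of the triangle with
    vertices p1, p2, p3, each a 2-D point, following Eqs. 4.1-4.6."""
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    a = np.linalg.norm(p2 - p3)  # Eq. 4.1: side opposite vertex p1
    b = np.linalg.norm(p1 - p3)  # Eq. 4.2: side opposite vertex p2
    c = np.linalg.norm(p1 - p2)  # Eq. 4.3: side opposite vertex p3
    A = np.arccos((b**2 + c**2 - a**2) / (2 * b * c))  # Eq. 4.4
    B = np.arccos((a**2 + c**2 - b**2) / (2 * a * c))  # Eq. 4.5
    C = np.arccos((a**2 + b**2 - c**2) / (2 * a * b))  # Eq. 4.6
    return A, B, C
```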


Since we use the angular information as a geometric feature, it is stored in a feature vector $V = [v_1, v_2, \ldots, v_n]$ of length $n = 1771$ that represents a single frame of the sequence, with one angular value per triangle.

Finally, we form the Angular Signature Matrix (ASM) of size $m \times n$ that denotes an image sequence; it is obtained by stacking the per-frame vectors:

$$ASM = \begin{bmatrix} V_1 \\ V_2 \\ \vdots \\ V_m \end{bmatrix}, \qquad ASM_{j,i} = V_j(i)$$

Here $j$ and $i$ are the indices of the frame within the sequence and of the triangle, respectively.
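Putting the pieces together, one frame contributes a row of length $n = 1771$ and a sequence yields the $m \times n$ matrix. A sketch under the same assumptions as above (`TRIANGLES` and `triangle_angles` from the earlier snippets, and a `feature_fn` standing in for the unspecified angle ratio):

```python
import numpy as np

def angular_signature_matrix(sequence_landmarks, feature_fn):
    """Build the m x n Angular Signature Matrix of one sequence.

    sequence_landmarks: m = 10 arrays of shape (23, 2), the crucial
    points of each frame.
    feature_fn: maps a triangle's three angles to one scalar; the
    paper takes a ratio between the angles but does not define it
    in this excerpt, so the choice of ratio is an assumption.
    """
    m, n = len(sequence_landmarks), len(TRIANGLES)
    asm = np.zeros((m, n))
    for j, pts in enumerate(sequence_landmarks):   # frame index j
        for i, (u, v, w) in enumerate(TRIANGLES):  # triangle index i
            asm[j, i] = feature_fn(*triangle_angles(pts[u], pts[v], pts[w]))
    return asm

# Concatenating the rows gives the sequence descriptor of length
# m * n = 10 * 1771 = 17710 that is fed to the classifier:
# descriptor = asm.reshape(-1)
```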

Emotion Classification

In our system, changes in human emotion over time are recognized by feeding the image sequences into an Artificial Neural Network (ANN) classifier. Figure 4.17 shows the architecture of the ANN for ready understanding. Each image in a sequence is represented by a vector of length 1771 constructed from the angular signature. As we consider 10 images for every sequence, concatenating the 10 vectors yields a single vector of length 17710 that represents the sequence and is fed into the classifier. The network consists of 3 layers: an input layer of 17710 neurons, one hidden layer of 10 neurons, and an output layer of 6 neurons. The input of the network is the image sequence represented by the vector of length 17710, and the network yields the six basic emotions as output: anger, disgust, fear, happiness, sadness, and surprise. We use the scaled conjugate gradient training algorithm, which adjusts the weight and bias values until the minimum of the error, calculated as the Mean Square Error (MSE), is found. We obtained the classification results after applying the training algorithm exactly 50 times.
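For reference, the classifier stage might be sketched as follows with scikit-learn, with two caveats: `MLPClassifier` offers no scaled-conjugate-gradient solver, so `'adam'` stands in for the paper's training algorithm, and its cross-entropy loss replaces the MSE criterion. The layer sizes (17710-10-6) follow the paper.

```python
from sklearn.neural_network import MLPClassifier

# 17710 inputs (flattened ASM), one hidden layer of 10 neurons,
# 6 outputs (the basic emotions), mirroring the architecture above.
clf = MLPClassifier(hidden_layer_sizes=(10,),
                    solver='adam',   # stand-in for scaled conjugate gradient
                    max_iter=50,
                    random_state=0)

# X: array of shape (num_sequences, 17710); y: emotion labels in
# {anger, disgust, fear, happiness, sadness, surprise}.
# clf.fit(X_train, y_train)
# print(clf.score(X_test, y_test))
```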

 