A New Era in Human Computer Interaction: Facial Expression Recognition

Wednesday, 15 March, 2017

By Cigdem Turan, Ph.D. Candidate at Hong Kong PolyU

How would you feel if electronic devices could recognize your emotion and take actions based on it? They could cheer you up with a joke when you are sad. They’d recognize sleepiness while you were driving, and help you understand if a person was in real pain or just claiming to be. They could differentiate the Duchenne smile from the forced one or detect depression using facial muscle movements. These applications aren’t promises of the future: they’re possible today with recent developments in signal processing and machine learning algorithms.

For centuries, researchers have studied emotions and facial expressions in the field of psychology. In his 1872 book “The Expression of the Emotions in Man and Animals,” Charles Darwin first highlighted the importance of facial expression in revealing our mental state and emotions. Since then, the main discussions about emotions have focused on nature versus nurture and the universality of certain emotions. The widely accepted theory by Dr. Paul Ekman suggests there are seven basic emotions (anger, disgust, fear, happiness, sadness, surprise and contempt) that can be recognized universally because they cause the same muscle movements on the face. Ekman and colleagues also labelled those muscle movements with corresponding Action Units (AUs) in the Facial Action Coding System (FACS). For instance, the combination of AU-6 (Cheek Raiser) and AU-12 (Lip Corner Puller) represents happiness, while the combination of AU-4 (Brow Lowerer) and AU-15 (Lip Corner Depressor) represents sadness.
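To make the AU idea concrete, here is a minimal Python sketch of a rule lookup. It encodes only the two AU combinations mentioned above; the names `AU_RULES` and `match_emotions` are made up for illustration, and a real FACS-based system would rely on many more AUs and on their intensities.

```python
# Illustrative rules built only from the two examples in the text.
# The table and function names are hypothetical, not part of FACS itself.
AU_RULES = {
    "happiness": frozenset({6, 12}),  # Cheek Raiser + Lip Corner Puller
    "sadness": frozenset({4, 15}),    # Brow Lowerer + Lip Corner Depressor
}


def match_emotions(active_aus):
    """Return, in alphabetical order, the emotions whose AUs are all active."""
    active = set(active_aus)
    return sorted(
        name for name, required in AU_RULES.items() if required <= active
    )
```

Calling `match_emotions([6, 12, 25])` reports happiness, because both required AUs are present, while a face showing only AU-4 matches neither rule.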

Early facial expression recognition (FER) systems detected the seven basic emotions and were based on the above-mentioned AUs. On databases created in a controlled environment, such as the Extended Cohn-Kanade (CK+) and the Japanese Female Facial Expression (JAFFE) databases, such systems can now achieve recognition rates of up to 97 percent.

Yet it’s still challenging to classify facial expressions in real-life conditions because of pose and lighting variations. FER systems also fail because expressions never represent solely one emotion. Researchers have yet to adopt a continuous emotion framework that breaks facial expressions into two dimensions: arousal and valence. With that, expressions are typically classified in a broader sense of emotions.

Figure 1. Two dimensions of emotion space and the distribution of the seven basic emotions in arousal-valence (A-V) space [1].
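As a rough illustration of the two-dimensional view, the sketch below names the quadrant of A-V space that a point falls into. The [-1, 1] range, the zero thresholds and the function name `av_quadrant` are assumptions made for this example; they are not taken from [1].

```python
def av_quadrant(arousal, valence):
    """Describe the A-V quadrant of a point, assuming both axes span [-1, 1]."""
    if not (-1.0 <= arousal <= 1.0 and -1.0 <= valence <= 1.0):
        raise ValueError("arousal and valence are assumed to lie in [-1, 1]")
    # Zero is used as an arbitrary boundary between the two halves of each axis.
    arousal_part = "high arousal" if arousal >= 0 else "low arousal"
    valence_part = "positive valence" if valence >= 0 else "negative valence"
    return f"{arousal_part}, {valence_part}"
```

A continuous model would keep the two numbers themselves instead of collapsing them into a quadrant; the label is only a convenient way to read them.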

To create a basic FER system for a human-computer interaction interface, we need signal processing: both image processing and machine learning algorithms. The system would start with a face detection algorithm followed by facial landmark localization. A pre-processing algorithm can be included to normalize the images by removing noise or imbalanced lighting. Then, the so-called features need to be extracted from the images to represent them in a way that computers can make use of. All these steps must be accurate and therefore call for advanced image processing techniques. Luckily, recent developments in signal processing techniques and higher-resolution cameras allow more accurate face detection and facial landmark localization. The last step is training a computer model that predicts the emotion label of any given face image, and it can benefit from the thousands of images available on the internet these days.
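The steps above can be sketched as a tiny pipeline in plain Python. Everything here is a stand-in chosen for brevity: the min-max stretch plays the role of lighting normalization, an intensity histogram plays the role of the features, and a nearest-centroid rule plays the role of the trained model. The function names and the `crop_face` hook are hypothetical; a real system would plug in an actual face detector, landmark localizer and learned classifier.

```python
def normalize_lighting(image):
    """Min-max stretch a grayscale image (list of rows) to the range [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant image
    return [[(p - lo) / span for p in row] for row in image]


def histogram_features(image, bins=4):
    """Normalized intensity histogram of an image with values in [0, 1]."""
    counts = [0] * bins
    total = 0
    for row in image:
        for p in row:
            counts[min(int(p * bins), bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def nearest_centroid(features, centroids):
    """Return the label whose centroid is closest in squared distance."""
    def distance(label):
        return sum((f - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(centroids, key=distance)


def recognize(image, centroids, crop_face=lambda img: img):
    """Run the stages in order: crop, normalize, extract features, classify."""
    face = crop_face(image)  # face detection and landmarks would go here
    face = normalize_lighting(face)
    return nearest_centroid(histogram_features(face), centroids)
```

Here `centroids` maps each emotion label to an average feature vector computed from labelled training images, which is the simplest possible form of the training step. Keeping each stage as its own function means any of them can be swapped for a stronger technique without touching the rest.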

Excitingly, companies are investing in these types of technologies. Affectiva, a spin-off of the MIT Media Lab, built what it describes as the world’s largest facial expression database by using deep learning and analyzing almost four million face images. FaceReader, the facial expression analysis software from Noldus, classifies six basic emotions as well as gaze direction and head orientation. In the second half of 2016, Microsoft released its API for emotion recognition. In January 2016, Apple bought Emotient, a start-up using AI to recognize facial expressions.

We are in the decade of discoveries beyond our imagination. You can find wearable devices that alleviate the difficulties of social interactions for people with autism. You can read news about a robot interacting with humans and telling them when they are confused. You can find studies trying to develop a system to detect micro-expressions – the subtle expressions that last less than a second. Who knows, one day robots may detect our lies based on our facial expressions or even replace expert therapists. After all these advancements in our digital life, it is just a matter of time.

[1] Gunes, Hatice. "Automatic, dimensional and continuous emotion recognition." (2010).

Cigdem Turan is a member of the IEEE Signal Processing Society and a Ph.D. candidate at Hong Kong PolyU studying computer vision. She is currently a graduate student researcher in the Multimedia Signal Processing Laboratory.
