ENBIS-18 in Nancy

2 – 25 September 2018; Ecoles des Mines, Nancy (France)
Abstract submission: 20 December 2017 – 4 June 2018

Machine Learning and Computer Vision for Face Recognition in the Wild

4 September 2018, 14:50 – 15:10

Submitted by
Julien Trombini
Authors
Julien Trombini (Two-i)
Abstract
The focus of our work is to apply computer vision, and face recognition in particular, in uncontrolled environments. By this we mean settings where distance, lighting, image clarity, the number of people analysed, the angle of the faces, the blurriness of the image, … are uncertain and unstable.

We constantly weigh the trade-off between the accuracy of the end results, speed and statistical relevance. Choosing the right algorithmic structure to improve the performance of face recognition, and of the processing steps that follow it, is challenging.

Before reaching the final stage of emotion recognition, we must process the picture through several algorithms, each of which affects speed and accuracy. This raises the question of how many algorithms or micro-services to use: is it better to use one long procedural method, or to divide the entire process into as many individual tasks as possible?
To extract the facial emotion from a picture, we must first introduce face detection, which faces the dilemma of processing thousands of faces per second while limiting the number of false positives. Quality filters follow; they are costly in terms of time and/or computing power, but they validate or reject a picture for the next stage (for example: is the face too small, is the picture blurry, is the lighting good enough…). Face selection is the last stage of the face recognition part. It can be seen as a way to present the data efficiently to the final stage: the angle of each face is calculated and the faces are sorted on a scale from 0 to 1, where 0 is a face in profile and 1 is a face seen from the front.
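The quality-filter and face-selection stages described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline presented in the talk: the `Face` fields, the threshold values and the linear mapping from head yaw to a 0-to-1 score are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Face:
    """A detected face with hypothetical, precomputed measurements."""
    width: int          # bounding-box width in pixels
    height: int         # bounding-box height in pixels
    sharpness: float    # assumed blur measure, higher means sharper
    brightness: float   # mean pixel intensity in [0, 255]
    yaw_deg: float      # head yaw: 0 = frontal, +/-90 = full profile


# Illustrative thresholds only; real values would be tuned on data.
MIN_SIZE = 40
MIN_SHARPNESS = 100.0
BRIGHTNESS_RANGE = (50.0, 200.0)


def passes_quality(face: Face) -> bool:
    """Quality filter: reject faces that are too small, blurry or badly lit."""
    big_enough = min(face.width, face.height) >= MIN_SIZE
    sharp_enough = face.sharpness >= MIN_SHARPNESS
    well_lit = BRIGHTNESS_RANGE[0] <= face.brightness <= BRIGHTNESS_RANGE[1]
    return big_enough and sharp_enough and well_lit


def frontal_score(face: Face) -> float:
    """Map yaw to [0, 1]: 0 for a profile view, 1 for a frontal view.

    The linear mapping is an assumption; the abstract only fixes the
    two endpoints of the scale.
    """
    return max(0.0, 1.0 - abs(face.yaw_deg) / 90.0)


def select_faces(faces: list[Face]) -> list[Face]:
    """Keep faces that pass the filters, most frontal first."""
    kept = [f for f in faces if passes_quality(f)]
    return sorted(kept, key=frontal_score, reverse=True)
```

Running the cheap size check before the more expensive blur and lighting measurements would be one way to limit the cost of this stage, although here all measurements are assumed to be available already.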

Finally, the emotion analysis algorithm is highly sensitive to the angle of the face and the size of the picture. One of the dilemmas is choosing between the number of faces processed and the accuracy of the emotion estimate. To remain both time-efficient and accurate, a model combining two different kinds of algorithm can be used: an SVM for the simple cases and a ResNet-34 for the more complex ones.
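One way to picture the two-model idea is a small router that sends each face to a cheap or an expensive classifier. The sketch below is an assumption-laden illustration: the abstract does not say how a "simple" case is defined, so the criteria used here (frontal score and face size), the threshold values and the function names are all hypothetical, and the two models are passed in as plain callables rather than a real SVM or ResNet-34.

```python
from typing import Callable, Sequence, Tuple

# Any callable taking a feature vector and returning an emotion label.
Model = Callable[[Sequence[float]], str]


def is_simple_case(frontal_score: float, face_size_px: int,
                   min_score: float = 0.8, min_size: int = 96) -> bool:
    """Hypothetical rule: near-frontal, reasonably large faces are 'simple'."""
    return frontal_score >= min_score and face_size_px >= min_size


def classify_emotion(features: Sequence[float],
                     frontal_score: float,
                     face_size_px: int,
                     fast_model: Model,
                     slow_model: Model) -> Tuple[str, str]:
    """Route a face to the fast model (e.g. an SVM) or the slow one
    (e.g. a ResNet-34) and return (route taken, predicted label)."""
    if is_simple_case(frontal_score, face_size_px):
        return "fast", fast_model(features)
    return "slow", slow_model(features)
```

With this layout the expensive network only runs on the faces the cheap rule cannot handle, which is where the time saving would come from; the routing thresholds then become the knob that trades throughput against accuracy.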

I will present two models for face recognition, an LBP cascade and a ResNet, highlighting the main differences in terms of picture processing, accuracy and speed. Finally, I will suggest the idea of using a cascade of CNNs to process all the described tasks efficiently.