Detection and Tracking of Facial Landmarks Using Gabor Wavelet Networks

The detection and tracking of faces and facial features is an important research problem in computer vision. In general, methods that process facial feature points, or facial landmarks, are either computationally expensive or not robust to feature deformations. We propose a fast and robust approach for detecting and tracking facial landmarks (eyes, nose and mouth) that is based on Gabor wavelet networks (GWN) [1], an effective technique for object representation.

The GWN approach was recently proposed by Volker Krüger and Gerald Sommer in the pattern recognition community. They have reported experiments on face tracking, face recognition and pose estimation. We verified that this technique can also be applied successfully to detecting and tracking facial features in video sequences.

Face representation using GWN: The discrete face template is represented as a linear combination of 2D Gabor wavelet functions. The wavelet parameters (position, scale and orientation) are stored in the network nodes, and the linear coefficients play the role of synaptic weights. The weights and the wavelet parameters are determined by optimization (Levenberg-Marquardt method [2]) so that as much image information as possible is preserved for a given number of wavelets. The figure below illustrates: (a) a discrete face template, (b) its wavelet representation (using 52 Gabor functions) and (c) the position of the 16 largest wavelets.
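The linear-combination model can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the authors' implementation: the odd Gabor form (a Gaussian envelope times a sine), the parameter tuple (cx, cy, sx, sy, theta) and the function names are all chosen here for the example. The weight fit uses ordinary least squares for fixed wavelet parameters, as a stand-in for the joint Levenberg-Marquardt optimization described above.

```python
import numpy as np

def odd_gabor(cx, cy, sx, sy, theta, shape):
    """Evaluate one odd 2D Gabor wavelet on a (rows, cols) pixel grid.

    Assumed form: exp(-0.5 * (u^2 + v^2)) * sin(u), where (u, v) are the
    pixel coordinates translated to (cx, cy), rotated by theta and scaled.
    """
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    dx, dy = xs - cx, ys - cy
    u = sx * (np.cos(theta) * dx - np.sin(theta) * dy)
    v = sy * (np.sin(theta) * dx + np.cos(theta) * dy)
    return np.exp(-0.5 * (u ** 2 + v ** 2)) * np.sin(u)

def reconstruct(weights, params, shape):
    """Weighted sum of wavelets: the network's image representation."""
    out = np.zeros(shape)
    for w, p in zip(weights, params):
        out += w * odd_gabor(*p, shape)
    return out

def fit_weights(image, params):
    """Least-squares weights for FIXED wavelet parameters (a simplification)."""
    basis = np.stack([odd_gabor(*p, image.shape).ravel() for p in params], axis=1)
    w, *_ = np.linalg.lstsq(basis, image.ravel(), rcond=None)
    return w
```

Sorting the fitted weights by absolute value is one way to pick out the "largest" wavelets mentioned above, assuming that largest refers to weight magnitude.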

Face Matching using the GWN: The face representation described above can be affinely deformed to match a new face image, so that its wavelets are registered on the same facial features as in the original image. This process is called GWN repositioning, and it enables face tracking. For instance, consider the face template shown in figure (a) above and let G be its optimized GWN. Now consider the same face in a different pose, as shown in figure (a) below. In the repositioning process, the wavelets of G are placed on the same facial features in the distorted image. Figure (b) below illustrates the repositioned wavelet representation with 52 Gabor wavelet functions, and figure (c) shows the position of the 16 largest wavelets in the image. Repositioning is carried out with a superwavelet [1], which treats the entire face representation as a single wavelet and has its affine parameters optimized by the Levenberg-Marquardt method.
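A rough sketch of repositioning, assuming NumPy only. The affine parameterization (a 2x2 matrix plus a translation, packed as a11, a12, a21, a22, tx, ty), the finite-difference Jacobian and the damping schedule are assumptions made for this example; the superwavelet in [1] uses its own parameter vector. The loop is a basic Levenberg-Marquardt iteration that deforms the whole wavelet sum at once and keeps a step only if it lowers the squared difference to the target image.

```python
import numpy as np

def template_at(xs, ys, weights, params):
    """Evaluate the weighted sum of odd Gabor wavelets at given coordinates."""
    out = np.zeros_like(xs, dtype=float)
    for w, (cx, cy, sx, sy, th) in zip(weights, params):
        dx, dy = xs - cx, ys - cy
        u = sx * (np.cos(th) * dx - np.sin(th) * dy)
        v = sy * (np.sin(th) * dx + np.cos(th) * dy)
        out += w * np.exp(-0.5 * (u ** 2 + v ** 2)) * np.sin(u)
    return out

def superwavelet(affine, weights, params, shape):
    """Render the template deformed by x -> A x + t (hypothetical packing)."""
    a11, a12, a21, a22, tx, ty = affine
    A_inv = np.linalg.inv(np.array([[a11, a12], [a21, a22]]))
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    # Pull every pixel back into template coordinates: A^{-1} (x - t).
    px = A_inv[0, 0] * (xs - tx) + A_inv[0, 1] * (ys - ty)
    py = A_inv[1, 0] * (xs - tx) + A_inv[1, 1] * (ys - ty)
    return template_at(px, py, weights, params)

def reposition(image, weights, params, affine0, iters=30, lam=1.0, eps=1e-4):
    """Basic Levenberg-Marquardt over the six affine parameters."""
    p = np.asarray(affine0, dtype=float)

    def residual(q):
        return (superwavelet(q, weights, params, image.shape) - image).ravel()

    r = residual(p)
    for _ in range(iters):
        # Forward-difference Jacobian, one column per affine parameter.
        J = np.stack([(residual(p + eps * e) - r) / eps for e in np.eye(p.size)], axis=1)
        H = J.T @ J
        damped = H + lam * np.diag(np.diag(H)) + 1e-9 * np.eye(p.size)
        step = np.linalg.solve(damped, -J.T @ r)
        r_new = residual(p + step)
        if r_new @ r_new < r @ r:   # accept: the error went down
            p, r, lam = p + step, r_new, lam * 0.5
        else:                       # reject: increase damping and retry
            lam *= 4.0
    return p
```

Only the six affine parameters are optimized here; the individual wavelet parameters and weights stay fixed, which is what allows the template to be registered as a whole.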

Facial Landmarks Location: We trained a GWN on a mean face obtained from the Yale Face Database. The repositioning worked well on all individual images of this dataset, which covers different facial expressions and illumination changes. To detect facial landmarks in a test image, we first locate facial feature points (pupils, center of nose and center of mouth) in the mean face. Then the GWN is repositioned on the target face image. Facial landmarks are detected automatically by applying the corresponding affine transformation to the initial facial feature points of the mean face. The parameters of this transformation are obtained from the superwavelet parameter vector [1]. The figure below illustrates the detection procedure on some images of the Yale database. Note that it works even in the presence of beards and glasses.
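The landmark transfer itself is a single affine map applied to the mean-face points. The sketch below assumes the transformation has already been recovered as a 2x2 matrix A and a translation t; how these are derived from the superwavelet parameter vector is described in [1] and is not reproduced here. The function name and the (x, y) point layout are assumptions for the example.

```python
import numpy as np

def transfer_landmarks(points, A, t):
    """Map (N, 2) landmark coordinates (x, y) from the mean face to the target image."""
    points = np.asarray(points, dtype=float)
    A = np.asarray(A, dtype=float)
    t = np.asarray(t, dtype=float)
    # Row-vector convention: each point p becomes A @ p + t.
    return points @ A.T + t
```

Because the four feature points share one transformation, their relative geometry in the target image is the mean-face geometry up to an affine deformation.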

Tracking Facial Landmarks: Our tracking method assumes that the facial landmarks have been correctly determined in the first frame of an image sequence, as described above. In each subsequent frame, we apply the corresponding affine transformation to the detected feature points, which performs facial landmark tracking. This procedure considers the overall geometry of the face and is therefore robust to local deformations such as eye blinking and smiling, which are usually critical for most local approaches. In particular, our method does not require high inter-frame correlation around the feature areas, as template matching does, for instance.
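A possible shape for the per-frame loop, as a sketch only. The `reposition` callable is hypothetical and supplied by the caller: it takes a frame and an initial (A, t) guess and returns the refined (A, t). Starting each frame from the previous frame's result is an assumption of this example, not something the text above states.

```python
import numpy as np

def track_landmarks(frames, mean_face_points, affine0, reposition):
    """Yield the landmark coordinates for every frame of a sequence.

    reposition(frame, (A, t)) -> (A, t) is a caller-supplied routine
    (hypothetical here) that registers the face template on the frame.
    """
    A, t = affine0
    pts = np.asarray(mean_face_points, dtype=float)
    for frame in frames:
        # Warm start from the previous frame's parameters (assumption).
        A, t = reposition(frame, (A, t))
        yield pts @ np.asarray(A, dtype=float).T + np.asarray(t, dtype=float)
```

The landmarks are never searched for individually; they move only because the global face transformation moves, which is why a blink or a smile does not pull a single point away.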

Experiments: Our approach can be divided into three successive steps: face detection, facial landmark location, and tracking of the face and facial landmarks. The first step is performed by a skin-color approach combined with a simple correlation procedure that verifies the presence of a face in the detected skin blob [3]. Once the face has been detected, its scale is obtained and the color face region is converted into a grey-level image. Facial landmarks are then located by repositioning a GWN on the face region, using the position and scale of the face-like blob as initial parameters. Finally, the face and facial landmarks are tracked along the video sequence; the tracking is robust to homogeneous illumination changes and affine deformations of the face image. We have tested our method on different color video sequences, obtaining good results. You can download a facial feature tracking demonstration here.
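The detection front end can be approximated as follows. This is a sketch under loud assumptions: the skin model (thresholds on normalized red and green chromaticity), the threshold values, the single-blob bounding box and the way the initial scale is derived are all hypothetical choices for illustration, and the correlation-based face verification of [3] is omitted. The grey conversion uses the common ITU-R BT.601 luma weights.

```python
import numpy as np

def skin_mask(rgb, r_range=(0.36, 0.47), g_range=(0.28, 0.35)):
    """Boolean mask of skin-like pixels (hypothetical chromaticity thresholds)."""
    rgb = rgb.astype(float)
    total = rgb.sum(axis=2) + 1e-9            # avoid division by zero on black pixels
    r, g = rgb[..., 0] / total, rgb[..., 1] / total
    return (r > r_range[0]) & (r < r_range[1]) & (g > g_range[0]) & (g < g_range[1])

def bounding_box(mask):
    """(top, bottom, left, right) of all masked pixels, or None if the mask is empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max())

def to_grey(rgb):
    """Grey-level image using ITU-R BT.601 luma weights."""
    return rgb.astype(float) @ np.array([0.299, 0.587, 0.114])

def initial_parameters(box, template_shape):
    """Initial (scale, tx, ty) for repositioning, taken from the blob box (assumption)."""
    top, bottom, left, right = box
    scale = (bottom - top + 1) / template_shape[0]
    return scale, float(left), float(top)
```

A real sequence would need connected-component analysis to separate the face from other skin-colored regions; the bounding box above simply encloses every masked pixel.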


[1] V. Krüger and G. Sommer, "Affine real-time face tracking using a wavelet network". Presented at the ICCV'99 Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, Corfu, Greece, September 1999.
[2] W. Press, B. Flannery, S. Teukolsky and W. Vetterling, Numerical Recipes, The Art of Scientific Computing, Cambridge University Press, UK, 1986.
[3] R. Feris, T. Campos and R. Cesar, "Detection and tracking of facial features in video sequences", Lecture Notes in Artificial Intelligence, vol. 1793, pp. 127-135, Springer-Verlag, April 2000.