Publications catalog - books

Analysis and Modeling of Faces and Gestures: Third International Workshop, AMFG 2007 Rio de Janeiro, Brazil, October 20, 2007 Proceedings

S. Kevin Zhou; Wenyi Zhao; Xiaoou Tang; Shaogang Gong (eds.)

In conference: 3rd International Workshop on Analysis and Modeling of Faces and Gestures (AMFG). Rio de Janeiro, Brazil. October 20, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Pattern Recognition; Image Processing and Computer Vision; Artificial Intelligence (incl. Robotics); Computer Graphics; Algorithm Analysis and Problem Complexity

Availability
Detected institution: not detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-75689-7

Electronic ISBN

978-3-540-75690-3

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Learning Personal Specific Facial Dynamics for Face Recognition from Videos

Abdenour Hadid; Matti Pietikäinen; Stan Z. Li

In this paper, we present an effective approach for spatiotemporal face recognition from videos using an extended set of Volume LBP (Local Binary Pattern) features and a boosting scheme. Among the key properties of our approach are: (1) the use of a local Extended Volume LBP based spatiotemporal description instead of the holistic representations commonly used in previous works; (2) the selection of only personal specific facial dynamics while discarding the intra-personal temporal information; and (3) the weighting of the contribution of each local spatiotemporal descriptor. To the best of our knowledge, this is the first work addressing the issue of learning personal specific facial dynamics for face recognition.

We experimented with three different publicly available video face databases (MoBo, CRIM and Honda/UCSD) and considered five benchmark methods (PCA, LDA, LBP, HMMs and ARMA) for comparison. Our extensive experimental analysis clearly demonstrated the excellent performance of the proposed approach, which significantly outperforms the comparative methods and thus advances the state of the art.

- Oral - I | Pp. 1-15
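As an illustration of the texture operator underlying this chapter (a simplified sketch, not the authors' code), the snippet below computes basic 8-neighbour 2D LBP codes and a normalized code histogram; the paper's Extended Volume LBP generalizes this to spatiotemporal neighbourhoods across video frames and selects features by boosting.

```python
import numpy as np

def lbp_8(image):
    """Basic 8-neighbour Local Binary Pattern codes for the interior
    pixels of a 2D grayscale image. Each neighbour contributes one bit:
    1 if it is >= the centre pixel, 0 otherwise."""
    c = image[1:-1, 1:-1]  # centre pixels
    # 8 neighbours, ordered clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    h, w = image.shape
    for bit, (dy, dx) in enumerate(offsets):
        nb = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes

def lbp_histogram(image, bins=256):
    """Normalized LBP code histogram, usable as a local texture descriptor."""
    hist, _ = np.histogram(lbp_8(image), bins=bins, range=(0, bins))
    return hist / hist.sum()
```

In the paper's setting such histograms are computed per local space-time block rather than over the whole face, and a boosting scheme learns which blocks carry person-specific dynamics.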

A New Probabilistic Model for Recognizing Signs with Systematic Modulations

Sylvie C. W. Ong; Surendra Ranganath

This paper addresses an aspect of sign language (SL) recognition that has largely been overlooked in previous work and yet is integral to signed communication. It is the most comprehensive work to date on recognizing complex variations in sign appearances due to grammatical processes (inflections) which systematically modulate the temporal and spatial dimensions of a root sign word to convey information in addition to lexical meaning. We propose a novel dynamic Bayesian network – the Multichannel Hierarchical Hidden Markov Model (MH-HMM) – as a modelling and recognition framework for continuously signed sentences that include modulated signs. The model captures the hierarchical, sequential and parallel organization in signing while requiring synchronization between parallel data streams at sign boundaries. Experimental results using particle filtering for decoding demonstrate the feasibility of using the MH-HMM for recognizing inflected signs in continuous sentences.

- Oral - I | Pp. 16-30

Model-Based Stereo with Occlusions

Fabiano Romeiro; Todd Zickler

This paper addresses the recovery of face models from stereo pairs of images in the presence of foreign-body occlusions. In the proposed approach, a 3D morphable model (3DMM) for faces is augmented by an occlusion map defined on the model shape, and occlusion is detected with minimal computational overhead by incorporating robust estimators in the fitting process. Additionally, the method uses an explicit model for texture (or reflectance) in addition to shape, which is in contrast to most existing multi-view methods that use a shape model alone. We argue that both model components are required to handle certain classes of occluders, and we present empirical results to support this claim. In fact, the empirical results in this paper suggest that even in the absence of occlusions, stereo reconstruction using existing shape-only face models can perform poorly by some measures, and that the inclusion of an explicit texture model may be worth its computational expense.

- Oral - I | Pp. 31-45
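The robust-estimation idea behind the occlusion handling above can be sketched generically. The toy example below (an illustrative stand-in, not the authors' 3DMM fitter) uses Huber weights inside iteratively reweighted least squares, so gross outliers, analogous to pixels covered by a foreign-body occluder, contribute little to the fit; the line-fitting setting and all parameter values are assumptions for demonstration.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber robust-estimator weights: 1 inside the inlier band,
    decaying as k/|r| outside it, so large residuals are down-weighted."""
    r = np.abs(residuals)
    w = np.ones_like(r, dtype=float)
    mask = r > k
    w[mask] = k / r[mask]
    return w

def irls_line_fit(x, y, iters=10, k=1.345):
    """Fit y ~ a*x + b by iteratively reweighted least squares with
    Huber weights -- a toy analogue of robust model fitting."""
    A = np.column_stack([x, np.ones_like(x)])
    params = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(iters):
        w = huber_weights(y - A @ params, k)
        sw = np.sqrt(w)
        params = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)[0]
    return params
```

In the chapter's method the same principle operates per pixel during 3DMM fitting, and the resulting weights double as an occlusion map on the model surface.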

View Invariant Head Recognition by Hybrid PCA Based Reconstruction

Qingquan Wu; Jezekiel Ben-Arie

We propose a novel method for 3D head reconstruction and view-invariant recognition from single 2D images. We employ a deterministic Shape From Shading (SFS) method with initial conditions estimated by Hybrid Principal Component Analysis (HPCA) and multi-level global optimization with error-dependent smoothness and integrability constraints. Our HPCA algorithm provides initial estimates of the 3D range mapping for the SFS optimization; these estimates are quite accurate and yield a much improved 3D head reconstruction. The paper also includes significant contributions in novel approaches to global optimization and in SFS handling of variable and unknown surface albedo, a problem with unsatisfactory solutions by prevalent SFS methods. In the experiments, we reconstruct 3D head range images from single 2D images in different views. The 3D reconstructions are then used to recognize stored model persons. Empirical results show that our HPCA-based SFS method provides 3D head reconstructions with notably improved accuracy compared to other approaches. 3D reconstructions derived from side view (profile) images of 40 persons are tested against 80 3D head models, and a recognition rate of over 90% is achieved. Such a capability was not demonstrated by any other method we are aware of.

- Oral - I | Pp. 46-57

Person-Independent Monocular Tracking of Face and Facial Actions with Multilinear Models

Yusuke Sugano; Yoichi Sato

In tracking face and facial actions of unknown people, it is essential to take into account two components of facial shape variations: shape variation between people and variation caused by different facial actions such as facial expressions. This paper presents a monocular method of tracking faces and facial actions using a multilinear face model that treats interpersonal and intrapersonal shape variations separately. We created this method using a multilinear face model by integrating two different frameworks: particle filter-based tracking for time-dependent facial action and pose estimation and incremental bundle adjustment for person-dependent shape estimation. This unique combination together with multilinear face models is the key to tracking faces and facial actions of arbitrary people in real time with no pre-learned individual face models. Experiments using real video sequences demonstrate the effectiveness of our method.

- Poster - I | Pp. 58-70
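The particle-filtering component of trackers like the one above can be illustrated with a minimal scalar example. The sketch below (a toy under stated assumptions, not the authors' tracker) runs the standard predict/weight/resample cycle of a bootstrap particle filter; the Gaussian motion and observation models and all noise levels are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, observation,
                         motion_std=0.5, obs_std=1.0):
    """One predict/weight/resample cycle of a bootstrap particle filter
    for a scalar state."""
    # Predict: diffuse particles with the motion model.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Weight: Gaussian observation likelihood.
    weights = weights * np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights = weights / weights.sum()
    # Resample when the effective sample size collapses.
    n = len(particles)
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# Track a state drifting toward 3.0.
particles = rng.normal(0.0, 2.0, size=500)
weights = np.full(500, 1.0 / 500)
for obs in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
    particles, weights = particle_filter_step(particles, weights, obs)
estimate = float(np.sum(particles * weights))
```

In the chapter's method the state is the time-dependent facial action and pose rather than a scalar, and the person-dependent shape is estimated separately by incremental bundle adjustment.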

Automatic Facial Expression Recognition Using Boosted Discriminatory Classifiers

Stephen Moore; Richard Bowden

Over the last two decades automatic facial expression recognition has become an active research area. Facial expressions are an important channel of non-verbal communication, and can provide cues to emotions and intentions. This paper introduces a novel method for facial expression recognition, by assembling contour fragments as discriminatory classifiers and boosting them to form a strong, accurate classifier. Detection is fast as features are evaluated using an efficient lookup to a chamfer image, which weights the response of the feature. An ensemble classification technique is presented using a voting scheme based on classifier responses. The result of this research is a 6-class classifier (the 6 basic expressions of anger, joy, sadness, surprise, disgust and fear) that demonstrates competitive results, achieving rates as high as 96% for some expressions. As the classifiers are extremely fast to compute, the approach operates well above frame rate. We also demonstrate how a dedicated classifier can be constructed to give optimal automatic parameter selection for the detector, allowing real-time operation on unconstrained video.

- Poster - I | Pp. 71-83
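The chamfer-image lookup mentioned above can be sketched as follows. This toy (an illustrative sketch, not the authors' implementation) builds a city-block distance transform of a binary edge map with the classic two-pass algorithm and scores a contour fragment by averaging the distance values under its points; low scores mean the fragment aligns well with image edges.

```python
import numpy as np

def chamfer_distance_map(edges):
    """Two-pass city-block distance transform of a binary edge map.
    Each cell ends up holding the L1 distance to the nearest edge pixel."""
    d = np.where(edges, 0, edges.size).astype(float)  # edges.size ~ infinity
    h, w = d.shape
    for y in range(h):                # forward pass (top-left to bottom-right)
        for x in range(w):
            if y > 0:
                d[y, x] = min(d[y, x], d[y - 1, x] + 1)
            if x > 0:
                d[y, x] = min(d[y, x], d[y, x - 1] + 1)
    for y in range(h - 1, -1, -1):    # backward pass (bottom-right to top-left)
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y + 1, x] + 1)
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x + 1] + 1)
    return d

def chamfer_score(dist_map, contour_points):
    """Mean distance-map value under the contour points: evaluating a
    feature is just an array lookup plus an average."""
    ys, xs = zip(*contour_points)
    return float(dist_map[list(ys), list(xs)].mean())
```

Because the distance map is computed once per frame, each boosted contour-fragment classifier costs only a handful of lookups, which is what makes the detector fast enough to run above frame rate.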

Generating Body Surface Deformation Using Level Set Method

Satoru Morita

Recently, skeletal motion data obtained from motion capture has been used in movies and sports. Movie production, however, needs body surface data rather than skeletal motion data alone. It is difficult to generate body surface data from skeletal motion data alone because muscles deform according to the skeletal motion; muscle deformation occurs with arm and leg joint rotation. In this paper, we visualize body surface deformation based on a deformation mechanism applicable to human motion, following anatomy-based modeling. We propose a method for generating the body surface by covering the skeletal muscles with a thin film based on the level set method. We demonstrate the effectiveness of the system through the generation of the movements of a body builder.

- Poster - I | Pp. 84-95

Patch-Based Pose Inference with a Mixture of Density Estimators

David Demirdjian; Raquel Urtasun

This paper presents a patch-based approach for pose estimation from single images using a kernelized density voting scheme. We introduce a boosting-like algorithm that models the density using a mixture of weighted ‘weak’ estimators. The ‘weak’ density estimators and corresponding weights are learned iteratively from a training set, providing an efficient method for feature selection. Given a query image, voting is performed by reference patches similar in appearance to query image patches. Locality in the voting scheme allows us to handle occlusions and reduces the size of the training set required to cover the space of possible poses and appearance. Finally, the pose is estimated as the dominant mode in the density. Multimodality can be handled by looking at multiple dominant modes. Experiments carried out on face and articulated body pose databases show that our patch-based pose estimation algorithm generalizes well to unseen examples, is robust to occlusions and provides accurate pose estimation.

- Poster - I | Pp. 96-108
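The density-voting readout described above can be illustrated in one dimension. The sketch below (an illustrative stand-in, not the authors' estimator) treats weighted patch votes as a Gaussian kernel density and finds a dominant mode by mean-shift ascent; running the ascent from different starting points recovers multiple modes, mirroring the paper's handling of multimodality.

```python
import numpy as np

def mean_shift_mode(votes, weights, start, bandwidth=1.0, iters=50):
    """Ascend to a mode of the weighted Gaussian kernel density defined
    by scalar votes, starting from `start`."""
    x = float(start)
    for _ in range(iters):
        k = weights * np.exp(-0.5 * ((votes - x) / bandwidth) ** 2)
        x_new = float(np.sum(k * votes) / np.sum(k))
        if abs(x_new - x) < 1e-6:
            break
        x = x_new
    return x
```

For pose estimation the votes would be full pose vectors cast by reference patches, with weights coming from the learned mixture of 'weak' density estimators; the dominant mode is reported as the pose.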

Integrating Multiple Visual Cues for Robust Real-Time 3D Face Tracking

Wei-Kai Liao; Douglas Fidaleo; Gérard Medioni

3D face tracking is an important component for many computer vision applications. Most state-of-the-art tracking algorithms can be characterized as being either intensity- or feature-based. The intensity-based tracker relies on the brightness constraint while the feature-based tracker utilizes 2D local feature correspondences. In this paper, we propose a hybrid tracker for robust 3D face tracking. Instead of relying on single source of information, the hybrid tracker integrates feature correspondence and brightness constraints within a nonlinear optimization framework. The proposed method can track the 3D face pose reliably in real-time. We have conducted a series of evaluations to compare the performance of the proposed tracker with other state-of-the-art trackers. The experiments consist of synthetic sequences with simulation of different environmental factors, real sequences with estimated ground truth, and sequences from a real-world HCI application. The proposed tracker is shown to be superior in both accuracy and robustness.

- Poster - I | Pp. 109-123

Model-Assisted 3D Face Reconstruction from Video

Douglas Fidaleo; Gérard Medioni

This paper describes a model-assisted system for reconstruction of 3D faces from a single consumer-quality camera using a structure-from-motion approach. Typical multi-view stereo approaches use the motion of a sparse set of features to compute camera pose, followed by a dense matching step to compute the final object structure. Accurate pose estimation depends upon precise identification and matching of feature points between images, but due to the lack of texture on large areas of the face, matching is prone to errors.

To deal with outliers in both the sparse and dense matching stages, previous work either relies on a strong prior model for face geometry or imposes restrictions on the camera motion. Strong prior models result in a serious compromise in final reconstruction quality and typically bear a signature resemblance to a generic or mean face. Model-based techniques, while giving the appearance of face detail, in fact carry this detail over from the model prior. Face features such as beards, moles, and other characteristic geometry are lost. Motion restrictions such as allowing only pure rotation are nearly impossible to satisfy by the end user, especially with a handheld camera.

We significantly improve the robustness and flexibility of existing monocular face reconstruction techniques by introducing a deformable generic face model only at the pose estimation, face segmentation, and preprocessing stages. To preserve data fidelity in the final reconstruction, this generic model is discarded completely and dense matching outliers are removed using tensor voting, a purely data-driven technique. Results are shown from a complete end-to-end system.

- Poster - I | Pp. 124-138