Publications catalog - books



Computer Vision in Human-Computer Interaction: ICCV 2005 Workshop on HCI, Beijing, China, October 21, 2005, Proceedings

Nicu Sebe; Michael Lew; Thomas S. Huang (eds.)

Conference: International Workshop on Human-Computer Interaction (HCI), Beijing, China, October 21, 2005

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

User Interfaces and Human Computer Interaction; Image Processing and Computer Vision; Computer Graphics; Pattern Recognition

Availability

Detected institution: Not detected
Year of publication: 2005
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29620-1

Electronic ISBN

978-3-540-32129-3

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2005

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Multimodal Human Computer Interaction: A Survey

Alejandro Jaimes; Nicu Sebe

In this paper we review the major approaches to multimodal human computer interaction from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition and emotion in audio). We discuss user and task modeling, and multimodal fusion, highlighting challenges, open issues, and emerging applications for Multimodal Human Computer Interaction (MMHCI) research.

Keywords: Facial Expression; Emotion Recognition; Gesture Recognition; Facial Expression Recognition; Dynamic Bayesian Network.

- Multimodal Human Computer Interaction: A Survey | Pp. 1-15

Tracking Body Parts of Multiple People for Multi-person Multimodal Interface

Sébastien Carbini; Jean-Emmanuel Viallet; Olivier Bernier; Bénédicte Bascle

Although large displays could allow several users to work together and to move freely in a room, their associated interfaces are limited to contact devices that must generally be shared. This paper describes a novel interface called SHIVA (Several-Humans Interface with Vision and Audio) that allows several users to interact remotely with a very large display using both speech and gesture. The head and both hands of two users are tracked in real time by a stereo vision based system. From the body part positions, the direction pointed at by each user is computed (the geometry is sketched after this entry) and selection gestures made with the second hand are recognized. The pointing gesture is fused with the n-best results from speech recognition, taking the application context into account. The system is tested on a chess game with two users playing on a very large display.

Keywords: Speech Recognition; Speech Signal; Gesture Recognition; Stereo Camera; Application Context.

- Tracking | Pp. 16-25
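
The pointing computation described in the entry above amounts to casting a ray from the head through the hand and intersecting it with the display plane. A minimal NumPy sketch of that generic geometry, not the authors' implementation:

```python
import numpy as np

def pointing_target(head, hand, plane_point, plane_normal):
    """Intersect the head->hand ray with the display plane.

    head, hand: 3D positions from a stereo tracker.
    plane_point, plane_normal: any point on the display plane and its normal.
    Returns the 3D intersection point, or None if the ray is parallel
    to the plane or points away from it.
    """
    direction = hand - head                      # pointing ray direction
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:                        # ray parallel to the plane
        return None
    t = np.dot(plane_normal, plane_point - head) / denom
    if t <= 0:                                   # display is behind the user
        return None
    return head + t * direction
```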

Articulated Body Tracking Using Dynamic Belief Propagation

Tony X. Han; Thomas S. Huang

An efficient articulated body tracking algorithm is proposed in this paper. Due to the high dimensionality of human-body motion, current articulated tracking algorithms based on sampling [1], belief propagation (BP) [2], or non-parametric belief propagation (NBP) [3] are very slow. To accelerate articulated tracking, we adapt belief propagation to the dynamics of articulated human motion. The search space is selected according to a prediction based on human motion dynamics and the current body-configuration estimate. The search space of the dynamic BP tracker is much smaller than that of the traditional BP tracker [2], and dynamic BP does not need the slow Gibbs sampler used in NBP [3,4,5]. Based on a graphical model similar to the pictorial structure [6] or loose-limbed model [3], the proposed efficient dynamic BP is carried out to find the MAP of the body configuration (a simplified sketch follows this entry). Experiments on tracking body movement in a meeting scenario show the robustness and efficiency of the proposed algorithm.

Keywords: Body Part; Graphical Model; Belief Propagation; Image Patch; Temporal Constraint.

- Tracking | Pp. 26-35
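
As a simplified illustration of MAP inference on an articulated model, the sketch below runs max-product belief propagation on a chain of body parts whose candidate states are assumed to be already restricted to a dynamics-predicted search window. The paper's loose-limbed model is tree-structured; this chain version is only a schematic stand-in:

```python
import numpy as np

def map_chain(unary, pairwise):
    """Max-product (Viterbi-style) inference on a chain-structured model.

    unary:    list of 1D arrays; unary[i][s] = log-likelihood of part i
              taking candidate state s (candidates already restricted to
              the dynamics-predicted search window).
    pairwise: list of matrices; pairwise[i][s, t] = log-compatibility of
              part i in state s with part i+1 in state t.
    Returns the MAP state index for every part.
    """
    msg = unary[0].copy()
    back = []
    for i in range(1, len(unary)):
        scores = msg[:, None] + pairwise[i - 1]   # all s -> t transitions
        back.append(np.argmax(scores, axis=0))    # best predecessor per t
        msg = unary[i] + np.max(scores, axis=0)
    states = [int(np.argmax(msg))]
    for bp in reversed(back):                     # backtrack the MAP path
        states.append(int(bp[states[-1]]))
    return states[::-1]
```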

Recover Human Pose from Monocular Image Under Weak Perspective Projection

Minglei Tong; Yuncai Liu; Thomas S. Huang

In this paper we construct a novel human body model using a convolution surface with an articulated kinematic skeleton. The human body’s pose and shape in a monocular image can be estimated from the convolution curve through nonlinear optimization. The contribution of the paper is threefold. Firstly, a human model based on a convolution surface with articulated skeletons is presented, whose shape is deformable by changing polynomial parameters and radius parameters. Secondly, we give a convolution surface and curve correspondence theorem under weak perspective projection (the projection model is sketched after this entry), which provides a bridge between the 3D pose and the 2D contour. Thirdly, we model the human body’s silhouette with a convolution curve in order to estimate joint parameters from monocular images. Evaluation of the method is performed on a video sequence of a walking man.

Keywords: Human Model; Human Body Model; Joint Parameter; Monocular Image; Human Motion Analysis.

- Tracking | Pp. 36-46
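
Weak perspective projection, which underlies the correspondence theorem above, replaces each point's individual depth with the object's average depth, so projection reduces to rotation, translation, orthographic projection, and one uniform scale. A minimal sketch of the standard model (not the convolution-surface machinery itself):

```python
import numpy as np

def weak_perspective(points_3d, R, t, f, z_avg):
    """Project Nx3 model points under weak perspective.

    Weak perspective assumes object depth variation is small relative to
    its distance, so every point shares one scale s = f / z_avg, where f
    is the focal length and z_avg the object's average depth.
    """
    cam = points_3d @ R.T + t          # model -> camera coordinates
    s = f / z_avg                      # single scale for the whole body
    return s * cam[:, :2]              # orthographic projection, scaled
```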

A Joint System for Person Tracking and Face Detection

Zhenqiu Zhang; Gerasimos Potamianos; Andrew Senior; Stephen Chu; Thomas S. Huang

Visual detection and tracking of humans in complex scenes is a challenging problem with a wide range of applications, for example surveillance and human-computer interaction. In many such applications, time-synchronous views from multiple calibrated cameras are available, and both frame-view and space-level human location information is desired. In such scenarios, efficiently combining the strengths of face detection and person tracking is a viable approach that can provide both levels of information and improve robustness. In this paper, we propose a novel vision system that detects and tracks human faces automatically, using input from multiple calibrated cameras. The method uses an AdaBoost algorithm variant combined with mean shift tracking on single camera views for face detection and tracking, and fuses the results across camera views to check for consistency and obtain a three-dimensional head estimate (a triangulation sketch follows this entry). We apply the proposed system to a lecture scenario in a smart room, on a corpus collected as part of the CHIL European Union integrated project. We report results on both frame-level face detection and three-dimensional head tracking. For the latter, the proposed algorithm achieves results comparable to those of the IBM “PeopleVision” system.

- Tracking | Pp. 47-59
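
Fusing per-view face detections from calibrated cameras into a three-dimensional head estimate can be done with standard linear (DLT) triangulation. The sketch below illustrates that generic step, not the paper's specific fusion logic:

```python
import numpy as np

def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from >= 2 calibrated views.

    projections: list of 3x4 camera projection matrices P_i.
    points_2d:   list of (u, v) detections (e.g. face centres) per view.
    Returns the 3D point minimising the algebraic reprojection error.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])   # two linear constraints per view
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)        # null vector = homogeneous solution
    X = vt[-1]
    return X[:3] / X[3]                # dehomogenise
```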

Perceptive User Interface, a Generic Approach

Michael Van den Bergh; Ward Servaes; Geert Caenen; Stefaan De Roeck; Luc Van Gool

This paper describes the development of a real-time perceptive user interface. Two cameras are used to detect a user’s head, eyes, hand, fingers and gestures. These cues are interpreted to control a user interface on a large screen. The result is a fully functional integrated system that processes roughly 7.5 frames per second on a Pentium IV system. The calibration of this setup is carried out through a few simple and intuitive routines, making the system adaptive and accessible to non-expert users. The minimal hardware requirements are two webcams and a computer. The paper describes how the user is observed (head, eye, hand and finger detection, gesture recognition), the 3D geometry involved, and the calibration steps necessary to set up the system.

Keywords: Gesture Recognition; White Balance; Connected Component Analysis; Gesture Recognition System; Screen Plane.

- Interfacing | Pp. 60-69
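
Hand detection of this kind is often bootstrapped by color segmentation followed by connected component analysis (one of the entry's keywords). A minimal OpenCV sketch with illustrative, uncalibrated HSV thresholds; the system described above instead calibrates its models through its own routines:

```python
import cv2
import numpy as np

def largest_skin_blob(frame_bgr, lo=(0, 48, 80), hi=(20, 255, 255)):
    """Return the centre of the largest skin-coloured component (e.g. a hand).

    The HSV thresholds lo/hi are illustrative defaults, not values from
    the paper; in practice they depend on the user and the lighting.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lo, np.uint8), np.array(hi, np.uint8))
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    if n < 2:                                  # label 0 is the background
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return centroids[largest]                  # (x, y) blob centre
```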

A Vision Based Game Control Method

Peng Lu; Yufeng Chen; Xiangyong Zeng; Yangsheng Wang

The appeal of computer games may be enhanced by vision-based user inputs. The high speed and low cost requirements of near-term, mass-market game applications make system design challenging. In this paper we propose a vision based method for controlling a 3D racing car game, which analyzes the positions of the player’s two fists in the video stream from the camera to derive the racing car’s steering commands. The paper focuses in particular on a robust, real-time Bayesian network (BN) based multi-cue fusion method for fist tracking. Firstly, a new strategy, which employs the latest work in face recognition, is used to create an accurate color model of the fist automatically. Secondly, the color cue and motion cue are used to generate possible positions of the fist. Then, the posterior probability of each possible position is evaluated by the BN, which fuses the color cue and appearance cue. Finally, the fist position is approximated by the hypothesis that maximizes the posterior (a schematic sketch follows this entry). Based on the proposed control system, a racing car game, “Simulation Drive”, has been developed by our group. The game offers players an entirely new experience.

Keywords: Bayesian Network; Tracking Algorithm; Shift Algorithm; Proposed Control System; Game Control.

- Interfacing | Pp. 70-78
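
The final MAP step described above can be pictured as scoring each candidate fist position by the product of its cue likelihoods and keeping the best. The sketch below uses a naive conditional-independence factorization as a stand-in for the paper's actual Bayesian network:

```python
def map_fist_position(hypotheses, color_lik, appearance_lik, motion_prior):
    """Pick the fist-position hypothesis with the highest posterior.

    hypotheses:  candidate (x, y) positions from the colour and motion cues.
    Each *_lik / prior argument is a function mapping a hypothesis to a
    probability; treating the cues as conditionally independent, the
    (unnormalised) posterior is their product.
    """
    def posterior(h):
        return color_lik(h) * appearance_lik(h) * motion_prior(h)
    return max(hypotheses, key=posterior)      # MAP hypothesis
```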

Mobile Camera-Based User Interaction

Antonio Haro; Koichi Mori; Tolga Capin; Stephen Wilkinson

We present an approach for facilitating user interaction on mobile devices, focusing on camera-enabled mobile phones. A user interacts with an application by moving the device. An on-board camera captures incoming video, and the scrolling direction and magnitude are estimated using a feature-based tracking algorithm (a sketch follows this entry). The direction is used as the scroll direction in the application, and the magnitude is used to set the zoom level. The camera thus acts as a pointing device and zoom-level control. Our approach generates mouse events, so any mouse-driven application can make use of this technique.

Keywords: Mobile Device; Augmented Reality; Tracking Algorithm; Camera Motion; Template Match.

- Interfacing | Pp. 79-89
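
A common way to turn feature-based tracking into a scroll command is to take the median displacement of matched features as the global camera motion and invert it to get the device motion. A minimal sketch under that assumption; the dead-zone parameter is illustrative:

```python
import numpy as np

def scroll_from_flow(prev_pts, curr_pts, deadzone=1.0):
    """Turn tracked feature motion into a scroll command.

    prev_pts, curr_pts: Nx2 arrays of matched feature positions in two
    consecutive frames (from any feature tracker). The median
    displacement is a robust estimate of global camera motion; device
    motion is opposite to the apparent image motion.
    Returns (dx, dy, magnitude), or None inside the dead zone.
    """
    d = np.median(curr_pts - prev_pts, axis=0)   # robust global motion
    mag = float(np.hypot(*d))
    if mag < deadzone:                            # ignore hand jitter
        return None
    dx, dy = -d / mag                             # unit scroll direction
    return dx, dy, mag                            # magnitude drives zoom
```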

Fast Head Tilt Detection for Human-Computer Interaction

Benjamin N. Waber; John J. Magee; Margrit Betke

Accurate head tilt detection has a large potential to aid people with disabilities in the use of human-computer interfaces and provide universal access to communication software. We show how it can be utilized to tab through links on a web page or control a video game with head motions. It may also be useful as a correction method for currently available video-based assistive technology that requires upright facial poses. Few of the existing computer vision methods that detect head rotations in and out of the image plane with reasonable accuracy can operate within the context of a real-time communication interface because the computational expense that they incur is too great. Our method uses a variety of metrics to obtain a robust head tilt estimate without incurring the computational cost of previous methods. Our system runs in real time on a computer with a 2.53 GHz processor, 256 MB of RAM and an inexpensive webcam, using only 55% of the processor cycles.

Keywords: Face Detection; Current Frame; Gesture Recognition; Previous Frame; Head Tilt.

- Event Detection | Pp. 90-99
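
One inexpensive tilt metric, consistent with the real-time constraint discussed in the entry above, is the angle of the line joining the two eye centers. The paper combines a variety of metrics; this is only one plausible example:

```python
import numpy as np

def head_tilt_deg(left_eye, right_eye):
    """In-plane head tilt from the line joining the two eye centres.

    left_eye, right_eye: (x, y) image coordinates (y grows downward).
    0 degrees is upright; the sign indicates the tilt direction. This
    eye-line metric is a single illustrative cue, not the paper's full
    metric set.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return float(np.degrees(np.arctan2(dy, dx)))
```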

Attention Monitoring Based on Temporal Signal-Behavior Structures

Akira Utsumi; Shinjiro Kawato; Shinji Abe

In this paper, we discuss a system that estimates user attention to displayed content through temporal analysis of the user’s behavior. Detecting user attention and controlling content are key issues in our “networked interaction therapy system,” which effectively attracts the attention of memory-impaired people. In the proposed system, user behavior, including facial movements and body motions (“beat actions”), is detected with vision-based methods. User attention to the displayed content is then estimated from the on/off facial orientation relative to the display and from body motions synchronous with auditory signals. The design of this attention monitoring mechanism is derived from observations of actual patients. The estimated attention level can be used for content control to attract more viewer attention to the display system. Experimental results suggest that the content switching mechanism effectively attracts user interest.

Keywords: Video Content; Gesture Recognition; Content Control; Facial Orientation; Face Tracking.

- Event Detection | Pp. 100-109
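
As a schematic of how the two cues above might be combined, the sketch below mixes the fraction of frames with an "on" facial orientation with the fraction of body beats synchronous with audio beats. The weights and tolerance are illustrative assumptions, not values from the paper:

```python
def attention_score(facing_flags, beat_times, audio_beat_times,
                    tol=0.15, w_face=0.7, w_beat=0.3):
    """Crude attention estimate from the two cues described above.

    facing_flags:     per-frame booleans, True when the face is oriented
                      toward the display ("on" orientation).
    beat_times:       timestamps (s) of detected body "beat actions".
    audio_beat_times: timestamps (s) of beats in the content's audio.
    A body beat counts as synchronous if it falls within tol seconds of
    an audio beat. Weights w_face / w_beat are illustrative only.
    """
    face_ratio = sum(facing_flags) / max(len(facing_flags), 1)
    synced = sum(
        any(abs(b - a) <= tol for a in audio_beat_times) for b in beat_times
    )
    beat_ratio = synced / max(len(beat_times), 1)
    return w_face * face_ratio + w_beat * beat_ratio
```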