Publications catalog - books



Multimodal Technologies for Perception of Humans: First International Evaluation Workshop on Classification of Events, Activities and Relationships, CLEAR 2006, Southampton, UK, April 6-7, 2006, Revised Selected Papers

Rainer Stiefelhagen ; John Garofolo (eds.)

Conference: 1st International Evaluation Workshop on Classification of Events, Activities and Relationships (CLEAR). Southampton, UK. April 6-7, 2006

Abstract/description - provided by the publisher

Not available.

Keywords - provided by the publisher

Pattern Recognition; Image Processing and Computer Vision; Artificial Intelligence (incl. Robotics); Computer Graphics; Biometrics; Algorithm Analysis and Problem Complexity

Availability

Detected institution: not detected
Publication year: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-69567-7

Electronic ISBN

978-3-540-69568-4

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

The CLEAR 2006 Evaluation

Rainer Stiefelhagen; Keni Bernardin; Rachel Bowers; John Garofolo; Djamel Mostefa; Padmanabhan Soundararajan

This paper summarizes the first CLEAR evaluation on CLassification of Events, Activities and Relationships, which took place in early 2006 and concluded with a two-day evaluation workshop in April 2006. CLEAR is an international effort to evaluate systems for the multimodal perception of people, their activities and interactions. It provides a new international evaluation framework for such technologies. It aims to support the definition of common evaluation tasks and metrics, to coordinate and leverage the production of the necessary multimodal corpora, and to provide the possibility of comparing different algorithms and approaches on common benchmarks, which will result in faster progress in the research community. This paper describes the evaluation tasks conducted in CLEAR 2006, including the metrics and databases used, and provides an overview of the results. The evaluation tasks in CLEAR 2006 included person tracking, face detection and tracking, person identification, head pose estimation, vehicle tracking, as well as acoustic scene analysis. Overall, more than 20 subtasks were conducted, including acoustic, visual and audio-visual analysis for many of the main tasks, as well as different data domains and evaluation conditions.

- Overview | Pp. 1-44

3D Audiovisual Person Tracking Using Kalman Filtering and Information Theory

Nikos Katsarakis; George Souretis; Fotios Talantzis; Aristodemos Pnevmatikakis; Lazaros Polymenakos

This paper proposes a system for tracking people in three dimensions, utilizing audiovisual information from multiple acoustic and video sensors. The proposed system comprises a video and an audio subsystem combined using a Kalman filter. The video subsystem combines in 3D a number of 2D trackers based on a variation of Stauffer's adaptive background algorithm, with spatio-temporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. The audio subsystem uses an information-theoretic metric upon a pair of microphones to estimate the direction from which sound arrives. By combining measurements from a series of pairs, the actual coordinates of the speaker in space are derived.
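The Kalman-filter fusion of the two subsystems can be sketched in one dimension: the filter blends audio and video position estimates according to their measurement noise. This is a minimal illustrative sketch; the constant-position model and all noise values below are assumptions for the example, not the paper's actual parameters.

```python
# Minimal 1-D Kalman filter fusing audio and video position
# measurements (illustrative model and noise values, not the
# paper's actual configuration).

def kalman_step(x, P, z, R, Q=0.01):
    """One predict/update cycle for a constant-position model."""
    # Predict: state stays put, uncertainty grows by process noise Q.
    P = P + Q
    # Update: blend prediction and measurement z (variance R).
    K = P / (P + R)          # Kalman gain
    x = x + K * (z - x)      # corrected state
    P = (1.0 - K) * P        # corrected covariance
    return x, P

x, P = 0.0, 1.0                      # initial state and covariance
video = [1.02, 0.98, 1.01, 1.00]     # video subsystem estimates (m)
audio = [1.10, 0.90, 1.05, 0.95]     # noisier audio estimates (m)

for zv, za in zip(video, audio):
    x, P = kalman_step(x, P, zv, R=0.01)  # video: low noise
    x, P = kalman_step(x, P, za, R=0.25)  # audio: high noise

print(round(x, 3))   # fused estimate settles near 1.0 m
```

Because the video measurements are given a much smaller variance, the fused track follows them closely while the noisier audio stream nudges it only slightly.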

- 3D Person Tracking | Pp. 45-54

A Generative Approach to Audio-Visual Person Tracking

Roberto Brunelli; Alessio Brutti; Paul Chippendale; Oswald Lanz; Maurizio Omologo; Piergiorgio Svaizer; Francesco Tobia

This paper focuses on the integration of acoustic and visual information for people tracking. The system presented relies on a probabilistic framework within which information from multiple sources is integrated at an intermediate stage. An advantage of the proposed method is its use of a generative approach, which supports easy and robust integration of multi-source information by means of sampled projection instead of triangulation. The system described has been developed in the research activities of the EU-funded CHIL project. Experimental results from the CLEAR evaluation workshop are reported.

- 3D Person Tracking | Pp. 55-68

An Audio-Visual Particle Filter for Speaker Tracking on the CLEAR’06 Evaluation Dataset

Kai Nickel; Tobias Gehrig; Hazim K. Ekenel; John McDonough; Rainer Stiefelhagen

We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view face detection and upper body detection. On the audio side, the time delays of arrival between pairs of microphones are estimated with a generalized cross correlation function. In the CLEAR’06 evaluation, the system yielded a tracking accuracy (MOTA) of 71% for video-only, 55% for audio-only and 90% for combined audio-visual tracking.
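The audio-side time-delay estimation with a generalized cross-correlation function can be sketched as follows. PHAT weighting is used here as one common choice of weighting function; the abstract does not specify which variant the authors use, and the signals below are synthetic.

```python
import numpy as np

# Sketch of time-delay-of-arrival estimation between two microphones
# via generalized cross-correlation with PHAT weighting (an assumed
# weighting choice; the paper does not state its exact variant).

def gcc_phat_delay(sig, ref):
    """Return the delay (in samples) of sig relative to ref."""
    n = len(sig) + len(ref)
    # Cross-power spectrum, whitened by its magnitude (PHAT).
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    # Locate the peak, allowing negative lags.
    max_lag = len(ref)
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

rng = np.random.default_rng(0)
ref = rng.standard_normal(2048)                      # microphone 1
delay = 12                                           # true offset in samples
sig = np.concatenate((np.zeros(delay), ref))[:2048]  # microphone 2

est = gcc_phat_delay(sig, ref)
print(est)   # recovers the 12-sample delay
```

With several microphone pairs, the recovered delays constrain the 3D source position, which the particle filter then scores jointly with the video features.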

- 3D Person Tracking | Pp. 69-80

Multi- and Single View Multiperson Tracking for Smart Room Environments

Keni Bernardin; Tobias Gehrig; Rainer Stiefelhagen

Simultaneous tracking of multiple persons in real-world environments is an active research field, and several approaches have been proposed based on a variety of features and algorithms. In this work, we present two multimodal systems for tracking multiple users in a smart room environment. One is a multi-view tracker based on color histogram tracking and special person region detectors. The other is a wide-angle overhead view person tracker relying on foreground segmentation and model-based tracking. Both systems are complemented by a joint probabilistic data association filter-based source localization framework using input from several microphone arrays.

We also very briefly present two intuitive metrics to allow for objective comparison of tracker characteristics, focusing on their precision in estimating object locations, their accuracy in recognizing object configurations and their ability to consistently label objects over time.

The trackers are extensively tested and compared, for each modality separately, and for the combined modalities, on the CLEAR 2006 Evaluation Database.
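The two metrics referred to above correspond to the multiple object tracking precision and accuracy scores (MOTP/MOTA) used throughout CLEAR. A minimal sketch of the two formulas, with illustrative per-frame error counts (the frame-by-frame matching of hypotheses to ground-truth objects is assumed already done):

```python
# Sketch of the CLEAR MOT scores from per-frame error counts
# (counts below are illustrative, not evaluation results).

def mota(misses, false_positives, mismatches, num_gt):
    """Accuracy: 1 - (all errors) / (all ground-truth objects)."""
    return 1.0 - (sum(misses) + sum(false_positives)
                  + sum(mismatches)) / sum(num_gt)

def motp(match_distances):
    """Precision: mean distance over all matched object pairs."""
    return sum(match_distances) / len(match_distances)

# Three frames, two ground-truth persons per frame.
acc = mota(misses=[1, 0, 0], false_positives=[0, 1, 0],
           mismatches=[0, 0, 1], num_gt=[2, 2, 2])
prec = motp([0.10, 0.08, 0.12])   # distances in metres

print(acc, round(prec, 2))   # 0.5 and 0.1
```

MOTA penalizes misses, false positives, and identity mismatches in a single number, while MOTP isolates pure localization error, which is why the two are reported separately.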

- 3D Person Tracking | Pp. 81-92

UPC Audio, Video and Multimodal Person Tracking Systems in the Clear Evaluation Campaign

Alberto Abad; Cristian Canton-Ferrer; Carlos Segura; José Luis Landabaso; Dušan Macho; Josep Ramon Casas; Javier Hernando; Montse Pardàs; Climent Nadeu

Reliable measures of person positions are needed for computational perception of human activities taking place in a smart-room environment. In this work, we present the Person Tracking systems developed at UPC for the audio, video and audio-video modalities in the context of the research activities of the EU-funded CHIL project. The aim of the designed systems, and particularly of the new contributions proposed, is to deal robustly with both single- and multi-person localization tasks, independently of the environmental conditions. Besides the technology description, experimental results conducted for the CLEAR evaluation workshop are also reported.

- 3D Person Tracking | Pp. 93-104

A Joint System for Single-Person 2D-Face and 3D-Head Tracking in CHIL Seminars

Gerasimos Potamianos; Zhenqiu Zhang

We present the IBM systems submitted and evaluated within the CLEAR'06 evaluation campaign for the tasks of single-person visual 3D tracking (localization) and 2D face tracking on CHIL seminar data. The two systems are sufficiently inter-connected to justify their presentation within a single paper as a joint vision system for single-person 2D-face and 3D-head tracking, suitable for smart room environments with multiple synchronized, calibrated, stationary cameras. Indeed, in the developed system, face detection plays a pivotal role in 3D person tracking, being employed both in system initialization and in detecting possible tracking drift. Similarly, 3D person tracking determines the 2D frame regions where a face detector is subsequently applied. The joint system consists of a number of components that employ detection and tracking algorithms, some of which operate on input from all four corner cameras of the CHIL smart rooms, while others select and utilize two of the four available cameras. Main system highlights include the use of AdaBoost-like multi-pose face detectors, a spatio-temporal dynamic programming algorithm to initialize 3D location hypotheses, and an adaptive subspace-learning-based tracking scheme with a forgetting mechanism to reduce tracking drift. The system is benchmarked on the CLEAR'06 CHIL seminar database, consisting of 26 lecture segments recorded inside the smart rooms of the UKA and ITC CHIL partners. Its resulting 3D single-person tracking performance is 86% accuracy with a precision of 88 mm, whereas the achieved face tracking score is 54% correct with 37% wrong detections and 19% misses. In terms of speed, an unoptimized system implementation runs at about 2 fps on a P4 2.8 GHz desktop.

- 3D Person Tracking | Pp. 105-118

Speaker Tracking in Seminars by Human Body Detection

Bo Wu; Vivek Kumar Singh; Ram Nevatia; Chi-Wei Chu

This paper presents evaluation results of a method for tracking speakers in seminars from multiple cameras. First, 2D human detection and tracking are performed for each view. Then, 2D locations are converted to 3D based on the calibration parameters. Finally, cues from multiple cameras are integrated in an incremental way to refine the trajectories. We have developed two multi-view integration methods, which are evaluated and compared on the CHIL speaker tracking test set.
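Converting 2D locations to 3D from calibration parameters is commonly done by linear (DLT) triangulation; a sketch of that step under toy assumptions follows. The projection matrices and the 3D point below are made up for illustration, not the CHIL room calibration.

```python
import numpy as np

# Sketch of linear (DLT) triangulation of a 3D point from two
# calibrated views (toy camera matrices, not the CHIL calibration).

def triangulate(P1, P2, uv1, uv2):
    """Least-squares 3D point from two views (homogeneous DLT)."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.stack([u1 * P1[2] - P1[0],
                  v1 * P1[2] - P1[1],
                  u2 * P2[2] - P2[0],
                  v2 * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                   # null vector of A
    return X[:3] / X[3]          # dehomogenize

# Two toy cameras: identity pose, and a 1 m baseline along x.
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

X_true = np.array([0.2, 0.1, 3.0])           # a point 3 m away
p1 = P1 @ np.append(X_true, 1); p1 = p1[:2] / p1[2]
p2 = P2 @ np.append(X_true, 1); p2 = p2[:2] / p2[2]

print(np.round(triangulate(P1, P2, p1, p2), 3))   # recovers X_true
```

With noisy real detections the DLT solution is only approximate, which is one reason the paper refines the trajectories incrementally across cameras afterwards.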

- 3D Person Tracking | Pp. 119-126

TUT Acoustic Source Tracking System 2006

Pasi Pertilä; Teemu Korhonen; Tuomo Pirinen; Mikko Parviainen

This paper documents the acoustic source tracking system developed by TUT for the 2006 CLEAR evaluation campaign. The described system performs 3-D single-person tracking based on audio data received from multiple spatially separated microphone arrays. The evaluation focuses on the meeting-room domain.

The system consists of four distinct stages. The first stage is time delay estimation (TDE) between microphone pairs inside each array. Based on the TDE, direction of arrival (DOA) vectors are calculated for each array using a confidence metric. Source localization is done using a selected combination of DOA estimates. The location estimate is tracked using a particle filter to reduce noise. The system is capable of locating a speaker 72% of the time with an average accuracy of 25 cm.
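The localization stage, which combines DOA estimates from several arrays, can be posed as finding the point closest to the bearing lines emanating from each array. A sketch under made-up geometry; the array positions and bearings below are illustrative, not the TUT configuration.

```python
import numpy as np

# Sketch of source localization as the least-squares intersection
# of DOA bearing lines (array positions are made up for the example).

def locate(origins, directions):
    """Point minimizing summed squared distance to 3-D lines."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)   # projector off the line
        A += M
        b += M @ p
    return np.linalg.solve(A, b)

speaker = np.array([1.0, 2.0, 1.5])          # true position (m)
arrays = [np.array([0.0, 0.0, 1.0]),         # three array centres
          np.array([4.0, 0.0, 1.0]),
          np.array([0.0, 4.0, 1.0])]
doas = [speaker - p for p in arrays]         # ideal, noise-free bearings

print(np.round(locate(arrays, doas), 3))     # recovers the speaker
```

With noisy DOA estimates the bearing lines no longer intersect exactly, which is where the confidence metric for selecting DOA combinations and the subsequent particle filter come in.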

- 3D Person Tracking | Pp. 127-136

Tracking Multiple Speakers with Probabilistic Data Association Filters

Tobias Gehrig; John McDonough

In prior work, we developed a speaker tracking system based on an extended Kalman filter using time delays of arrival (TDOAs) as acoustic features. In particular, the TDOAs comprised the observation associated with an iterated extended Kalman filter (IEKF) whose state corresponds to the speaker position. In other work, we followed the same approach to develop a system that could use both audio and video information to track a moving lecturer. While these systems functioned well, their utility was limited to scenarios in which a single speaker was to be tracked. In this work, we seek to remove this restriction by generalizing the IEKF, first to a probabilistic data association filter, which incorporates a clutter model for rejection of spurious acoustic events, and then to a joint probabilistic data association filter (JPDAF), which maintains a separate state vector for each active speaker. In a set of experiments conducted on seminar and meeting data, we demonstrate that the JPDAF provides tracking performance superior to the IEKF.
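The core idea of the probabilistic data association filter, weighting each gated measurement by its association probability while reserving probability mass for the all-clutter hypothesis, can be sketched in one dimension. All noise and clutter values here are illustrative assumptions, and the covariance update is a simplified form of the full PDAF equations.

```python
import math

# 1-D sketch of a PDAF update: the Kalman innovation is replaced by
# an association-probability-weighted sum over candidate measurements
# (illustrative noise/clutter values, not the paper's parameters).

def pdaf_update(x, P, measurements, R=0.04, p_clutter=0.1):
    """Weighted Kalman update over candidate measurements."""
    S = P + R                              # innovation covariance
    # Gaussian likelihood of each candidate under the prediction.
    lik = [math.exp(-0.5 * (z - x) ** 2 / S) / math.sqrt(2 * math.pi * S)
           for z in measurements]
    norm = sum(lik) + p_clutter
    betas = [l / norm for l in lik]        # association probabilities
    beta0 = p_clutter / norm               # "all measurements are clutter"
    K = P / S                              # Kalman gain
    innov = sum(b * (z - x) for b, z in zip(betas, measurements))
    x = x + K * innov
    # Covariance grows with association uncertainty (simplified form).
    P = beta0 * P + (1 - beta0) * (1 - K) * P
    return x, P

# Predicted speaker at 1.0 m; one nearby detection, one clutter spike.
x, P = pdaf_update(1.0, 0.1, measurements=[1.05, 2.5])
print(round(x, 3))   # pulled toward 1.05, barely toward the outlier
```

The JPDAF extends this by evaluating joint association events across all active speaker tracks, so that one measurement cannot be credited to two speakers at once.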

- 3D Person Tracking | Pp. 137-150