Publications catalog - books

Computer Vision: ECCV 2002: 7th European Conference on Computer Vision Copenhagen, Denmark, May 28-31, 2002 Proceedings, Part IV

Anders Heyden; Gunnar Sparr; Mads Nielsen; Peter Johansen (eds.)

Conference: 7th European Conference on Computer Vision (ECCV). Copenhagen, Denmark. May 28-31, 2002

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Image Processing and Computer Vision; Computer Graphics; Pattern Recognition; Artificial Intelligence

Availability

Detected institution: Not detected
Year of publication: 2002
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-43748-2

Electronic ISBN

978-3-540-47979-6

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2002

Table of contents

Image Registration for Foveated Omnidirectional Sensing

Fadi Dornaika; James Elder

This paper addresses the problem of registering high-resolution, small FOV images with low-resolution panoramic images provided by an omnidirectional catadioptric video sensor. Such systems may find application in surveillance and telepresence systems that require a large FOV and high resolution at selected locations. Although image registration has been studied in more conventional applications, the problem of registering omnidirectional and conventional video has not previously been addressed, and this problem presents unique challenges due to (i) the extreme differences in resolution between the sensors (more than a 16:1 linear resolution ratio in our application), and (ii) the resolution inhomogeneity of omnidirectional images. In this paper we show how a coarse registration can be computed from raw images using parametric template matching techniques. Further, we develop and evaluate robust feature-based and featureless methods for computing the full 2D projective transforms between the two images. We find that our novel featureless approach yields superior performance for this application.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 606-620
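The "full 2D projective transform" estimated above is a 3×3 homography. As a hedged illustration of the feature-based route only (not the authors' featureless method), the standard Direct Linear Transform recovers a homography from four or more point correspondences; all function names and the synthetic data below are ours, not the paper's:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 projective transform H (dst ~ H @ src, up to scale)
    from N >= 4 point correspondences, via the Direct Linear Transform."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A: last row of V^T from the SVD.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the projective scale

def apply_homography(H, pts):
    """Map 2D points through H, including the homogeneous divide."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Synthetic check: recover a known projective transform from 5 points.
H_true = np.array([[1.1,  0.02,  5.0],
                   [0.01, 0.95, -3.0],
                   [1e-4, 2e-4,  1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100], [50, 25]], float)
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
err = np.abs(apply_homography(H_est, src) - dst).max()
```

With noise-free correspondences the DLT recovers the transform essentially exactly; robust variants (as in the paper) add outlier rejection around this core step.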

Automatic Model Selection by Modelling the Distribution of Residuals

T. F. Cootes; N. Thacker; C. J. Taylor

Many problems in computer vision involve a choice of the most suitable model for a set of data. Typically, one wishes to choose a model which best represents the data in a way that generalises to unseen data without overfitting. We propose an algorithm in which the quality of a model match can be determined by calculating how well the distribution of model residuals matches a distribution estimated from the noise on the data. The distribution of residuals has two components: the measurement noise, and the noise caused by the uncertainty in the model parameters. If the model is too complex to be supported by the data, then there will be large uncertainty in the parameters. We demonstrate that the algorithm can be used to select appropriate model complexity in a variety of problems, including polynomial fitting and selecting the number of modes to match a shape model to noisy data.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 621-635
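The polynomial-fitting case above can be caricatured in a few lines: given a known measurement-noise level, accept the lowest polynomial degree whose residuals are statistically indistinguishable from that noise (chi-square per degree of freedom near 1). The data, threshold, and names below are our own illustrative choices, not the paper's procedure verbatim:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1                                   # known measurement-noise std
x = np.linspace(-1, 1, 200)
y_true = 1.0 - 2.0 * x + 3.0 * x**2           # underlying quadratic signal
y = y_true + rng.normal(0.0, sigma, x.size)

def chi2_per_dof(degree):
    """Fit a polynomial of the given degree and compare the residual
    distribution with the known noise level (chi-square per dof)."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    dof = x.size - (degree + 1)
    return np.sum((residuals / sigma) ** 2) / dof

# Select the lowest degree whose residuals look like pure noise.
best_degree = next(d for d in range(8) if chi2_per_dof(d) < 1.5)
```

Degrees 0 and 1 leave the quadratic term in the residuals, inflating the statistic far above 1; degree 2 brings it close to 1 and is selected, without ever rewarding higher degrees for fitting the noise.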

Assorted Pixels: Multi-sampled Imaging with Structural Models

Shree K. Nayar; Srinivasa G. Narasimhan

Multi-sampled imaging is a general framework for using pixels on an image detector to simultaneously sample multiple dimensions of imaging (space, time, spectrum, brightness, polarization, etc.). The mosaic of red, green and blue spectral filters found in most solid-state color cameras is one example of multi-sampled imaging. We briefly describe how multi-sampling can be used to explore other dimensions of imaging. Once such an image is captured, smooth reconstructions along the individual dimensions can be obtained using standard interpolation algorithms. Typically, this results in a substantial reduction of resolution (and hence image quality). One can extract significantly greater resolution in each dimension by noting that the light fields associated with real scenes have enormous redundancies within them, causing different dimensions to be highly correlated. Hence, multi-sampled images can be better interpolated using local structural models that are learned offline from a diverse set of training images. The specific type of structural models we use are based on polynomial functions of measured image intensities. They are very effective as well as computationally efficient. We demonstrate the benefits of structural interpolation using three specific applications. These are (a) traditional color imaging with a mosaic of color filters, (b) high dynamic range monochrome imaging using a mosaic of exposure filters, and (c) high dynamic range color imaging using a mosaic of overlapping color and exposure filters.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 636-652
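As a toy, hedged analogue of structural interpolation (1-D only, and only the linear special case of the paper's polynomial models): learn, by least squares on training data, a predictor of each missing sample from its measured neighbours, then compare it with plain two-tap averaging on held-out data. All names and the synthetic signal are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def smooth_signal(n):
    """A band-limited 1-D stand-in for natural image rows."""
    s = rng.normal(size=n + 40)
    kernel = np.hanning(21)
    kernel /= kernel.sum()
    return np.convolve(s, kernel, mode="valid")[:n]

# Training: learn to predict a missing (odd) sample from its four nearest
# measured (even) neighbours, by least squares over many training sites.
train = smooth_signal(4000)
idx = np.arange(2, len(train) - 4, 2)          # even sites; predict idx + 1
X = np.stack([train[idx - 2], train[idx], train[idx + 2], train[idx + 4]], 1)
y = train[idx + 1]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Held-out signal: learned structural model vs. plain 2-tap averaging.
test = smooth_signal(2000)
ti = np.arange(2, len(test) - 4, 2)
Xt = np.stack([test[ti - 2], test[ti], test[ti + 2], test[ti + 4]], 1)
yt = test[ti + 1]
err_learned = np.abs(Xt @ coeffs - yt).mean()
err_average = np.abs(0.5 * (test[ti] + test[ti + 2]) - yt).mean()
```

Because smooth signals are highly redundant, the learned four-tap predictor generalises to unseen data and beats naive averaging, which is the paper's point in miniature; the actual method learns polynomial (not just linear) models over 2-D neighbourhoods spanning several imaging dimensions.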

Robust Parameterized Component Analysis

Fernando De la Torre; Michael J. Black

Principal Component Analysis (PCA) has been successfully applied to construct linear models of shape, graylevel, and motion. In particular, PCA has been widely used to model the variation in the appearance of people’s faces. We extend previous work on facial modeling for tracking faces in video sequences as they undergo significant changes due to facial expressions. Here we develop person-specific facial appearance models (PSFAM), which use modular PCA to model complex intra-person appearance changes. Such models require aligned visual training data; in previous work, this has involved a time-consuming and error-prone hand alignment and cropping process. Instead, we introduce parameterized component analysis to learn a subspace that is invariant to affine (or higher order) geometric transformations. The automatic learning of a PSFAM given a training image sequence is posed as a continuous optimization problem and is solved with a mixture of stochastic and deterministic techniques achieving sub-pixel accuracy. We illustrate the use of the 2D PSFAM model with several applications including video-conferencing, realistic avatar animation and eye tracking.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 653-669
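Stripped of the alignment invariance that is the paper's actual contribution, the underlying PCA appearance model is an SVD subspace fit plus project-and-reconstruct. A minimal sketch on synthetic "appearance" vectors (all names and data ours, standing in for aligned face-image vectors):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy appearance data: 200 samples lying near a 3-D subspace of a 50-D
# space, plus small per-pixel noise.
basis = np.linalg.qr(rng.normal(size=(50, 3)))[0]
data = rng.normal(size=(200, 3)) @ basis.T \
     + rng.normal(scale=0.01, size=(200, 50))

# PCA via SVD of the mean-centred data matrix.
mean = data.mean(axis=0)
U, S, Vt = np.linalg.svd(data - mean, full_matrices=False)
components = Vt[:3]                      # leading principal directions

def reconstruct(x):
    """Project an appearance vector onto the subspace and back."""
    c = (x - mean) @ components.T        # subspace coefficients
    return mean + c @ components

sample = data[0]
residual = np.linalg.norm(sample - reconstruct(sample))
```

The spectrum drops sharply after the third singular value, and in-model samples reconstruct with only the noise floor as residual; the paper's parameterized version additionally optimises a geometric warp of each training image so that this subspace fit holds without hand alignment.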

Learning Intrinsic Video Content Using Levenshtein Distance in Graph Partitioning

Jeffrey Ng; Shaogang Gong

We present a novel approach for automatically learning models of temporal trajectories extracted from video data. Instead of using a representation of linearly time-normalised vectors of fixed length, our approach makes use of Dynamic Time Warp distance as a similarity measure to capture the underlying ordered structure of variable-length temporal data while removing the non-linear warping of the time scale. We reformulate the structure learning problem as an optimal graph-partitioning of the dataset to solely exploit Dynamic Time Warp similarity weights without the need for intermediate cluster centroid representations. We extend the graph-partitioning method, and in particular the Normalised Cut model originally introduced for static image segmentation, to unsupervised clustering of temporal trajectories with fully automated model order selection. By computing hierarchical average Dynamic Time Warp for each cluster, we learn warp-free trajectory models and recover the time warp profiles and structural variance in the data. We demonstrate the approach on modelling trajectories of continuous hand-gestures and moving objects in an indoor environment.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 670-684
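The Dynamic Time Warp distance at the heart of this method is a short dynamic program. A hedged, minimal 1-D version (the paper applies it to multi-dimensional trajectories; names and test sequences are ours):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warp distance between two variable-length 1-D
    sequences, via the standard O(len(a) * len(b)) dynamic program."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warp moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two trajectories tracing the same shape at different speeds should be
# close under DTW despite their different lengths; a different shape
# of comparable length should not be.
fast = np.sin(np.linspace(0, np.pi, 20))
slow = np.sin(np.linspace(0, np.pi, 50))
other = np.cos(np.linspace(0, np.pi, 30))
d_same = dtw_distance(fast, slow)
d_diff = dtw_distance(fast, other)
```

It is exactly this invariance to length and time-scale warping that lets the paper feed pairwise DTW weights directly into Normalised Cut without fixed-length vectorisation or centroids.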

A Tale of Two Classifiers: SNoW vs. SVM in Visual Recognition

Ming-Hsuan Yang; Dan Roth; Narendra Ahuja

Numerous statistical learning methods have been developed for visual recognition tasks. Few attempts, however, have been made to address theoretical issues, and in particular, study the suitability of different learning algorithms for visual recognition. Large margin classifiers, such as SNoW and SVM, have recently demonstrated their success in object detection and recognition. In this paper, we present a theoretical account of these two learning approaches, and their suitability to visual recognition. Using tools from computational learning theory, we show that the main difference between the generalization bounds of SVM and SNoW depends on the properties of the data. We argue that learning problems in the visual domain have sparseness characteristics and exhibit them by analyzing data taken from face detection experiments. Experimental results exhibit good generalization and robustness properties of the SNoW-based method, and conform to the theoretical analysis.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 685-699

Learning to Parse Pictures of People

Remi Ronfard; Cordelia Schmid; Bill Triggs

Detecting people in images is a key problem for video indexing, browsing and retrieval. The main difficulties are the large appearance variations caused by action, clothing, illumination, viewpoint and scale. Our goal is to find people in static video frames using learned models of both the appearance of body parts (head, limbs, hands), and of the geometry of their assemblies. We build on Forsyth & Fleck’s general ‘body plan’ methodology and Felzenszwalb & Huttenlocher’s dynamic programming approach for efficiently assembling candidate parts into ‘pictorial structures’. However, we replace the rather simple part detectors used in these works with dedicated detectors learned for each body part using Support Vector Machines (SVMs) or Relevance Vector Machines (RVMs). We are not aware of any previous work using SVMs to learn articulated body plans; however, they have been used to detect both whole pedestrians and combinations of rigidly positioned subimages (typically, upper body, arms, and legs) in street scenes, under a wide range of illumination, pose and clothing variations. RVMs are SVM-like classifiers that offer a well-founded probabilistic interpretation and improved sparsity for reduced computation. We demonstrate their benefits experimentally in a series of results showing great promise for learning detectors in more general situations.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 700-714

Learning Montages of Transformed Latent Images as Representations of Objects That Change in Appearance

Chris Pal; Brendan J. Frey; Nebojsa Jojic

This paper introduces a novel probabilistic model for representing objects that change in appearance as a result of changes in pose, due to small deformations of their sub-parts and the relative spatial transformation of sub-parts of the object. We call the model a . The model is based upon the idea that an image can be represented as a montage using many small, transformed and cropped patches from a collection of latent images. The approach is similar to that which might be employed by a police artist who might represent an image of a criminal suspect’s face using a montage of face parts cut out of a "library" of face parts. In contrast, for our model, we learn the library of small latent images from a set of examples of objects that are changing in shape. In our approach, the image is first divided into a grid of sub-images. Each sub-image in the grid acts as a window that crops a piece out of one of a collection of slightly larger images possible for that location in the image. We illustrate various probability models that can be used to encode the appropriate relationships for latent images and cropping transformations among the different patches. In this paper we present the complete algorithm for a tree-structured model. We show how the approach and model are able to find representations of the appearance of full body images of people in motion. We show how our approach can be used to learn representations of objects in an "unsupervised" manner and present results using our model for recognition and tracking purposes in a "supervised" manner.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 715-731

Exemplar-Based Face Recognition from Video

Volker Krüger; Shaohua Zhou

A new exemplar-based probabilistic approach for face recognition in video sequences is presented. The approach has two stages: First, exemplars, which are selected representatives from the raw video, are automatically extracted from gallery videos. The exemplars are used to summarize the gallery video information. In the second stage, exemplars are then used as centers for probabilistic mixture distributions for the tracking and recognition process. A particle method is used to compute the posterior probabilities. Probabilistic methods are attractive in this context as they allow a systematic handling of uncertainty and an elegant way of fusing temporal information.

Contrary to some previous video-based approaches, our approach is not limited to a certain image representation. Rather, it enhances known representations, such as PCA, with temporal fusion and uncertainty handling. Experiments demonstrate the effectiveness of each of the two stages. We tested this approach on more than 100 training and testing sequences, with 25 different individuals.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 732-746
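The particle method used for the posterior can be sketched, in a much-reduced 1-D form, as sequential importance resampling. In the paper the state includes identity and appearance; everything below (dynamics, noise levels, names) is our illustrative stand-in:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hidden state: a 1-D random walk; observations: the state plus noise.
T, N = 50, 500
true_x = np.cumsum(rng.normal(0.0, 0.1, T))
obs = true_x + rng.normal(0.0, 0.2, T)

particles = rng.normal(0.0, 1.0, N)          # initial particle set
estimates = []
for z in obs:
    particles = particles + rng.normal(0.0, 0.1, N)   # propagate dynamics
    w = np.exp(-0.5 * ((z - particles) / 0.2) ** 2)   # obs. likelihood
    w /= w.sum()
    estimates.append(np.sum(w * particles))           # posterior mean
    particles = rng.choice(particles, N, p=w)         # resample
estimates = np.asarray(estimates)
rmse = np.sqrt(np.mean((estimates - true_x) ** 2))
```

The weighted particle cloud approximates the posterior at each frame, and resampling concentrates particles where the likelihood is high; the paper's version propagates this posterior jointly over motion and identity, which is what enables the temporal fusion described above.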

Learning the Topology of Object Views

Jan Wieghardt; Rolf P. Würtz; Christoph von der Malsburg

A visual representation of an object must meet at least three basic requirements. First, it must allow identification of the object in the presence of slight but unpredictable changes in its visual appearance. Second, it must account for larger changes in appearance due to variations in the object’s fundamental degrees of freedom, such as changes in pose. Last, any object representation must be derivable from visual input alone, i.e., it must be learnable.

We here construct such a representation by deriving transformations between the different views of a given object, so that they can be parameterized in terms of the object’s physical degrees of freedom. Our method automatically derives the appearance representations of an object, together with their linear deformation models, from example images. These are subsequently used to provide linear charts to the entire appearance manifold of a three-dimensional object. In contrast to approaches aiming at mere dimensionality reduction, the local linear charts to the object’s appearance manifold are estimated on a strictly local basis, avoiding any reference to a metric embedding space for all views. A real understanding of the object’s appearance in terms of its physical degrees of freedom is in this way learned from single views alone.

- Calibration / Active and Real-Time and Robot Vision / Image and Video Indexing / Medical Image Understanding / Vision Systems / Engineering and Evaluations / Statistical Learning | Pp. 747-760