Publications catalogue - books

Deterministic and Statistical Methods in Machine Learning: First International Workshop, Sheffield, UK, September 7-10, 2004. Revised Lectures

Joab Winkler; Mahesan Niranjan; Neil Lawrence (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Database Management; Information Storage and Retrieval; Image Processing and Computer Vision; Pattern Recognition

Availability

Detected institution | Publication year | Browse | Download | Request
Not detected | 2005 | SpringerLink | – | –

Information

Resource type:

books

Print ISBN

978-3-540-29073-5

Electronic ISBN

978-3-540-31728-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Object Recognition via Local Patch Labelling

Christopher M. Bishop; Ilkay Ulusoy

In recent years the problem of object recognition has received considerable attention from both the machine learning and computer vision communities. The key challenge of this problem is to be able to recognize any member of a category of objects in spite of wide variations in visual appearance due to variations in the form and colour of the object, occlusions, geometrical transformations (such as scaling and rotation), changes in illumination, and potentially non-rigid deformations of the object itself. In this paper we focus on the detection of objects within images by combining information from a large number of small regions, or ‘patches’, of the image. Since detailed hand-segmentation and labelling of images is very labour intensive, we make use of ‘weakly labelled’ data in which the training images are labelled only according to the presence or absence of each category of object. A major challenge presented by this problem is that the foreground object is accompanied by widely varying background clutter, and the system must learn to distinguish the foreground from the background without the aid of labelled data. In this paper we first show that patches which are highly relevant for the object discrimination problem can be selected automatically from a large dictionary of candidate patches during learning, and that this leads to improved classification compared to direct use of the full dictionary. We then explore alternative techniques which are able to provide labels for the individual patches, as well as for the image as a whole, so that each patch is identified as belonging to one of the object categories or to the background class. This provides a rough indication of the location of the object or objects within the image. Again these individual patch labels must be learned on the basis only of overall image class labels. We develop two such approaches, one discriminative and one generative, and compare their performance both in terms of patch labelling and image labelling. Our results show that good classification performance can be obtained on challenging data sets using only weak training labels, and they also highlight some of the relative merits of discriminative and generative approaches.
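
The patch-based weak labelling setting the abstract describes can be made concrete with a small sketch. The following toy (synthetic data, illustrative dimensions, and a plain noisy-OR pooling; none of it is the authors' actual discriminative or generative model) trains a logistic patch scorer from image-level labels only: an image is scored positive if any patch looks positive, so patch labels emerge without patch-level supervision.

```python
# Weakly supervised patch labelling sketch: logistic patch scorer trained
# from image-level labels via noisy-OR pooling. All data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
D, P, N = 5, 20, 200                 # feature dim, patches per image, images

def make_image(label):
    X = rng.normal(0.0, 1.0, (P, D))           # background clutter patches
    if label == 1:
        k = rng.integers(2, 6)                  # a few foreground patches
        X[:k] += np.array([2.0, 1.5, 0.0, 0.0, 0.0])
    return X

y = rng.integers(0, 2, N)
images = [make_image(t) for t in y]

w, b, lr = np.zeros(D), 0.0, 0.05
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

for epoch in range(100):
    for X, t in zip(images, y):
        p = sigmoid(X @ w + b)                  # P(patch is foreground)
        q = 1.0 - p
        p_img = 1.0 - np.prod(q)                # noisy-OR image probability
        if t == 1:
            # d log p_img / d p_i = prod_{j != i} q_j / p_img
            dLdp = (np.prod(q) / np.maximum(q, 1e-9)) / max(p_img, 1e-9)
        else:
            # d log(1 - p_img) / d p_i = -1 / q_i
            dLdp = -1.0 / np.maximum(q, 1e-9)
        g = dLdp * p * q                        # chain through the sigmoid
        w += lr * (X.T @ g)                     # gradient ascent on likelihood
        b += lr * g.sum()

p_img_all = np.array([1.0 - np.prod(1.0 - sigmoid(X @ w + b)) for X in images])
print("image-level training accuracy:", ((p_img_all > 0.5) == y).mean())
```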

Pp. 1-21

Multi Channel Sequence Processing

Samy Bengio; Hervé Bourlard

This paper summarizes some of the current research challenges arising from multi-channel sequence processing. Indeed, multiple real-life applications involve simultaneous recording and analysis of multiple information sources, which may be asynchronous, have different frame rates, exhibit different stationarity properties, and carry complementary (or correlated) information. Some of these problems can already be tackled by one of the many statistical approaches to sequence modeling. However, several challenging research issues are still open, such as taking into account asynchrony and correlation between several feature streams, or handling the resulting growth in model complexity. In this framework, we discuss here two novel approaches, which have recently been investigated with success in the context of large multimodal problems. These include the asynchronous HMM, providing a principled approach to the processing of multiple feature streams, and the layered HMM approach, providing a good formalism for decomposing large and complex (multi-stream) problems into layered architectures. As briefly reported here, the combination of these two approaches has yielded successful results on several multi-channel tasks, ranging from audio-visual speech recognition to automatic meeting analysis.
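
The layered-HMM decomposition mentioned above can be sketched compactly. The toy below (hand-set parameters, not the authors' asynchronous HMM or any trained model) decodes a raw discrete stream with a first-layer HMM and feeds the decoded states to a second-layer HMM, illustrating how a complex sequence problem can be split into layers.

```python
# Layered-HMM sketch: layer-1 posteriors become layer-2 observations.
import numpy as np

def forward_backward(pi, A, B, obs):
    """Per-frame state posteriors for a discrete-emission HMM."""
    T, S = len(obs), len(pi)
    alpha, beta = np.zeros((T, S)), np.ones((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Layer 1: 2 hidden states, 3 observation symbols (toy parameters).
pi1 = np.array([0.6, 0.4])
A1 = np.array([[0.9, 0.1], [0.2, 0.8]])
B1 = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
stream = np.array([0, 0, 1, 2, 2, 2, 1, 0, 0, 2])

post1 = forward_backward(pi1, A1, B1, stream)
layer1_states = post1.argmax(axis=1)       # decoded low-level states

# Layer 2 treats the decoded layer-1 states as its observation symbols,
# modelling longer-range structure on top of them.
pi2 = np.array([0.5, 0.5])
A2 = np.array([[0.95, 0.05], [0.05, 0.95]])
B2 = np.array([[0.8, 0.2], [0.2, 0.8]])
post2 = forward_backward(pi2, A2, B2, layer1_states)
print(post2.argmax(axis=1))                # high-level segmentation
```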

Pp. 22-36

Bayesian Kernel Learning Methods for Parametric Accelerated Life Survival Analysis

Gavin C. Cawley; Nicola L. C. Talbot; Gareth J. Janacek; Michael W. Peck

Survival analysis is a branch of statistics concerned with the time elapsing before “failure”, with diverse applications in medical statistics and the analysis of the reliability of electrical or mechanical components. In this paper we introduce a parametric accelerated life survival analysis model based on kernel learning methods that, at least in principle, is able to learn arbitrary dependencies between a vector of explanatory variables and the scale of the distribution of survival times. The proposed kernel survival analysis method is then used to model the growth domain of Clostridium botulinum, that is, the food processing and storage conditions permitting the growth of this foodborne microbial pathogen, leading to the production of the neurotoxin responsible for botulism. A Bayesian training procedure, based on the evidence framework, is used for model selection and to provide a credible interval on model predictions. The kernel survival analysis models are found to be more accurate than models based on more traditional survival analysis techniques; they also suggest that a risk assessment of the foodborne botulism hazard would benefit from the collection of additional data.
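
A minimal sketch of the accelerated-life idea: model log failure time as a kernel machine over the explanatory variables. The version below uses plain kernel ridge regression on synthetic data and omits censoring and the evidence-framework model selection the paper relies on, so it illustrates only the model class.

```python
# Accelerated-life sketch: log survival time as a kernel machine.
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.uniform(-2, 2, (n, 2))          # explanatory variables (e.g. pH, temperature)
f_true = 1.5 - 0.8 * X[:, 0] + 0.5 * X[:, 1] ** 2
log_t = f_true + rng.normal(0, 0.2, n)  # log survival times (no censoring here)

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

lam = 1e-2                              # ridge penalty; the paper would set
K = rbf_kernel(X, X)                    # this via the evidence framework
alpha = np.linalg.solve(K + lam * np.eye(n), log_t)

X_new = np.array([[0.0, 0.0], [1.0, -1.0]])
pred_log_t = rbf_kernel(X_new, X) @ alpha
print(np.exp(pred_log_t))               # predicted survival-time scale
```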

Pp. 37-55

Extensions of the Informative Vector Machine

Neil D. Lawrence; John C. Platt; Michael I. Jordan

The informative vector machine (IVM) is a practical method for Gaussian process regression and classification. The IVM produces a sparse approximation to a Gaussian process by combining assumed density filtering with a heuristic for choosing points based on minimizing posterior entropy. This paper extends IVM in several ways. First, we propose a novel noise model that allows the IVM to be applied to a mixture of labeled and unlabeled data. Second, we use IVM on a block-diagonal covariance matrix, for “learning to learn” from related tasks. Third, we modify the IVM to incorporate prior knowledge from known invariances. All of these extensions are tested on artificial and real data.
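
The IVM's selection heuristic can be roughed out as follows. This sketch (numpy only, synthetic 1-D data) greedily grows an active set by always adding the point with the largest current posterior variance, which for a Gaussian noise model corresponds to the largest entropy reduction; it skips the assumed-density-filtering site updates the real IVM uses, so it is a simplification rather than the algorithm itself.

```python
# Greedy entropy-flavoured active-set selection for a sparse GP (sketch).
import numpy as np

rng = np.random.default_rng(2)
n, sigma2 = 200, 0.01
X = np.sort(rng.uniform(-3, 3, n))
y = np.sin(2 * X) + rng.normal(0, np.sqrt(sigma2), n)

k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

active = []
for _ in range(15):                      # active-set size d << n
    if active:
        Kaa = k(X[active], X[active]) + sigma2 * np.eye(len(active))
        Kxa = k(X, X[active])
        # Posterior variance at every point given the current active set.
        var = 1.0 - np.einsum('ij,ij->i', Kxa @ np.linalg.inv(Kaa), Kxa)
    else:
        var = np.ones(n)                 # prior variance before any inclusion
    var[active] = -np.inf                # never re-select a point
    active.append(int(np.argmax(var)))

# Exact GP prediction using only the selected active set.
Kaa = k(X[active], X[active]) + sigma2 * np.eye(len(active))
w = np.linalg.solve(Kaa, y[active])
X_test = np.linspace(-3, 3, 5)
print(k(X_test, X[active]) @ w)
```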

Pp. 56-87

Efficient Communication by Breathing

Tom H. Shorrock; David J. C. MacKay; Chris J. Ball

The arithmetic-coding-based communication system, Dasher, can be driven by a one-dimensional continuous signal. A belt-mounted breath-mouse, delivering a signal related to lung volume, enables a user to communicate by breath alone. With practice, an expert user can write English at 15 words per minute.
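
The arithmetic-coding principle behind Dasher can be illustrated with a toy. Below, the writing area is an interval partitioned among symbols in proportion to a tiny, hypothetical language model, and a scripted 1-D control sequence stands in for the breath signal; real Dasher zooms continuously rather than selecting one symbol per step, so this is only a caricature of the mechanism.

```python
# Toy interval-zooming sketch of the Dasher principle (not real Dasher).
alphabet = list("abc_")
probs = {"a": 0.4, "b": 0.2, "c": 0.1, "_": 0.3}   # toy language model

def partition(lo, hi):
    """Split [lo, hi) among symbols in proportion to their probability."""
    spans, x = [], lo
    for s in alphabet:
        w = (hi - lo) * probs[s]
        spans.append((s, x, x + w))
        x += w
    return spans

def zoom(control, spans):
    """Map a control signal in [0, 1) (e.g. lung volume) to a sub-interval."""
    lo, hi = spans[0][1], spans[-1][2]
    point = lo + control * (hi - lo)
    for s, a, b in spans:
        if a <= point < b:
            return s, a, b

text = ""
for control in [0.1, 0.5, 0.9, 0.5]:       # scripted stand-in "breath" positions
    s, lo, hi = zoom(control, partition(0.0, 1.0))
    text += s                              # one symbol selected per zoom
print(text)
```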

Pp. 88-97

Guiding Local Regression Using Visualisation

Dharmesh M. Maniyar; Ian T. Nabney

Solving many scientific problems requires effective regression and/or classification models for large high-dimensional datasets. Experts from these problem domains (e.g. biologists, chemists, financial analysts) have insights into the domain which can be helpful in developing powerful models, but they need a modelling framework that helps them to use these insights. Data visualisation is an effective technique for presenting data and eliciting feedback from the experts. A single global regression model can rarely capture the full behavioural variability of a huge multi-dimensional dataset. Instead, local regression models, each focused on a separate area of input space, often work better, since the behaviour of different areas may vary. Classical local models such as Mixture of Experts segment the input space automatically, which is not always effective, and they do not involve the domain experts in guiding a meaningful segmentation of the input space. In this paper we address this issue by allowing domain experts to interactively segment the input space using data visualisation. The segmentation obtained is then used to develop effective local regression models.
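
The local-modelling idea reads naturally as code. In the sketch below, a hard-coded segmentation array stands in for the expert's interactive, visualisation-driven segmentation, and a separate linear model is fitted to each segment; everything here is synthetic and illustrative.

```python
# Per-segment local regression sketch; segmentation is a stand-in for the
# expert-guided, visualisation-based step the paper describes.
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (300, 1))
# Behaviour differs by region: two regimes an expert might separate.
y = np.where(X[:, 0] < 5, 2.0 * X[:, 0], 20.0 - 1.5 * X[:, 0])
y = y + rng.normal(0, 0.3, 300)

segment = (X[:, 0] >= 5).astype(int)       # stand-in for expert segmentation

models = {}
for seg in np.unique(segment):
    m = segment == seg
    A = np.hstack([X[m], np.ones((m.sum(), 1))])      # design matrix
    coef, *_ = np.linalg.lstsq(A, y[m], rcond=None)   # local linear fit
    models[int(seg)] = coef

def predict(x):
    a, b = models[int(x >= 5)]             # route query to its local model
    return a * x + b

print(predict(2.0), predict(8.0))          # ~4.0 and ~8.0
```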

Pp. 98-109

Transformations of Gaussian Process Priors

Roderick Murray-Smith; Barak A. Pearlmutter

Gaussian process prior systems generally consist of noisy measurements of samples of the putatively Gaussian process of interest, where the samples serve to constrain the posterior estimate. Here we consider the case where the measurements are instead linear transformations of samples. This framework incorporates measurements of derivative information and of filtered versions of the process, thereby allowing GPs to perform sensor fusion and tomography; it allows certain group invariances (i.e. symmetries) to be weakly enforced; and, under certain conditions, its suitable application allows the dataset to be dramatically reduced in size. The method is applied to a sparsely sampled image, where each sample is taken using a broad and non-monotonic point spread function. It is also applied to nonlinear dynamic system identification applications where a nonlinear function is followed by a known linear dynamic system, and where the observed data can be a mixture of irregularly sampled higher derivatives of the signal of interest.
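
The key computation is standard once the transformation is a known matrix: if f has a GP prior with covariance K over a grid and we observe y = Af + noise for a known linear operator A (a blur, a derivative stencil, a tomographic projection), the posterior mean over the grid is K Aᵀ(A K Aᵀ + σ²I)⁻¹y. The sketch below applies this to a synthetic smoothing operator standing in for the paper's point spread function.

```python
# GP inference under linear transformations of the process (sketch).
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(0, 1, 100)
K = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.1) ** 2)

# A broad smoothing operator: each measurement is a weighted local average,
# a synthetic stand-in for the paper's point spread function.
m, sigma2 = 15, 1e-4
centers = np.linspace(0.05, 0.95, m)
A = np.exp(-0.5 * ((centers[:, None] - grid[None, :]) / 0.05) ** 2)
A /= A.sum(axis=1, keepdims=True)

f_true = np.sin(4 * np.pi * grid)
y = A @ f_true + rng.normal(0, np.sqrt(sigma2), m)   # filtered observations

# Posterior mean over the underlying (unfiltered) function values:
# K A^T (A K A^T + s2 I)^{-1} y
G = A @ K @ A.T + sigma2 * np.eye(m)
f_post = K @ A.T @ np.linalg.solve(G, y)
print(np.abs(f_post - f_true).mean())                # reconstruction error
```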

Pp. 110-123

Kernel Based Learning Methods: Regularization Networks and RBF Networks

Petra Kudová; Roman Neruda

We discuss two kernel based learning methods, namely Regularization Networks (RN) and Radial Basis Function (RBF) networks. The RNs are derived from regularization theory; they have been studied thoroughly from a function approximation point of view and possess a sound theoretical background. The RBF networks represent a model of artificial neural networks with both neuro-physiological and mathematical motivation; in addition, they may be treated as a generalized form of Regularization Networks. We demonstrate the performance of both approaches in experiments, including both benchmark and real-life learning tasks. We claim that RN and RBF networks are comparable in terms of generalization error, but that they differ with respect to their model complexity: the RN approach usually leads to solutions with a higher number of basis units, so the RBF networks can be used as a 'cheaper' alternative. This makes the RBF networks suitable for modelling tasks with large amounts of data, such as time series prediction or semantic web classification.
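
The contrast in model size is easy to show. In this sketch (synthetic data; RBF centres chosen by naive subsampling rather than any trained placement), the Regularization Network keeps one basis unit per training point and solves (K + λI)c = y, while the RBF network fits the same data with far fewer units.

```python
# Regularization Network vs. a smaller RBF network (illustrative sketch).
import numpy as np

rng = np.random.default_rng(5)
n = 400
X = rng.uniform(-3, 3, n)
y = np.sinc(X) + rng.normal(0, 0.05, n)

phi = lambda a, c: np.exp(-((a[:, None] - c[None, :]) ** 2))

# Regularization Network: n units, coefficients from (K + lam*I) c = y.
lam = 1e-3
K = phi(X, X)
c_rn = np.linalg.solve(K + lam * np.eye(n), y)

# RBF network: 25 centres -> a much smaller ('cheaper') model.
centers = X[:: n // 25]
Phi = phi(X, centers)
w_rbf, *_ = np.linalg.lstsq(Phi, y, rcond=None)

X_test = np.linspace(-3, 3, 200)
err_rn = np.abs(phi(X_test, X) @ c_rn - np.sinc(X_test)).mean()
err_rbf = np.abs(phi(X_test, centers) @ w_rbf - np.sinc(X_test)).mean()
print(f"RN units: {n},  RBF units: {len(centers)}")
print(f"RN error: {err_rn:.4f},  RBF error: {err_rbf:.4f}")
```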

Pp. 124-136

Redundant Bit Vectors for Quickly Searching High-Dimensional Regions

Jonathan Goldstein; John C. Platt; Christopher J. C. Burges

Applications such as audio fingerprinting require search in high dimensions: find an item in a database that is similar to a query. An important property of this search task is that negative answers are very frequent: much of the time, a query does not correspond to any database item.

We propose Redundant Bit Vectors (RBVs): a novel method for quickly solving this search problem. RBVs rely on three key ideas: 1) approximate the high-dimensional regions/distributions as tightened hyperrectangles, 2) partition the query space to store each item redundantly in an index, and 3) use bit vectors to store and search the index efficiently.

We show that our method is the preferred method for very large databases or when the queries are often not in the database. Our method is 109 times faster than linear scan, and 48 times faster than locality-sensitive hashing on a data set of 239369 audio fingerprints.
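
A toy rendering of the index structure described above (not the paper's tuned implementation): each item is a hyperrectangle, each dimension is cut into bins, and every bin stores one bit per item that overlaps it, redundantly; a query ANDs one bit vector per dimension and only the few surviving candidates are checked exactly.

```python
# Redundant bit vector search sketch with synthetic boxes and a toy scale.
import numpy as np

rng = np.random.default_rng(6)
n_items, dim, n_bins = 1000, 8, 16

centers = rng.uniform(0, 1, (n_items, dim))
radius = 0.02                                  # items are boxes centre +/- radius
edges = np.linspace(0, 1, n_bins + 1)

# Build the index: index[d][b] holds a bit per item whose box overlaps
# bin b in dimension d (items are stored redundantly across bins).
index = np.zeros((dim, n_bins, n_items), dtype=bool)
for d in range(dim):
    lo, hi = centers[:, d] - radius, centers[:, d] + radius
    for b in range(n_bins):
        index[d, b] = (hi >= edges[b]) & (lo <= edges[b + 1])

def query(q):
    # AND one bit vector per dimension, then verify the few survivors.
    alive = np.ones(n_items, dtype=bool)
    for d in range(dim):
        b = min(int(q[d] * n_bins), n_bins - 1)
        alive &= index[d, b]
    hits = np.where(alive)[0]
    return hits[(np.abs(centers[hits] - q) <= radius).all(axis=1)]

q = centers[42] + 0.01                         # a query inside item 42's box
print(query(q))                                # -> contains 42
```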

Pp. 137-158

Bayesian Independent Component Analysis with Prior Constraints: An Application in Biosignal Analysis

Stephen Roberts; Rizwan Choudrey

In many data-driven machine learning problems it is useful to consider the data as generated from a set of unknown (latent) generators or sources. The observations we make are then taken to be related to these sources through some unknown functionality. Furthermore, the (unknown) number of underlying latent sources may differ from the number of observations, and hence issues of model complexity plague the analysis. Recent developments in Independent Component Analysis (ICA) have shown that, in the case where the unknown function linking sources to observations is linear, data decomposition may be achieved in a mathematically elegant manner. In this paper we extend the general ICA paradigm to include a very flexible source model and prior constraints, and argue that for particular biomedical signal processing problems (we consider EEG analysis) we require the constraint of positivity in the mixing process.
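
For readers who want to see the linear ICA setting the paper builds on, here is a minimal illustration using scikit-learn's FastICA as a stand-in. Note the substitution: the paper's method is a variational Bayesian ICA with flexible source priors and a positivity-constrained mixing matrix, none of which FastICA implements; the demo only shows the linear mixing/unmixing setting and the permutation/scale/sign ambiguity that motivates such constraints.

```python
# Linear ICA demo (FastICA stand-in, not the paper's Bayesian method).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(7)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(7 * t), np.sign(np.sin(3 * t))]   # latent sources
A = np.array([[1.0, 0.5], [0.3, 1.2]])             # a positive mixing matrix,
X = S @ A.T                                        # as the EEG setting suggests

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                       # recovered sources
A_hat = ica.mixing_                                # estimated mixing matrix
print(A_hat)   # recovered only up to permutation, scale, and sign, which is
               # why the paper argues for prior (positivity) constraints
```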

Pp. 159-179