Publications catalogue - books

Deterministic and Statistical Methods in Machine Learning: First International Workshop, Sheffield, UK, September 7-10, 2004. Revised Lectures

Joab Winkler; Mahesan Niranjan; Neil Lawrence (eds.)

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Mathematical Logic and Formal Languages; Database Management; Information Storage and Retrieval; Image Processing and Computer Vision; Pattern Recognition

Availability

Detected institution | Year of publication | Browse | Download | Request
Not detected | 2005 | SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-29073-5

Electronic ISBN

978-3-540-31728-9

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

Publication rights information

© Springer-Verlag Berlin Heidelberg 2005

Table of contents

Ensemble Algorithms for Feature Selection

Jeremy D. Rogers; Steve R. Gunn

Many feature selection algorithms are limited in that they attempt to identify relevant feature subsets by examining the features individually. This paper introduces a technique for determining feature relevance using the average information gain achieved during the construction of decision tree ensembles. The technique introduces a node complexity measure and a statistical method for updating the feature sampling distribution based upon confidence intervals to control the rate of convergence. A feature selection threshold is also derived, using the expected performance of an irrelevant feature. Experiments demonstrate the potential of these methods and illustrate the need for both feature weighting and selection.

Pp. 180-198
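
The entry above describes ranking features by the average information gain they contribute across a decision tree ensemble. As a minimal sketch of that general idea (not the authors' node-complexity measure or confidence-interval sampling scheme), the snippet below trains an entropy-based random forest with scikit-learn, appends a random "probe" feature, and keeps only the features whose impurity-based importance exceeds the probe's. The dataset and all names are illustrative.

```python
# Minimal sketch: rank features by average (entropy-based) importance in a
# tree ensemble and discard those no better than a random probe feature.
# This illustrates the general idea only, not the paper's algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)

# Append an irrelevant "probe" column; its importance serves as a data-driven
# selection threshold, loosely mirroring the idea of calibrating against the
# expected performance of an irrelevant feature.
probe = rng.normal(size=(X.shape[0], 1))
X_aug = np.hstack([X, probe])

forest = RandomForestClassifier(n_estimators=200, criterion="entropy",
                                random_state=0).fit(X_aug, y)

importances = forest.feature_importances_
threshold = importances[-1]          # importance of the probe feature
selected = np.flatnonzero(importances[:-1] > threshold)
print("selected feature indices:", selected)
```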

Can Gaussian Process Regression Be Made Robust Against Model Mismatch?

Peter Sollich

Learning curves for Gaussian process (GP) regression can be strongly affected by a mismatch between the ‘student’ model and the ‘teacher’ (true data generation process), exhibiting e.g. multiple overfitting maxima and logarithmically slow learning. I investigate whether GPs can be made robust against such effects by adapting student model hyperparameters to maximize the evidence (data likelihood). An approximation for the average evidence is derived and used to predict the optimal hyperparameter values and the resulting generalization error. For large input space dimension, where the approximation becomes exact, Bayes-optimal performance is obtained at the evidence maximum, but the actual hyperparameters (e.g. the noise level) do not necessarily reflect the properties of the teacher. Also, the theoretically achievable evidence maximum cannot always be reached with the chosen set of hyperparameters, and maximizing the evidence in such cases can actually make generalization performance worse rather than better. In lower-dimensional learning scenarios, the theory predicts—in excellent qualitative and good quantitative accord with simulations—that evidence maximization eliminates logarithmically slow learning and recovers the optimal scaling of the decrease of generalization error with training set size.

Pp. 199-210
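
As a sketch of the hyperparameter adaptation discussed above (plain evidence maximization, not the paper's analytical average-evidence approximation), the code below evaluates the GP log marginal likelihood for a squared exponential kernel over a grid of candidate noise levels and picks the maximizer. The data, kernel form and grid are illustrative assumptions.

```python
# Sketch: choose the GP noise hyperparameter by maximizing the log evidence
#   log p(y | X, s^2) = -0.5 y^T K^{-1} y - 0.5 log|K| - (n/2) log(2*pi),
# where K = K_f + s^2 I for a squared exponential covariance K_f.
import numpy as np

def se_kernel(X1, X2, lengthscale=1.0):
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-0.5 * d2.sum(-1) / lengthscale**2)

def log_evidence(X, y, noise_var, lengthscale=1.0):
    n = len(y)
    K = se_kernel(X, X, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()                    # -0.5 log|K|
            - 0.5 * n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=40)           # true noise std 0.3

grid = np.logspace(-3, 1, 50)                             # candidate noise variances
best = max(grid, key=lambda v: log_evidence(X, y, v))
print("evidence-maximizing noise variance:", best)
```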

Understanding Gaussian Process Regression Using the Equivalent Kernel

Peter Sollich; Christopher K. I. Williams

The equivalent kernel [1] is a way of understanding how Gaussian process regression works for large sample sizes based on a continuum limit. In this paper we show how to approximate the equivalent kernel of the widely-used squared exponential (or Gaussian) kernel and related kernels. This is easiest for uniform input densities, but we also discuss the generalization to the non-uniform case. We show further that the equivalent kernel can be used to understand the learning curves for Gaussian processes, and investigate how kernel smoothing using the equivalent kernel compares to full Gaussian process regression.

Pp. 211-228
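
One concrete way to see what the equivalent kernel approximates (a sketch using only standard GP regression formulas, not the paper's continuum-limit analysis) is to compute the weight function of GP prediction: the posterior mean at a test point is a weighted sum of the training targets, and for dense, roughly uniform inputs those weights trace out the equivalent kernel. Settings below are illustrative.

```python
# Sketch: the GP posterior mean is f(x*) = k(x*)^T (K + s^2 I)^{-1} y,
# i.e. a weighted sum sum_i w_i(x*) y_i.  The weight profile w(x*) over the
# training inputs is what the equivalent kernel describes in the dense-data
# limit for a given input density and noise level.
import numpy as np

def se_kernel(a, b, lengthscale=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

n, noise_var = 400, 0.1
x = np.linspace(0.0, 1.0, n)              # dense, uniform 1-D inputs
A = se_kernel(x, x) + noise_var * np.eye(n)

x_star = 0.5
weights = np.linalg.solve(A, se_kernel(x, np.array([x_star]))).ravel()

# The weights form a smoothing-kernel-like profile centred at x*: they sum to
# roughly 1 and decay (possibly with small oscillations) away from x*.
print("sum of weights:", weights.sum())
print("weight at x* and at distance 0.1:",
      weights[n // 2], weights[np.argmin(np.abs(x - 0.6))])
```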

Integrating Binding Site Predictions Using Non-linear Classification Methods

Yi Sun; Mark Robinson; Rod Adams; Paul Kaye; Alistair Rust; Neil Davey

Currently the best algorithms for transcription factor binding site prediction are severely limited in accuracy. There is good reason to believe that predictions from these different classes of algorithms could be used in conjunction to improve the quality of predictions. In this paper, we apply single layer networks, rule sets and support vector machines on predictions from 12 key algorithms. Furthermore, we use a ‘window’ of consecutive results in the input vector in order to contextualise the neighbouring results. Moreover, we improve the classification result with the aid of under- and over-sampling techniques. We find that support vector machines outperform each of the original individual algorithms and the other classifiers employed in this work with both types of inputs, in that they maintain a better tradeoff between recall and precision.

Pp. 229-241
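
A small sketch of the windowing step described above, with entirely synthetic stand-ins for the twelve algorithms' per-position scores: each position is represented by the scores in a window of neighbouring positions and fed to an SVM. The paper's under/over-sampling procedures and the real predictor outputs are not reproduced; class weighting is used here only as a rough stand-in for handling the class imbalance.

```python
# Sketch: contextualise per-position predictions with a sliding window and
# classify with an SVM.  The score matrix stands in for the outputs of the
# 12 binding-site prediction algorithms; everything here is synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_pos, n_algos, half_window = 2000, 12, 3

labels = (rng.random(n_pos) < 0.1).astype(int)   # sparse "binding site" labels
# Noisy per-algorithm scores that carry some signal about the labels.
scores = 0.6 * labels[:, None] + rng.normal(0.0, 0.4, size=(n_pos, n_algos))

def windowed(scores, i, h):
    """Concatenate the score vectors of positions i-h .. i+h (edge-clipped)."""
    idx = np.clip(np.arange(i - h, i + h + 1), 0, len(scores) - 1)
    return scores[idx].ravel()

X = np.stack([windowed(scores, i, half_window) for i in range(n_pos)])
split = n_pos // 2
clf = SVC(kernel="rbf", class_weight="balanced").fit(X[:split], labels[:split])
print("test accuracy:", clf.score(X[split:], labels[split:]))
```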

Support Vector Machine to Synthesise Kernels

Hongying Meng; John Shawe-Taylor; Sandor Szedmak; Jason D. R. Farquhar

In this paper, we introduce a new method (SVM_2K) which amalgamates the capabilities of the Support Vector Machine (SVM) and Kernel Canonical Correlation Analysis (KCCA) to give a more sophisticated combination rule than the boosting framework allows. We show how this combination can be achieved within a unified optimisation model to create a consistent learning rule which combines the classification abilities of the individual SVMs with the synthesis abilities of KCCA. To solve the unified problem, we present an algorithm based on the Augmented Lagrangian Method. Experiments show that SVM_2K performs well on generic object recognition problems in computer vision.

Pp. 242-255
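
SVM_2K's joint SVM/KCCA optimisation is not reproduced here. As a hedged point of reference, the sketch below only trains precomputed-kernel SVMs on two synthetic "views" and on the unweighted sum of their kernels, which is the naive combination rule that two-view methods of this kind aim to improve upon. All data and kernel choices are illustrative.

```python
# Sketch of a naive two-view baseline (NOT the SVM_2K algorithm): train an SVM
# on each view's kernel and on the plain sum of the two kernels.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
view1 = y[:, None] + rng.normal(0, 1.0, size=(n, 5))   # two noisy views of
view2 = y[:, None] + rng.normal(0, 1.2, size=(n, 8))   # the same labels

tr, te = slice(0, 300), slice(300, n)
K1, K2 = rbf_kernel(view1, view1), rbf_kernel(view2, view2)
for name, K in [("view 1", K1), ("view 2", K2), ("sum of kernels", K1 + K2)]:
    clf = SVC(kernel="precomputed").fit(K[tr, tr], y[tr])
    print(f"{name}: test accuracy {clf.score(K[te, tr], y[te]):.3f}")
```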

Appropriate Kernel Functions for Support Vector Machine Learning with Sequences of Symbolic Data

Bram Vanschoenwinkel; Bernard Manderick

In classification problems, machine learning algorithms often make use of the assumption that (dis)similar inputs lead to (dis)similar outputs. In this case, two questions naturally arise: what does it mean for two inputs to be similar, and how can this be used in a learning algorithm? In support vector machines, the similarity between input examples is implicitly expressed by a kernel function that calculates inner products in the feature space. For numerical input examples the concept of an inner product is easy to define; for discrete structures like sequences of symbolic data, however, these concepts are less obvious. This article describes an approach to SVM learning for symbolic data that can serve as an alternative to the bag-of-words approach under certain circumstances. This latter approach first transforms symbolic data to vectors of numerical data which are then used as arguments for one of the standard kernel functions. In contrast, we will propose kernels that operate on the symbolic data directly.

Pp. 256-280
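
As a small illustration of a kernel that operates on fixed-length symbolic sequences directly (a simple position-wise overlap kernel, assumed here for illustration rather than taken from the paper), the code below encodes sequences as integer arrays, counts matching positions between pairs, and plugs the resulting Gram matrix into a precomputed-kernel SVM. The alphabet, sequences and labels are toy examples.

```python
# Sketch: a kernel on fixed-length symbol sequences that counts matching
# positions (raised to a power), used with a precomputed-kernel SVM.
import numpy as np
from sklearn.svm import SVC

alphabet = {c: i for i, c in enumerate("ACGT")}

def encode(seqs):
    return np.array([[alphabet[c] for c in s] for s in seqs])

def overlap_kernel(A, B, degree=2):
    """K[i, j] = (number of positions where A[i] and B[j] agree) ** degree."""
    matches = (A[:, None, :] == B[None, :, :]).sum(axis=2).astype(float)
    return matches ** degree

train = ["ACGTAC", "ACGTTT", "TTTTAC", "TTTTTT", "ACGTGG", "TTTGGG"]
y_train = np.array([1, 1, 0, 0, 1, 0])       # toy labels
test = ["ACGTCC", "TTTTCC"]

A, B = encode(train), encode(test)
clf = SVC(kernel="precomputed").fit(overlap_kernel(A, A), y_train)
print("predictions:", clf.predict(overlap_kernel(B, A)))
```

Because the overlap count is a sum of per-position indicator kernels, and powers of positive semidefinite kernels remain positive semidefinite, the Gram matrix above is a valid SVM kernel without any intermediate numerical vectorisation of the symbols.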

Variational Bayes Estimation of Mixing Coefficients

Bo Wang; D. M. Titterington

We investigate theoretically some properties of variational Bayes approximations based on estimating the mixing coefficients of known densities. We show that, with probability 1 as the sample size grows large, the iterative algorithm for the variational Bayes approximation converges locally to the maximum likelihood estimator at the rate of O(1/n), where n is the sample size. Moreover, the variational posterior distribution for the parameters is shown to be asymptotically normal with the same mean but a different covariance matrix compared with those for the maximum likelihood estimator. Furthermore we prove that the covariance matrix from the variational Bayes approximation is ‘too small’ compared with that for the MLE, so that resulting interval estimates for the parameters will be unrealistically narrow.

Pp. 281-295
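
A minimal sketch of the kind of iterative scheme analysed above: variational Bayes estimation of the mixing coefficients of a mixture whose component densities are known, with a Dirichlet prior on the weights. The specific components, prior strength and data below are illustrative assumptions, not the paper's setup.

```python
# Sketch: VB updates for mixing coefficients of known component densities.
# Responsibilities use E[log pi_k] = digamma(alpha_k) - digamma(sum alpha);
# the Dirichlet parameters are then updated from the summed responsibilities.
import numpy as np
from scipy.special import digamma
from scipy.stats import norm

rng = np.random.default_rng(0)
true_pis = np.array([0.3, 0.7])
components = [norm(-2.0, 1.0), norm(2.0, 1.0)]          # known densities
z = rng.random(1000) < true_pis[0]
x = np.where(z, components[0].rvs(1000, random_state=1),
                components[1].rvs(1000, random_state=2))

dens = np.column_stack([c.pdf(x) for c in components])  # (n, K) densities
alpha0 = np.ones(2)                                     # Dirichlet prior
alpha = alpha0.copy()

for _ in range(100):
    log_r = digamma(alpha) - digamma(alpha.sum()) + np.log(dens)
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)                   # responsibilities
    alpha = alpha0 + r.sum(axis=0)                      # VB update

print("posterior mean mixing coefficients:", alpha / alpha.sum())
```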

A Comparison of Condition Numbers for the Full Rank Least Squares Problem

Joab R. Winkler

Condition numbers of the full rank least squares (LS) problem min_x ||Ax − b||_2 are considered theoretically and their computational implementation is compared. These condition numbers range from a simple normwise measure that may overestimate by several orders of magnitude the true numerical condition of the LS problem, to refined componentwise and normwise measures. Inequalities that relate these condition numbers are established, and it is concluded that the solution of the LS problem may be well-conditioned in the normwise sense, even if one of its components is ill-conditioned. It is shown that the refined condition numbers are ill-conditioned in some circumstances, the cause of this ill-conditioning is identified, and its implications are discussed.

Pp. 296-318
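
To make the simple normwise quantities concrete (textbook measures only, not the refined componentwise condition numbers developed in the paper), the sketch below computes kappa_2(A) from the singular values of a full-rank matrix and the classical first-order normwise sensitivity estimate for the LS solution, which involves a kappa_2(A)^2 term scaled by the residual. The matrix and right-hand side are illustrative.

```python
# Sketch: the simple normwise condition number kappa_2(A) = s_max / s_min
# versus the classical normwise sensitivity estimate for the LS solution,
#   kappa_LS ~ kappa_2(A) + kappa_2(A)^2 * ||r|| / (||A||_2 * ||x||),
# which shows how a nonzero residual enters the conditioning of the solution.
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 5
A = rng.normal(size=(m, n)) @ np.diag([1.0, 1.0, 1.0, 1e-2, 1e-3])
b = rng.normal(size=m)                      # generic b => sizeable residual

x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x
sigma = np.linalg.svd(A, compute_uv=False)
kappa2 = sigma[0] / sigma[-1]

kappa_ls = kappa2 + kappa2**2 * np.linalg.norm(r) / (sigma[0] * np.linalg.norm(x))
print(f"kappa_2(A)                 = {kappa2:.3e}")
print(f"normwise LS sensitivity    = {kappa_ls:.3e}")
```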

SVM Based Learning System for Information Extraction

Yaoyong Li; Kalina Bontcheva; Hamish Cunningham

This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, our approach needs fewer SVM classifiers to be trained than other recent SVM-based systems. The paper also compares our approach to several state-of-the-art systems (including rule learning and statistical learning algorithms) on three IE benchmark datasets: CoNLL-2003, CMU seminars, and the software jobs corpus. The experimental results show that our system outperforms a recent SVM-based system on CoNLL-2003, achieves the highest score on eight out of 17 categories on the jobs corpus, and is second best on the remaining nine.

Pp. 319-339
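
A rough sketch of the token-classification core of such a system: tokens are mapped to sparse features and a linear SVM tags entity starts. Scikit-learn has no uneven-margins SVM, so class weighting is used here purely as a stand-in for the imbalance handling that uneven margins provide; the features, sentence and tags are toy assumptions.

```python
# Sketch: SVM-based token tagging for information extraction.  Each token is
# featurised (word identity, casing, neighbours) and a linear SVM predicts
# whether it starts an entity.  class_weight is a rough stand-in for the
# uneven-margins idea: it biases the learner toward the rare positive class.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

sentence = ["John", "Smith", "works", "for", "Acme", "Corp", "in", "London"]
starts = [1, 0, 0, 0, 1, 0, 0, 1]          # toy "entity start" tags

def token_features(tokens, i):
    return {"word": tokens[i].lower(),
            "is_title": tokens[i].istitle(),
            "prev": tokens[i - 1].lower() if i > 0 else "<s>",
            "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"}

feats = [token_features(sentence, i) for i in range(len(sentence))]
X = DictVectorizer().fit_transform(feats)

clf = LinearSVC(class_weight={1: 5.0, 0: 1.0}).fit(X, starts)
print("predicted start tags:", clf.predict(X))
```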