Publications catalog - books

Machine Learning: ECML 2007: 18th European Conference on Machine Learning, Warsaw, Poland, September 17-21, 2007. Proceedings

Joost N. Kok; Jacek Koronacki; Ramon Lopez de Mantaras; Stan Matwin; Dunja Mladenić; Andrzej Skowron (eds.)

In conference: 18th European Conference on Machine Learning (ECML). Warsaw, Poland. September 17-21, 2007

Abstract/Description – provided by the publisher

Not available.

Keywords – provided by the publisher

Artificial Intelligence (incl. Robotics); Algorithm Analysis and Problem Complexity; Mathematical Logic and Formal Languages; Database Management

Availability

Detected institution: Not detected
Year of publication: 2007
Browse: SpringerLink

Information

Resource type:

books

Print ISBN

978-3-540-74957-8

Electronic ISBN

978-3-540-74958-5

Publisher

Springer Nature

Country of publication

United Kingdom

Publication date

2007

Publication rights information

© Springer-Verlag Berlin Heidelberg 2007

Table of contents

Learning to Classify Documents with Only a Small Positive Training Set

Xiao-Li Li; Bing Liu; See-Kiong Ng

Many real-world classification applications fall into the class of positive and unlabeled (PU) learning problems. In many such applications, not only may the negative training examples be missing, but the number of positive examples available for learning may also be fairly limited, due to the impracticality of hand-labeling a large number of training examples. Current PU learning techniques have focused mostly on identifying reliable negative instances from the unlabeled set. In this paper, we address the oft-overlooked PU learning problem where the number of training examples in the positive set is small. We propose a novel technique, LPLP (Learning from Probabilistically Labeled Positive examples), and apply the approach to classify product pages from commercial websites. The experimental results demonstrate that our approach outperforms existing methods significantly, even in the challenging cases where the given positive examples and the hidden positive examples in the unlabeled set were not drawn from the same distribution.

- Long Papers | Pp. 201-213
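
The two-step strategy that this abstract says current PU techniques rely on can be made concrete. Below is a minimal sketch, assuming scikit-learn and a logistic-regression base learner (both illustrative choices): train on positives versus unlabeled-as-negative, keep the unlabeled examples scored as confidently negative as "reliable negatives", and retrain. It is not the paper's LPLP algorithm.

```python
# Two-step PU-learning baseline (illustrative; NOT the LPLP method).
import numpy as np
from sklearn.linear_model import LogisticRegression

def two_step_pu(X_pos, X_unlabeled, neg_threshold=0.1):
    # Step 1: fit positives vs. all unlabeled examples treated as negative.
    X = np.vstack([X_pos, X_unlabeled])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlabeled))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Step 2: unlabeled points with very low positive probability become
    # the "reliable negatives" (the threshold is an arbitrary choice here).
    p_unl = clf.predict_proba(X_unlabeled)[:, 1]
    X_neg = X_unlabeled[p_unl < neg_threshold]

    # Retrain on positives vs. reliable negatives only.
    X2 = np.vstack([X_pos, X_neg])
    y2 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    return LogisticRegression(max_iter=1000).fit(X2, y2)
```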

Structure Learning of Probabilistic Relational Models from Incomplete Relational Data

Xiao-Lin Li; Zhi-Hua Zhou

Existing relational learning approaches usually work on complete relational data, but real-world data are often incomplete. This paper proposes the MGDA approach for learning the structure of probabilistic relational models (PRMs) from incomplete relational data. The missing values are first filled in randomly, and a maximum likelihood tree (MLT) is generated from the completed data sample. Then, Gibbs sampling is combined with the MLT to modify the data and regulate the MLT iteratively, in order to obtain a well-completed data set. Finally, the probabilistic structure is learned through dependency analysis on the completed data set. Experiments show that the MGDA approach can learn good structures from incomplete relational data.

- Long Papers | Pp. 214-225
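
A rough sense of the alternate-between-completion-and-model loop can be given in code. The sketch below swaps the paper's maximum likelihood tree for a plain multivariate Gaussian (an assumption made purely to keep the example self-contained) and shows only the Gibbs-style imputation cycle, not the relational structure learning.

```python
# Gibbs-style data completion (illustrative stand-in for MGDA's MLT step).
import numpy as np

def gibbs_impute(X, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    X = X.copy()
    missing = np.isnan(X)
    # Initial fill: column means (a random fill, as in the abstract,
    # would work equally well here).
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[missing[:, j], j] = col_means[j]

    for _ in range(n_iters):
        # Refit the (stand-in) model on the currently completed data.
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        # Resample each row's missing block from its conditional Gaussian.
        for i in np.where(missing.any(axis=1))[0]:
            m, o = missing[i], ~missing[i]
            C_oo_inv = np.linalg.inv(cov[np.ix_(o, o)])
            cond_mu = mu[m] + cov[np.ix_(m, o)] @ C_oo_inv @ (X[i, o] - mu[o])
            cond_cov = cov[np.ix_(m, m)] - cov[np.ix_(m, o)] @ C_oo_inv @ cov[np.ix_(o, m)]
            X[i, m] = rng.multivariate_normal(cond_mu, cond_cov)
    return X
```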

Stability Based Sparse LSI/PCA: Incorporating Feature Selection in LSI and PCA

Dimitrios Mavroeidis; Michalis Vazirgiannis

The stability of sample-based algorithms is a concept commonly used for parameter tuning and validity assessment. In this paper we focus on two well-studied algorithms, LSI and PCA, and propose a feature selection process that provably guarantees the stability of their outputs. The feature selection process is performed such that the level of (statistical) accuracy of the LSI/PCA input matrices is adequate for computing meaningful (stable) eigenvectors. The feature selection process “sparsifies” LSI/PCA, resulting in the projection of the instances onto the eigenvectors of a principal submatrix of the original input matrix, thus producing sparse factor loadings that are linear combinations solely of the selected features. We utilize bootstrapping confidence intervals for assessing the statistical accuracy of the input sample matrices, and matrix perturbation theory to relate this statistical accuracy to the stability of the eigenvectors. Experiments on several UCI datasets empirically verify our approach.

- Long Papers | Pp. 226-237
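
The overall recipe (select features whose statistics are stable under resampling, then run PCA only on their principal submatrix so the loadings are sparse) can be sketched as follows. The stability criterion used here, a small bootstrap coefficient of variation of each feature's variance, is a simple stand-in for the paper's confidence-interval and perturbation-theory machinery.

```python
# Feature-selection-based sparse PCA (illustrative stability criterion).
import numpy as np

def sparse_pca_by_selection(X, n_keep=10, n_boot=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    boot_vars = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # bootstrap resample of rows
        boot_vars[b] = X[idx].var(axis=0)
    # Keep the features whose variance estimates fluctuate least.
    cv = boot_vars.std(axis=0) / (boot_vars.mean(axis=0) + 1e-12)
    keep = np.argsort(cv)[:n_keep]

    # PCA on the principal submatrix of the selected features only.
    Xs = X[:, keep] - X[:, keep].mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    # Embed the sub-eigenvectors back into d dimensions: the loadings are
    # zero outside the selected feature set, i.e. sparse.
    loadings = np.zeros((d, n_keep))
    loadings[keep] = eigvecs[:, ::-1]          # largest eigenvalues first
    return loadings
```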

Bayesian Substructure Learning - Approximate Learning of Very Large Network Structures

Andreas Nägele; Mathäus Dejori; Martin Stetter

In recent years, Bayesian networks have become a popular framework for estimating the dependency structure of a set of variables. However, due to the NP-hardness of structure learning, this is a challenging task, and typical state-of-the-art algorithms fail to learn in domains with several thousand variables. In this paper we introduce a novel algorithm, called substructure learning, that reduces the complexity of learning large networks by splitting the task into several small subtasks. Instead of learning one complete network, we estimate the network structure iteratively by learning small subnetworks. Results from several benchmark cases show that substructure learning efficiently reconstructs the network structure in large domains with high accuracy.

- Long Papers | Pp. 238-249
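
The divide-and-conquer idea can be illustrated without a full Bayesian-network learner. In the sketch below, each subdomain is a variable plus its most correlated neighbors, each substructure is a Chow-Liu-style maximum-weight spanning tree over absolute correlations, and the subnetwork edges are merged into one skeleton; all three choices are illustrative simplifications, not the paper's algorithm.

```python
# Substructure-style skeleton learning (illustrative simplification).
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def substructure_edges(X, k=5):
    d = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    edges = set()
    for v in range(d):
        # Subdomain: v plus its k most correlated variables.
        sub = np.append(np.argsort(corr[v])[-k:], v)
        # Maximum-weight spanning tree via MST on negated weights.
        tree = minimum_spanning_tree(-corr[np.ix_(sub, sub)]).tocoo()
        for i, j in zip(tree.row, tree.col):
            edges.add(tuple(sorted((int(sub[i]), int(sub[j])))))
    return edges  # undirected skeleton; edge orientation is not attempted
```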

Efficient Continuous-Time Reinforcement Learning with Adaptive State Graphs

Gerhard Neumann; Michael Pfeiffer; Wolfgang Maass

We present a new reinforcement learning approach for deterministic continuous control problems in environments with unknown, arbitrary reward functions. The difficulty of finding solution trajectories for such problems can be reduced by incorporating limited prior knowledge of the approximate local system dynamics. The presented algorithm builds an adaptive state graph of sample points within the continuous state space. The nodes of the graph are generated by an efficient, principled exploration scheme that directs the agent towards promising regions while maintaining good online performance. Global solution trajectories are formed as combinations of local controllers that connect nodes of the graph, thereby naturally allowing continuous actions and continuous time steps. We demonstrate our approach on various movement planning tasks in continuous domains.

- Long Papers | Pp. 250-261
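
The graph-based planning layer the abstract describes can be sketched with straight-line edges standing in for the local controllers: sampled states become nodes, nearby nodes are connected with a cost that approximates the controller's travel time, and global trajectories are shortest paths. Exploration and the actual system dynamics are omitted; everything here is an illustrative simplification.

```python
# Shortest-path planning over a sampled state graph (illustrative).
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def plan_on_state_graph(states, start_idx, goal_idx, radius=0.5):
    # Edge cost: Euclidean distance between states within connection range
    # (a stand-in for the local controller's transition time).
    dist = cdist(states, states)
    graph = np.where(dist <= radius, dist, 0.0)   # zero = no edge
    _, predecessors = shortest_path(graph, return_predecessors=True)
    # Walk the predecessor matrix back from the goal to the start.
    path, node = [goal_idx], goal_idx
    while node != start_idx:
        node = predecessors[start_idx, node]
        if node < 0:
            return None  # goal unreachable with this connection radius
        path.append(int(node))
    return path[::-1]
```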

Source Separation with Gaussian Process Models

Sunho Park; Seungjin Choi

In this paper we address a method of source separation for the case where sources have certain temporal structures. The key contribution of this paper is to incorporate a Gaussian process (GP) model into source separation, representing the latent function that characterizes the temporal structure of a source by a random process with a Gaussian prior. Marginalizing out the latent function leads to the Gaussian marginal likelihood of a source, which is plugged into the mutual information-based loss function for source separation. In addition, we also consider the leave-one-out predictive distribution of a source, instead of the marginal likelihood, in the same framework. Gradient-based optimization is applied to estimate the demixing matrix through mutual information minimization, where the marginal distribution of a source is replaced by its marginal likelihood or its leave-one-out predictive distribution. Numerical experiments confirm the useful behavior of our method compared to existing source separation methods.

- Long Papers | Pp. 262-273
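
The quantity at the center of this approach, the Gaussian marginal likelihood of a source after the GP latent function is integrated out, is easy to state: with kernel matrix K and noise variance σ², log p(s) = -½ sᵀ(K + σ²I)⁻¹s - ½ log|K + σ²I| - (T/2) log 2π. A minimal sketch, assuming an RBF kernel over time (the kernel choice is illustrative) and leaving out the demixing-matrix optimization:

```python
# GP marginal log-likelihood of a length-T source signal (illustrative).
import numpy as np

def gp_log_marginal_likelihood(s, lengthscale=1.0, sigma2=0.1):
    T = len(s)
    t = np.arange(T, dtype=float)[:, None]
    K = np.exp(-0.5 * (t - t.T) ** 2 / lengthscale**2)  # RBF kernel in time
    C = K + sigma2 * np.eye(T)
    L = np.linalg.cholesky(C)              # numerically stable solve
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, s))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * s @ alpha - 0.5 * logdet - 0.5 * T * np.log(2 * np.pi)
```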

Discriminative Sequence Labeling by Z-Score Optimization

Elisa Ricci; Tijl de Bie; Nello Cristianini

We consider a new discriminative learning approach to sequence labeling based on the statistical concept of the Z-score. Given a training set of pairs of hidden-observed sequences, the task is to determine parameter values such that the hidden labels can be correctly reconstructed from the observations. Maximizing the Z-score appears to be a very good criterion for solving this problem, both theoretically and empirically. We show that the Z-score is a convex function of the parameters and that it can be efficiently computed with dynamic programming methods. In addition, the maximization step turns out to be solvable by a simple linear system of equations. Experiments on artificial and real data demonstrate that our approach is very competitive, both in terms of speed and accuracy, with respect to previous algorithms.

- Long Papers | Pp. 274-285
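
The criterion itself is simple to state: the Z-score of the correct labeling's score relative to the score distribution over all possible labelings, Z = (score(y*) - μ) / σ. The paper obtains μ and σ with dynamic programming; the brute-force sketch below enumerates all labelings instead, which is feasible only for toy sequence lengths, and assumes an arbitrary user-supplied scoring function.

```python
# Z-score of the true labeling against all labelings (brute force).
import itertools
import numpy as np

def labeling_z_score(score_fn, true_labels, n_states, seq_len):
    # Score distribution over every possible label sequence.
    scores = np.array([score_fn(labels) for labels in
                       itertools.product(range(n_states), repeat=seq_len)])
    mu, sigma = scores.mean(), scores.std()
    return (score_fn(true_labels) - mu) / sigma
```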

Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches

Mark Schmidt; Glenn Fung; Rómer Rosales

L1 regularization is effective for feature selection, but the resulting optimization is challenging due to the non-differentiability of the 1-norm. In this paper we compare state-of-the-art optimization techniques for solving this problem across several loss functions. Furthermore, we propose two new techniques. The first is based on a smooth (differentiable) convex approximation of the L1 regularizer that does not depend on any assumptions about the loss function used. The other is a new strategy that addresses the non-differentiability of the L1 regularizer by casting the problem as a constrained optimization problem, which is then solved using a specialized gradient projection method. Extensive comparisons show that our newly proposed approaches consistently rank among the best in terms of convergence speed and efficiency, as measured by the number of function evaluations required.

- Long Papers | Pp. 286-297
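
The first of the two proposed ideas is easy to demonstrate: replace the non-differentiable |w| with a smooth surrogate such as sqrt(w^2 + eps), after which any gradient method applies regardless of the loss. The sketch below pairs it with a logistic loss and plain gradient descent; both of those, and the particular smoother, are illustrative choices rather than the paper's exact setup.

```python
# Smooth approximation of L1-regularized logistic regression (illustrative).
import numpy as np

def smooth_l1_logistic(X, y, lam=0.1, eps=1e-6, lr=0.1, n_iters=500):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))           # logistic predictions
        grad_loss = X.T @ (p - y) / n              # loss gradient
        grad_pen = lam * w / np.sqrt(w**2 + eps)   # d/dw of sqrt(w^2 + eps)
        w -= lr * (grad_loss + grad_pen)
    return w
```

For the second technique, a common way to obtain the constrained form is the splitting w = w+ - w- with w+, w- >= 0, which makes the penalty a linear term over the non-negative orthant and so amenable to gradient projection.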

Bayesian Inference for Sparse Generalized Linear Models

Matthias Seeger; Sebastian Gerwinn; Matthias Bethge

We present a framework for efficient, accurate approximate Bayesian inference in generalized linear models (GLMs), based on the expectation propagation (EP) technique. The parameters can be endowed with a factorizing prior distribution, encoding properties such as sparsity or non-negativity. The central role of posterior log-concavity in Bayesian GLMs is emphasized and related to stability issues in EP. In particular, we use our technique to infer the parameters of a point process model for neuronal spiking data from multiple electrodes, demonstrating significantly superior predictive performance when a sparsity assumption is enforced via a Laplace prior distribution.

- Long Papers | Pp. 298-309
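
Full EP is beyond a short example, but the model family and the sparsity prior can be shown with a much simpler stand-in: MAP estimation in a Poisson GLM (a basic point-process-style model for spike counts) under a Laplace prior, which reduces to L1-penalized regression. This is a sketch of the setting only, not the paper's EP inference; all settings are illustrative.

```python
# MAP weights of a Poisson GLM with a Laplace (L1) prior (illustrative;
# the paper performs full approximate posterior inference with EP instead).
import numpy as np
from scipy.optimize import minimize

def laplace_map_poisson(X, counts, lam=1.0):
    def neg_log_posterior(w):
        eta = X @ w                              # log-link linear predictor
        log_lik = counts @ eta - np.exp(eta).sum()
        return -log_lik + lam * np.abs(w).sum()  # Laplace prior -> L1 term
    w0 = np.zeros(X.shape[1])
    # Derivative-free search, since the L1 term is non-smooth; suitable
    # only for small numbers of features.
    return minimize(neg_log_posterior, w0, method="Nelder-Mead").x
```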

Classifier Loss Under Metric Uncertainty

David B. Skalak; Alexandru Niculescu-Mizil; Rich Caruana

Classifiers that are deployed in the field can be used and evaluated in ways that were not anticipated when the model was trained. The final evaluation metric may not have been known at training time, additional performance criteria may have been added, the evaluation metric may have changed over time, or the real-world evaluation procedure may have been impossible to simulate. Unforeseen ways of measuring model utility can degrade performance. Our objective is to provide experimental support for modelers who face potential “cross-metric” performance deterioration. First, to identify model-selection metrics that lead to stronger cross-metric performance, we characterize the expected loss when the selection metric is held fixed and the evaluation metric is varied. Second, we show that the number of data points evaluated by a selection metric has a substantial impact on the optimal evaluation. In addressing these issues, we also consider how calibrating the classifiers to output probabilities influences cross-metric performance. Our experiments show that if models are well calibrated, cross-entropy is the highest-performing selection metric when little data is available for model selection. With these experiments, modelers may be in a better position to choose selection metrics that are robust when it is uncertain which evaluation metric will be applied.

- Long Papers | Pp. 310-322
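
The experimental protocol behind these claims can be sketched directly: choose the best of several candidate models by one metric on a selection set, then score that choice under a different evaluation metric, and compare against the model an oracle would have picked. The models and metrics below are illustrative; scikit-learn's metric functions and probabilistic classifiers are assumed.

```python
# Cross-metric model-selection loss (illustrative protocol sketch).
from sklearn.metrics import log_loss, roc_auc_score

def cross_metric_loss(models, X_sel, y_sel, X_eval, y_eval,
                      select_metric=log_loss, eval_metric=roc_auc_score):
    def score(metric, model, X, y):
        p = model.predict_proba(X)[:, 1]
        s = metric(y, p)
        return -s if metric is log_loss else s   # lower log-loss is better
    chosen = max(models, key=lambda m: score(select_metric, m, X_sel, y_sel))
    oracle = max(models, key=lambda m: score(eval_metric, m, X_eval, y_eval))
    # How much evaluation-metric performance was lost by selecting on a
    # different metric than the one used for final evaluation?
    return (score(eval_metric, oracle, X_eval, y_eval)
            - score(eval_metric, chosen, X_eval, y_eval))
```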